CN114253950A - Method and device for managing database - Google Patents


Info

Publication number
CN114253950A
Authority
CN
China
Prior art keywords
primary key
data
sstable
database
key range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210185977.XA
Other languages
Chinese (zh)
Other versions
CN114253950B (en
Inventor
曹晖
杨涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Oceanbase Technology Co Ltd
Original Assignee
Beijing Oceanbase Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Oceanbase Technology Co Ltd
Priority to CN202210185977.XA
Publication of CN114253950A
Application granted
Publication of CN114253950B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Abstract

A method and apparatus for managing a database are provided. The database includes an incremental sorted string table (SSTable) and a baseline SSTable. The method comprises: scanning target data in the database to obtain a first primary key range in which the primary keys of the target data are located, wherein the target data comprises part or all of the data in the incremental SSTable; splitting a merge task for the data in the first primary key range into one or more subtasks according to the first primary key range; and merging the data in the first primary key range according to the one or more subtasks.

Description

Method and device for managing database
Technical Field
The present disclosure relates to the field of databases, and more particularly, to a method and apparatus for managing a database.
Background
In a database based on sorted string tables (SSTable, sometimes abbreviated as SST), merging data is a heavy operation that takes a long time. To shorten the merging time, the conventional art evenly splits a merge task into a plurality of subtasks in a single pass based on information of the baseline SSTable, so that the subtasks can be processed in parallel. However, in many scenarios the workloads of the subtasks obtained from the baseline SSTable information are unbalanced: one or several subtasks carry a much heavier load than the others, so the execution time of the merge task remains long.
Disclosure of Invention
In view of the above problems, the present disclosure provides a method and apparatus for managing a database.
In a first aspect, there is provided a method of managing a database, the database including an incremental SSTable and a baseline SSTable, the method comprising: scanning target data in the database to obtain a first primary key range in which the primary keys of the target data are located, wherein the target data comprises part or all of the data in the incremental SSTable; splitting a merge task for the data in the first primary key range into one or more subtasks according to the first primary key range; and merging the data in the first primary key range according to the one or more subtasks.
Optionally, as a possible implementation manner, the target data is data in a second primary key range, the starting primary key of the second primary key range is the ending primary key of the baseline SSTable, and the ending primary key of the second primary key range is a primary key maximum value.
Optionally, as a possible implementation manner, the database further includes data in a third primary key range, a starting primary key of the third primary key range is a primary key minimum, and an ending primary key of the third primary key range is an ending primary key of the baseline SSTable, and the method further includes: and splitting the merging task of the data in the third primary key range into one or more subtasks according to the third primary key range so as to merge the data in the third primary key range.
Optionally, as a possible implementation manner, the target data is incremental data in the incremental SSTable, and the first primary key range is a primary key range covered by the incremental data.
Optionally, as a possible implementation manner, the primary keys of the remaining data except the incremental data in the incremental SSTable are located within a fourth primary key range, and the method further includes: reusing data in the baseline SSTable that is within the fourth primary key range.
Optionally, as a possible implementation manner, the target data is all of the data in the incremental SSTable and the baseline SSTable.
Optionally, as a possible implementation manner, the database is a distributed database, the database includes a plurality of baseline SSTable copies, the plurality of baseline SSTable copies are distributed on a plurality of database nodes, and merging of different database nodes in the plurality of database nodes is performed independently.
In a second aspect, there is provided an apparatus for managing a database, the database including an incremental SSTable and a baseline SSTable, the apparatus comprising: a scanning module, configured to scan target data in the database to obtain a first primary key range in which the primary keys of the target data are located, wherein the target data comprises part or all of the data in the incremental SSTable; a splitting module, configured to split a merge task for the data in the first primary key range into one or more subtasks according to the first primary key range; and a merging module, configured to merge the data in the first primary key range according to the one or more subtasks.
Optionally, as a possible implementation manner, the target data is data in a second primary key range, the starting primary key of the second primary key range is the ending primary key of the baseline SSTable, and the ending primary key of the second primary key range is a primary key maximum value.
Optionally, as a possible implementation manner, the database further includes data in a third primary key range, a starting primary key of the third primary key range is a primary key minimum value, an ending primary key of the third primary key range is an ending primary key of the baseline SSTable, and the splitting module is further configured to split the merging task of the data in the third primary key range into one or more subtasks according to the third primary key range, so as to merge the data in the third primary key range.
Optionally, as a possible implementation manner, the target data is incremental data in the incremental SSTable, and the first primary key range is a primary key range covered by the incremental data.
Optionally, as a possible implementation manner, the primary keys of the remaining data except the incremental data in the incremental SSTable are located within a fourth primary key range, and the apparatus further includes: a reuse module, configured to reuse data in the baseline SSTable that is within the fourth primary key range.
Optionally, as a possible implementation manner, the target data is all of the data in the incremental SSTable and the baseline SSTable.
In a third aspect, an apparatus for managing a database is provided, the apparatus comprising: a memory for storing execution instructions; a processor configured to execute the execution instructions stored in the memory to perform the method according to the first aspect or any one of the possible implementation manners of the first aspect.
Optionally, as a possible implementation manner, the database is a distributed database, the database includes a plurality of baseline SSTable copies, the plurality of baseline SSTable copies are distributed on a plurality of database nodes, and merging of different database nodes in the plurality of database nodes is performed independently.
In a fourth aspect, a computer-readable storage medium is provided, on which instructions for performing the method of the first aspect or any one of the possible implementations of the first aspect are stored.
In a fifth aspect, a computer program product is provided, which comprises instructions for performing the method of the first aspect or any one of the possible implementations of the first aspect.
The embodiments of the present disclosure provide a partitioning manner based on data scanning, which scans part or all of the data in the incremental SSTable and splits the merge task based on the scanned primary key range. Unlike the conventional art, the embodiments of the present disclosure take the primary key range of the data in the incremental SSTable into account, and with this factor considered, the subtasks are split more reasonably.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present disclosure, the drawings required to be used in the embodiments or the background art of the present disclosure will be described below.
FIG. 1 is an exemplary diagram of a process for dumping MemTable and merging incremental SSTable with baseline SSTable.
Fig. 2 is an exemplary diagram of parallel processing of a merged task using multiple sub-tasks.
Fig. 3 is an exemplary diagram of a subtask splitting manner of the conventional art.
Fig. 4 is an exemplary diagram of how the subtask splitting manner shown in fig. 3 behaves in a data import scenario.
Fig. 5 is a flowchart illustrating a method for managing a database according to an embodiment of the disclosure.
Fig. 6 is an exemplary diagram of a specific implementation manner of the second embodiment.
Fig. 7 is an exemplary diagram of a specific implementation manner of the third embodiment.
Fig. 8 is a schematic structural diagram of an apparatus for managing a database according to an embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of an apparatus for managing a database according to another embodiment of the present disclosure.
Detailed Description
The embodiments of the present disclosure are described below with reference to the drawings in the embodiments of the present disclosure. In the following description, reference is made to the accompanying drawings which form a part hereof and in which is shown by way of illustration specific aspects of embodiments of the disclosure or in which aspects of embodiments of the disclosure may be practiced. It should be understood that the disclosed embodiments may be used in other respects, and may include structural or logical changes not depicted in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims. For example, it should be understood that the disclosure in connection with the described methods may equally apply to the corresponding apparatus or system for performing the methods, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may comprise one or more units, such as functional units, to perform the described one or more method steps (e.g., a unit performs one or more steps, or multiple units, each of which performs one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a particular apparatus is described based on one or more units, such as functional units, the corresponding method may comprise one step to perform the functionality of the one or more units (e.g., one step performs the functionality of the one or more units, or multiple steps, each of which performs the functionality of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the figures. 
Further, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.
The big data era generates massive and diversified information assets, and higher requirements are put forward on the data storage and data management capacity. The data may be records used in computers to characterize things, such as text, graphics, images, sounds, etc. The data can have various expression forms, and can be stored in a computer after being digitalized. A database may be a collection of data stored on a computer storage device that may hold data that is the object and result of database system operations.
Database systems may include databases, database management systems, application systems, database administrators and users, and the like. A database management system may be the core of an overall database system, which may be data management software that helps users build, use, and manage databases. The database management system can also be used to maintain the database and ensure the security and integrity of the data. The user may be the ultimate use entity of the database, which in some embodiments may be used by the user through a user interface of the application system.
An SSTable is a storage structure that holds persistent, ordered, and immutable key-value (KV) pairs. The keys and/or values in an SSTable may each be an arbitrary byte array. An SSTable may provide a function for looking up data by a specified key, and may also provide the ability to iterate over the data whose keys fall within a specified range. An SSTable can store a large number of key-value pairs efficiently while sustaining high throughput for sequential read/write operations. Therefore, many mainstream databases currently implement data storage based on SSTables. Such a database may be referred to as a database based on an incremental + inventory storage engine.
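The two access paths described above (lookup by a specified key and iteration over a key range) can be illustrated with a minimal in-memory sketch. The class name `SSTable` and its methods `get` and `scan` are hypothetical names chosen for illustration, the table is held in memory rather than on disk, keys are assumed to be unique, and the range is assumed to be half-open; none of this is taken from the disclosure.

```python
from bisect import bisect_left
from typing import Iterator, List, Optional, Tuple


class SSTable:
    """Illustrative immutable, sorted key-value table (in memory only)."""

    def __init__(self, rows: List[Tuple[bytes, bytes]]) -> None:
        ordered = sorted(rows)  # keys are assumed to be unique
        # Tuples are used so the contents cannot be modified after construction.
        self._keys = tuple(k for k, _ in ordered)
        self._values = tuple(v for _, v in ordered)

    def get(self, key: bytes) -> Optional[bytes]:
        """Point lookup by binary search over the sorted keys."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, low: bytes, high: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield pairs with low <= key < high, in key order."""
        start = bisect_left(self._keys, low)
        end = bisect_left(self._keys, high)
        for i in range(start, end):
            yield self._keys[i], self._values[i]
```

Because the keys are sorted, both operations reduce to binary searches followed by sequential reads, which is why ordered storage suits range-based task splitting.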
In a database based on an incremental + inventory storage engine, one key-value pair may form one row of the database, and the data can be stored in key order. Data in the database is roughly divided into two parts, MemTable and SSTable. The MemTable is stored in memory and SSTables are stored on disk. An SSTable is typically composed of several fixed-length data blocks. Incremental data is first written into the MemTable. When the size of the MemTable exceeds a certain threshold, the data in the MemTable can be transferred to an incremental SSTable to release the memory; this process is called a dump. When the number of incremental SSTables exceeds a certain threshold, the database combines the baseline SSTable with the incremental SSTables to form a new baseline SSTable; this process is referred to as merging.
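The dump and merge life cycle described above can be sketched as a toy model. The class `TinyStore`, its thresholds, and the choice to count rows and tables (rather than bytes) are assumptions made purely for illustration; a real storage engine writes to disk and is considerably more involved.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (primary key, value)


class TinyStore:
    """Toy model of MemTable -> incremental SSTable -> baseline SSTable."""

    def __init__(self, memtable_limit: int = 2, sstable_limit: int = 2) -> None:
        self.memtable: Dict[int, str] = {}
        self.incremental: List[List[Row]] = []  # oldest table first
        self.baseline: List[Row] = []
        self.memtable_limit = memtable_limit    # hypothetical threshold (rows)
        self.sstable_limit = sstable_limit      # hypothetical threshold (tables)

    def put(self, key: int, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) > self.memtable_limit:
            self.dump()

    def dump(self) -> None:
        # Freeze the MemTable into a sorted incremental SSTable and free memory.
        self.incremental.append(sorted(self.memtable.items()))
        self.memtable = {}
        if len(self.incremental) > self.sstable_limit:
            self.merge()

    def merge(self) -> None:
        # Apply the baseline first, then incremental tables from oldest to
        # newest, so that the newest value of each key wins.
        merged = dict(self.baseline)
        for table in self.incremental:
            merged.update(table)
        self.baseline = sorted(merged.items())
        self.incremental = []
```

In this sketch the merge runs as a single serial task, which is the situation the splitting schemes discussed later aim to avoid.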
For ease of understanding, the merging process of baseline SSTable and incremental SSTable is illustrated below using figure 1 as an example.
Referring to fig. 1, data above the dotted line 10 is data in the memory 11, and data below the dotted line 10 is data in the disk 12. As can be seen from the indication of the indication line 13 in the vertical direction, the closer the data is to the arrow 131 above the indication line 13, the newer the data is; the closer the data is to the arrow 132 below the indicator line 13, the older the data. As can be seen from the indication of the indication line 14 in the horizontal direction, the closer the data is to the left arrow 141 of the indication line 14, the smaller the primary key (rowkey) representing the data; the closer the data is to the right arrow 142 of the indicator line 14, the larger the primary key (rowkey) representing the data.
As can be seen from fig. 1, the MemTable is stored in the memory 11 and the SSTables are stored on the disk 12. When the size of the MemTable exceeds a certain threshold, the data in the MemTable is dumped into an incremental SSTable to release the memory. In the example of fig. 1, a total of 3 dumps occur. In chronological order, the 3 dumps generate incremental SSTable 1, incremental SSTable 2, and incremental SSTable 3.
Also stored on the disk 12 is a baseline SSTable 1. This baseline SSTable 1 may be understood as the SSTable that resulted when the database last performed a merge operation. Assuming that the current number of incremental SSTables exceeds a threshold, a new round of merging may be performed. That is, incremental SSTable 1, incremental SSTable 2, and incremental SSTable 3 are merged with baseline SSTable 1 to obtain a new baseline SSTable, i.e., baseline SSTable 2 in fig. 1.
For databases based on an incremental + inventory storage engine, merging is an important operation. The merging process is described in more detail below.
Merging is usually a relatively heavy operation (i.e. the merging process usually requires more operations to be performed and consumes a lot of resources), and therefore the merging time is relatively long. In order to shorten the merging time as a whole, the multithreading capability of the machine can be fully utilized, and one merging task is split into a plurality of independent subtasks. The multiple subtasks can be executed in parallel, so that the aim of shortening the merging time as a whole is fulfilled.
To split a merge task into multiple independent subtasks, it is necessary to determine which data in the database each subtask is responsible for merging. Because data is stored in an SSTable in primary key order, and because an SSTable is composed of a number of fixed-length data blocks, the subtasks can be divided by dividing the primary key range. That is, each subtask may be responsible for merging the data within a certain primary key range, the primary key ranges of different subtasks do not overlap, and the union of the primary key ranges of all subtasks is (MIN, MAX), where MIN represents the primary key minimum and MAX represents the primary key maximum. The merging result of one subtask is a group of data blocks formed from all the data in the primary key range corresponding to that subtask, and the data blocks produced by all subtasks are combined to obtain a new baseline SSTable containing all the data.
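The range-based split can be sketched as follows. This is only an illustration under stated assumptions: primary keys are integers, each subtask covers a half-open range `[low, high)` with `None` standing in for MIN or MAX, and the function names `slice_rows`, `merge_range`, and `parallel_merge` are hypothetical. A thread pool is used to show the structure; in CPython, threads would not speed up this CPU-bound toy.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]
Bound = Optional[int]  # None means MIN when used as low, MAX when used as high


def slice_rows(rows: List[Row], low: Bound, high: Bound) -> List[Row]:
    """Rows of a sorted table whose key k satisfies low <= k < high."""
    keys = [k for k, _ in rows]
    start = 0 if low is None else bisect_left(keys, low)
    end = len(rows) if high is None else bisect_left(keys, high)
    return rows[start:end]


def merge_range(baseline: List[Row], incrementals: List[List[Row]],
                low: Bound, high: Bound) -> List[Row]:
    """One subtask: merge only the rows that fall in [low, high)."""
    merged: Dict[int, str] = dict(slice_rows(baseline, low, high))
    for table in incrementals:  # oldest to newest, so newer values win
        merged.update(slice_rows(table, low, high))
    return sorted(merged.items())


def parallel_merge(baseline: List[Row], incrementals: List[List[Row]],
                   split_points: List[int]) -> List[Row]:
    """Run one subtask per range and concatenate the results in key order."""
    bounds: List[Bound] = [None, *split_points, None]
    ranges = list(zip(bounds[:-1], bounds[1:]))
    with ThreadPoolExecutor() as pool:
        parts = pool.map(
            lambda r: merge_range(baseline, incrementals, *r), ranges)
        return [row for part in parts for row in part]
```

Because the ranges do not overlap and together cover every key, concatenating the per-subtask outputs in range order yields the same table as a single serial merge.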
Fig. 2 depicts how the 3 incremental SSTables shown in fig. 1 are merged with baseline SSTable 1 using multiple subtasks, thereby forming baseline SSTable 2. As shown in fig. 2, the entire merge task may be split into 4 subtasks, namely subtask 1, subtask 2, subtask 3, and subtask 4 in fig. 2. Each subtask may be responsible for merging the data within a certain primary key range, and a new baseline SSTable (i.e., baseline SSTable 2) may be assembled from the merging results of the 4 subtasks.
If a merge task is to be split into one or more subtasks, a stable partitioning manner (that is, a stable way of deciding which primary key range each subtask handles) is crucial. Stable partitioning is especially critical in a distributed database that supports multiple copies (e.g., the OceanBase database). In such a database, the baseline SSTable copies on all database nodes must be identical, yet the merge operation on the baseline SSTable is performed separately on each node. A distributed, multi-copy database therefore places higher requirements on operational stability and on the consistency of results across database nodes.
To divide the subtasks, an intuitive solution is to use the information provided by the baseline SSTable. The related art provides a way of dividing the primary key range evenly according to the number of data blocks in the baseline SSTable. This division has one obvious drawback: when the baseline SSTable is too small to cover most of the data that needs to be merged, the partition boundaries can be heavily skewed, so that one or several subtasks must process far more data than the others, which lengthens the overall execution time of the merge task.
Fig. 3 gives one example of evenly dividing the primary key range based on the number of data blocks in the baseline SSTable. As shown in fig. 3, a merge task can be evenly split into 3 subtasks, i.e., subtask 1, subtask 2, and subtask 3, according to the data blocks in baseline SSTable 1. However, the primary key range of the incremental SSTables actually extends far beyond the primary key range of baseline SSTable 1. All data beyond the primary key range of baseline SSTable 1 is therefore merged by subtask 3, so subtask 3 has to merge significantly more data than subtasks 1 and 2. In this way, subtask 3 becomes the bottleneck of the entire merge task.
The above phenomenon is particularly pronounced in data import scenarios and in scenarios where data is inserted in append fashion.
Fig. 4 shows an example of a data import scenario. In fig. 4, baseline SSTable 1 is empty. When a merge task needs to be executed, only one subtask can be derived from the empty baseline SSTable 1, and the primary key range corresponding to that subtask is (MIN, MAX). That is, the single subtask needs to merge all the data, so the merge task degenerates into a serial task and the whole merging process takes a long time.
Consider the following case: suppose a bank has 2 billion rows of new data to insert into the database and merge. Because the baseline SSTable is empty before merging, only one subtask can be created under the partitioning manner based on the data blocks in the baseline SSTable, which may make the entire merging process last a very long time, for example more than 10 hours.
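The skew described for fig. 3 and the degenerate case of fig. 4 can be reproduced with a small sketch. It assumes integer primary keys and represents the baseline SSTable only by the last key of each data block (`block_end_keys`); this representation, the function names, and the sample numbers are hypothetical choices for illustration and are not taken from the disclosure.

```python
from typing import List


def split_points_from_blocks(block_end_keys: List[int],
                             n_subtasks: int) -> List[int]:
    """Choose split points so each subtask covers about the same number of
    baseline data blocks. Subtask i covers keys in (p[i-1], p[i]]."""
    n_blocks = len(block_end_keys)
    n = min(n_subtasks, n_blocks)
    if n <= 1:
        return []  # an empty or single-block baseline yields one subtask
    return [block_end_keys[i * n_blocks // n - 1] for i in range(1, n)]


def rows_per_subtask(keys: List[int], split_points: List[int]) -> List[int]:
    """Count how many of the given keys each subtask would have to merge."""
    counts = [0] * (len(split_points) + 1)
    for key in keys:
        # The subtask index equals the number of split points below the key.
        counts[sum(1 for p in split_points if key > p)] += 1
    return counts
```

With six baseline blocks ending at keys 10 through 60 and keys 1 through 1000 to merge, `split_points_from_blocks([10, 20, 30, 40, 50, 60], 3)` gives split points at 20 and 40, and `rows_per_subtask` then reports 20, 20, and 960 rows, so the last subtask dominates. With an empty baseline there are no split points at all and a single subtask receives every row.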
In view of the above problems, an embodiment of the present disclosure is described in detail below with reference to fig. 5. The method 500 of fig. 5 may be applied to a database. The database may be the aforementioned database based on an incremental + inventory storage engine. The database may be, for example, a distributed database. The database may include an incremental SSTable and a baseline SSTable. The database needs to merge the incremental SSTable and the baseline SSTable when certain conditions are met. The embodiments of the present disclosure do not specifically limit the trigger condition of the merge task. For example, the trigger condition may be that the number of incremental SSTables in the database reaches a preset threshold. As another example, the trigger condition may be that a user of the database manually initiates a merge task. As one example, the database is a distributed database with multiple copies (e.g., an OceanBase database). For example, the database may include multiple baseline SSTable copies that are distributed among different database nodes of the database. The baseline SSTable copies may be identical on each database node. The merge operations on the plurality of database nodes are performed independently of one another. As mentioned above, obtaining a stable partition of the primary key range in such a database is crucial.
Referring to fig. 5, in step S510, target data in the database is scanned to obtain a first primary key range where a primary key of the target data is located. The target data may include some or all of the data in the incremental SSTable.
In step S520, the merge task for the data in the first primary key range is split into one or more subtasks according to the first primary key range.
In step S530, data within the first primary key range is merged according to one or more subtasks.
For example, the first primary key range may be split evenly into multiple sub-ranges. The sub-ranges correspond to the sub-tasks one by one, and each sub-task is used for merging the data in the corresponding sub-range.
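Steps S510 and S520 can be sketched as a scan followed by an even split. The sketch assumes integer primary keys and closed sub-ranges; the function names `scan_key_range` and `split_evenly` are hypothetical, and splitting evenly by key value is just one reading of "split evenly" chosen for illustration.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]


def scan_key_range(tables: Iterable[List[Row]]) -> Tuple[int, int]:
    """S510: scan the target data and return (smallest key, largest key)."""
    keys = [k for table in tables for k, _ in table]
    if not keys:
        raise ValueError("no target data to scan")
    return min(keys), max(keys)


def split_evenly(low: int, high: int, n_subtasks: int) -> List[Tuple[int, int]]:
    """S520: split the closed range [low, high] into at most n_subtasks
    contiguous, non-overlapping closed sub-ranges of near-equal width."""
    if n_subtasks < 1:
        raise ValueError("n_subtasks must be at least 1")
    span = high - low + 1
    n = min(n_subtasks, span)
    edges = [low + span * i // n for i in range(n + 1)]
    return [(edges[i], edges[i + 1] - 1) for i in range(n)]
```

For step S530, each resulting sub-range would then be handed to one subtask. Since the output depends only on the scanned keys and the subtask count, nodes that hold identical data compute identical sub-ranges.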
Unlike the aforementioned way of partitioning the primary key range based on baseline SSTable, the embodiments of the present disclosure provide a way of partitioning based on data scanning that takes into account the primary key range of some or all of the data in incremental SSTable, and the result of the partitioning is more reasonable.
There are several implementations of the method shown in fig. 5, and three examples are given below. It is to be understood that the described embodiments are merely a subset of the disclosed embodiments and not all embodiments.
Example one
In the first embodiment, the scanned target data is all of the data in the incremental SSTable and the baseline SSTable. In other words, the data scanning manner provided by the first embodiment scans the full amount of data in the database. In the first embodiment, the first primary key range mentioned above is the primary key range of the full amount of data.
The data scanning manner provided by the first embodiment can readily produce a uniform and stable division of the primary key range and is not affected by the scenario. When the full data scanning manner of the first embodiment is applied to a distributed database, the data scanning results of all database nodes are consistent, the division manner is fixed, and the operation is simple.
Example two
In the second embodiment, the scanned target data is the data within the second primary key range. Therefore, in the second embodiment, the aforementioned first primary key range is the primary key range formed by the primary keys of all data in the second primary key range. The starting primary key of the second primary key range may be the ending primary key of the baseline SSTable (endkey, which may be obtained from the meta-information of the baseline SSTable). The ending primary key of the second primary key range may be the primary key maximum (i.e., the maximum of all possible values of the primary key). Taking MAX as the primary key maximum, the target data in the second embodiment may be the data in the database whose primary keys fall within (endkey, MAX). It can be seen that the second embodiment does not scan the full amount of data, but only a part of the data in the database. The scanning result of this manner is fixed, so the division result of the primary key range is also fixed, and a stable primary key range can be divided. A further advantage of the second embodiment is that the full amount of data in the database does not need to be scanned, which reduces the overhead of data scanning to a certain extent.
In addition to the data in the second primary key range described above, the database includes data in a third primary key range. The starting primary key of the third primary key range may be the primary key minimum (i.e., the minimum of all possible values of the primary key). The ending primary key of the third primary key range may be the ending primary key of the baseline SSTable. Taking MIN as the primary key minimum, the data in the third primary key range may be the data in the database whose primary keys fall within (MIN, endkey).
In effect, the third primary key range is the primary key range of baseline SSTable. Therefore, for the merging manner of the data in the range of the third primary key, the aforementioned dividing manner "evenly divide the range of the primary key based on the number of data blocks in the baseline SSTable" may be adopted. After dividing the plurality of subtasks, the data within the third primary key range can be merged in parallel by using the plurality of subtasks.
Next, the second embodiment is described by taking fig. 6 as an example. First, the ending primary key of the baseline SSTable can be obtained from the meta-information of the baseline SSTable. Then, the data whose primary keys fall within the second primary key range (endkey, MAX) is scanned, thereby obtaining the above-mentioned first primary key range. The scanning result of this manner is fixed, so the division result of the first primary key range is also fixed, and a stable primary key range can be divided. As shown in fig. 6, after the first primary key range is obtained by scanning, it may be evenly divided into a plurality of sub-ranges. Then, the merge task for the data within the first primary key range may be split into a plurality of subtasks, i.e., subtask 4, subtask 5, subtask 6, and subtask 7 in fig. 6, so that the data within the first primary key range can be merged in parallel.
For data whose primary keys fall within the third primary key range (MIN, endkey), stable sub-ranges may be evenly divided according to the number of data blocks in the baseline SSTable, and corresponding subtasks may then be set according to the number of sub-ranges. As shown in fig. 6, the third primary key range is divided into 3 sub-ranges, and the data within these 3 sub-ranges is merged by subtask 1, subtask 2, and subtask 3, respectively. Combining the two sets of divided ranges gives the final stable division, i.e., subtask 1 to subtask 7 in fig. 6.
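The two-part division of the second embodiment can be sketched as follows. The sketch assumes integer primary keys, represents the baseline SSTable by the last key of each data block, and returns split points where subtask i covers keys in (p[i-1], p[i]]. The function name `plan_split_points`, its parameters, and the handling of an empty baseline are assumptions for illustration only.

```python
from typing import List


def plan_split_points(block_end_keys: List[int], incremental_keys: List[int],
                      n_head: int, n_tail: int) -> List[int]:
    """Return sorted split points; k split points define k + 1 subtasks.

    Head: (MIN, endkey] is divided by baseline data-block count.
    Tail: keys in (endkey, MAX) are scanned and their range divided evenly.
    """
    if block_end_keys:
        endkey = block_end_keys[-1]  # stands in for baseline meta-information
        n_blocks = len(block_end_keys)
        n_head = min(n_head, n_blocks)
        head = [block_end_keys[i * n_blocks // n_head - 1]
                for i in range(1, n_head)]
        boundary = [endkey]
        tail_keys = [k for k in incremental_keys if k > endkey]
    else:
        # Assumption: with an empty baseline, everything belongs to the tail.
        head, boundary = [], []
        tail_keys = list(incremental_keys)
    if not tail_keys:
        return head
    low, high = min(tail_keys), max(tail_keys)
    span = high - low + 1
    n_tail = min(n_tail, span)
    tail = [low + span * i // n_tail - 1 for i in range(1, n_tail)]
    return head + boundary + tail
```

For a baseline with block end keys 10 through 60 and incremental keys 55 through 1000, calling `plan_split_points(..., 3, 4)` produces six split points and therefore seven subtasks, mirroring the subtask 1 to subtask 7 layout described for fig. 6.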
Example three
In the third embodiment, the scanned target data is incremental data in incremental SSTable. Thus, the first primary key range mentioned above is the primary key range covered by the incremental data. That is, the third embodiment scans the incremental data within the incremental SSTable. The result of the incremental data scan is fixed, the division mode is also fixed, and therefore the divided range is also stable. The data scanning mode provided by the third embodiment is more accurate in data scanning (i.e., accurate positioning of incremental data in incremental SSTable), and thus can be unaffected by baseline SSTable.
The primary keys of the remaining data other than the incremental data are located within a fourth primary key range (i.e., the primary key range not covered by the incremental data). In the third embodiment, the manner of merging the data (or data blocks) in the fourth primary key range is not particularly limited. As one example, the merging of this portion of data may be split into one or more subtasks, which are then executed in parallel. As another example, since this portion of data is not overwritten by incremental data in the incremental SSTable, the data in the baseline SSTable within the fourth primary key range has not been modified. Thus, the data within the fourth primary key range in the baseline SSTable may be reused. The computational overhead of data reuse is essentially negligible. Therefore, adopting data reuse reduces both the resource overhead of the merge task and the merging time.
Next, the third embodiment will be described by taking fig. 7 as an example. Referring to fig. 7, the primary key range to the left of the dashed line 71 is the fourth primary key range, i.e., the primary key range not covered by the incremental data. In other words, there is no newly added data among the data in the fourth primary key range. Thus, for data within this fourth primary key range, the data in the baseline SSTable may be reused. The overhead of data reuse is low and can basically be ignored. The primary key range to the right of the dashed line 71 is the primary key range covered by the incremental data. The incremental data may be scanned so that the subtasks are divided based on the scanned primary keys. In the example of fig. 7, the primary key range covered by the incremental data is divided into 4 sub-ranges, which respectively correspond to 4 sub-tasks (i.e., sub-task 1, sub-task 2, sub-task 3, and sub-task 4 in fig. 7), where each sub-task is responsible for merging the data in its corresponding sub-range. In addition, sub-task 1 may further complete the data reuse within the fourth primary key range.
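The fig. 7 walkthrough can be sketched in Python as a planning step that separates reusable baseline blocks from the range to be merged. This is only an illustration under several assumptions not stated in the source: keys are integers, baseline blocks are given as ascending, non-overlapping `(first_key, last_key)` pairs, the sub-ranges are cut so that each holds a similar number of scanned keys, and a block that straddles the boundary of the covered range is merged in full rather than reused. The name `plan_merge` is hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int]     # (first_key, last_key) of a baseline data block
SubRange = Tuple[int, int]  # closed interval [start, end] of primary keys


def plan_merge(incr_keys: List[int], baseline_blocks: List[Block],
               num_subtasks: int) -> Tuple[List[Block], List[SubRange]]:
    """Return (reusable_blocks, sub_ranges) for one merge.

    incr_keys: primary keys obtained by scanning the incremental data.
    baseline_blocks: ascending, non-overlapping blocks (assumed layout).
    """
    if not incr_keys:
        # No incremental data: every baseline block can be reused unchanged.
        return list(baseline_blocks), []
    keys = sorted(set(incr_keys))
    lo, hi = keys[0], keys[-1]
    # Blocks touching [lo, hi] must be merged; the rest lie in the fourth range.
    overlapping = [b for b in baseline_blocks if b[0] <= hi and b[1] >= lo]
    reusable = [b for b in baseline_blocks if b not in overlapping]
    if overlapping:
        # Widen the merge range to whole blocks (assumption of this sketch).
        lo = min(lo, overlapping[0][0])
        hi = max(hi, overlapping[-1][1])
    # Choose split points so each sub-task sees a similar number of scanned keys.
    k = max(1, min(num_subtasks, len(keys)))
    starts = [lo] + [keys[(i * len(keys)) // k] for i in range(1, k)]
    sub_ranges = [(starts[i], starts[i + 1] - 1) for i in range(k - 1)]
    sub_ranges.append((starts[-1], hi))
    return reusable, sub_ranges


blocks = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]
reusable, subs = plan_merge([55, 61, 70, 78, 83, 90, 97, 104], blocks, 4)
print(reusable)  # [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
print(subs)      # [(51, 69), (70, 82), (83, 96), (97, 104)]
```

In this toy run the five leftmost blocks play the role of the fourth primary key range and are reused, while the remaining keys are covered by four contiguous sub-ranges, one per sub-task.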
Method embodiments of the present disclosure are described in detail above in conjunction with fig. 1-7, and apparatus embodiments of the present disclosure are described in detail below in conjunction with fig. 8-9. It is to be understood that the description of the method embodiments corresponds to the description of the apparatus embodiments, and therefore reference may be made to the preceding method embodiments for parts not described in detail.
Fig. 8 is a schematic structural diagram of an apparatus for managing a database according to an embodiment of the present disclosure. The database may include an incremental SSTable and a baseline SSTable. The apparatus 800 for managing a database in fig. 8 may include a scanning module 810, a splitting module 820, and a merging module 830.
The scanning module 810 may be configured to scan target data in the database to obtain a first primary key range where a primary key of the target data is located. Wherein the target data comprises some or all of the data in the incremental SSTable.
The splitting module 820 may be configured to split the merged task of data in the first primary key range into one or more sub-tasks according to the first primary key range.
The merging module 830 may be configured to merge data within the first primary key range according to the one or more subtasks.
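The division of labor among the three modules can be pictured with a toy Python sketch. It is a simplification under stated assumptions, not the apparatus itself: SSTables are modeled as plain dictionaries from integer primary key to row value, the splitting step cuts the key range into equal-width pieces, incremental rows simply replace baseline rows with the same key, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Rows = Dict[int, str]  # primary key -> row value (toy stand-in for an SSTable)


def scan(incremental: Rows) -> Tuple[int, int]:
    """Scanning step: return the first primary key range [lo, hi]."""
    return min(incremental), max(incremental)


def split(key_range: Tuple[int, int], parts: int) -> List[Tuple[int, int]]:
    """Splitting step: cut [lo, hi] into at most `parts` closed sub-ranges."""
    lo, hi = key_range
    parts = max(1, min(parts, hi - lo + 1))
    step = -(-(hi - lo + 1) // parts)  # ceiling division
    return [(s, min(s + step - 1, hi)) for s in range(lo, hi + 1, step)]


def merge_range(baseline: Rows, incremental: Rows, sub: Tuple[int, int]) -> Rows:
    """One sub-task: merge rows whose keys fall in `sub`; incremental rows win."""
    lo, hi = sub
    out = {k: v for k, v in baseline.items() if lo <= k <= hi}
    out.update({k: v for k, v in incremental.items() if lo <= k <= hi})
    return out


def merge_first_range(baseline: Rows, incremental: Rows, parts: int) -> Rows:
    """Merging step: run the sub-tasks in parallel and combine their outputs."""
    subs = split(scan(incremental), parts)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: merge_range(baseline, incremental, s), subs))
    merged: Rows = {}
    for part in results:
        merged.update(part)
    return dict(sorted(merged.items()))


baseline = {1: "a", 5: "b", 9: "c", 20: "z"}
incremental = {5: "B", 7: "n", 14: "m"}
print(merge_first_range(baseline, incremental, 3))
# {5: 'B', 7: 'n', 9: 'c', 14: 'm'}
```

Rows with keys 1 and 20 are absent from the output because they lie outside the first primary key range; in the embodiments above such data would be handled separately, for example by reuse.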
Optionally, the target data is data in a second primary key range, the starting primary key of the second primary key range is the ending primary key of the baseline SSTable, and the ending primary key of the second primary key range is the primary key maximum.
Optionally, the database further includes data in a third primary key range, a starting primary key of the third primary key range is a primary key minimum value, an ending primary key of the third primary key range is an ending primary key of the baseline SSTable, and the splitting module is further configured to split the merging task of the data in the third primary key range into one or more subtasks according to the third primary key range, so as to merge the data in the third primary key range.
Optionally, the target data is incremental data in the incremental SSTable, and the first primary key range is a primary key range covered by the incremental data.
Optionally, the primary keys of the remaining data in the incremental SSTable other than the incremental data are within a fourth primary key range, the apparatus further comprising: a reuse module to reuse data in the baseline SSTable that is within the fourth primary key range.
Optionally, the target data is all of the incremental SSTable and the baseline SSTable.
Optionally, the database is a distributed database, the database includes a plurality of baseline SSTable copies, the plurality of baseline SSTable copies are distributed on a plurality of database nodes, and the merging of different database nodes in the plurality of database nodes is performed independently.
Fig. 9 is a schematic structural diagram of an apparatus for managing a database according to another embodiment of the present disclosure. The apparatus 900 for managing a database depicted in fig. 9 may include a memory 910 and a processor 920, where the memory 910 may be used to store instructions (or "execution instructions"). The processor 920 may be configured to execute the instructions stored in the memory 910 to implement the steps of the various methods described above. In some embodiments, the apparatus 900 may further include a network interface 930, and data exchange between the processor 920 and an external device may be implemented through the network interface 930.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the disclosure are generated, in whole or in part, when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Video Disc (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions may be used in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. A method of managing a database, the database including an incremental sorted string table (SSTable) and a baseline SSTable, the method comprising:
scanning target data in the database to obtain a first primary key range where a primary key of the target data is located, wherein the target data comprises part or all of the data in the incremental SSTable;
splitting a merging task of the data in the first primary key range into one or more subtasks according to the first primary key range; and
merging the data within the first primary key range according to the one or more subtasks.
2. The method of claim 1, wherein the target data is data within a second primary key range, the starting primary key of the second primary key range being the ending primary key of the baseline SSTable, and the ending primary key of the second primary key range being a primary key maximum.
3. The method of claim 2, wherein the database further comprises data within a third primary key range, the starting primary key of the third primary key range being a primary key minimum, and the ending primary key of the third primary key range being the ending primary key of the baseline SSTable,
the method further comprising:
splitting the merging task of the data in the third primary key range into one or more subtasks according to the third primary key range, so as to merge the data in the third primary key range.
4. The method of claim 1, wherein the target data is incremental data in the incremental SSTable, and the first primary key range is a primary key range covered by the incremental data.
5. The method of claim 4, wherein the primary keys of the remaining data in the incremental SSTable other than the incremental data are located within a fourth primary key range,
the method further comprising:
reusing data in the baseline SSTable that is within the fourth primary key range.
6. The method of claim 1, wherein the target data is all of the data in the incremental SSTable and the baseline SSTable.
7. The method of claim 1, wherein the database is a distributed database comprising a plurality of baseline SSTable copies distributed across a plurality of database nodes, and the merging on different ones of the plurality of database nodes is performed independently.
8. An apparatus for managing a database, the database including an incremental sorted string table (SSTable) and a baseline SSTable, the apparatus comprising:
a scanning module configured to scan target data in the database to obtain a first primary key range where a primary key of the target data is located, wherein the target data comprises part or all of the data in the incremental SSTable;
a splitting module configured to split a merging task of the data in the first primary key range into one or more subtasks according to the first primary key range; and
a merging module configured to merge the data within the first primary key range according to the one or more subtasks.
9. The apparatus of claim 8, wherein the target data is data within a second primary key range, the starting primary key of the second primary key range being the ending primary key of the baseline SSTable, and the ending primary key of the second primary key range being a primary key maximum.
10. The apparatus of claim 9, wherein the database further comprises data within a third primary key range, the starting primary key of the third primary key range being a primary key minimum, and the ending primary key of the third primary key range being the ending primary key of the baseline SSTable, the splitting module being further configured to split the merging task of the data within the third primary key range into one or more subtasks according to the third primary key range, so as to merge the data within the third primary key range.
11. The apparatus of claim 8, wherein the target data is incremental data in the incremental SSTable, and the first primary key range is a primary key range covered by the incremental data.
12. The apparatus of claim 11, wherein the primary keys of the remaining data in the incremental SSTable other than the incremental data are located within a fourth primary key range,
the apparatus further comprising:
a reuse module configured to reuse data in the baseline SSTable that is within the fourth primary key range.
13. The apparatus of claim 8, wherein the target data is all of the data in the incremental SSTable and the baseline SSTable.
14. The apparatus of claim 8, wherein the database is a distributed database comprising a plurality of baseline SSTable copies distributed across a plurality of database nodes, and the merging on different ones of the plurality of database nodes is performed independently.
15. An apparatus for managing a database, the apparatus comprising:
a memory for storing execution instructions;
a processor for executing the execution instructions stored in the memory to perform the method of any of claims 1-7.
CN202210185977.XA 2022-02-28 2022-02-28 Method and device for managing database Active CN114253950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210185977.XA CN114253950B (en) 2022-02-28 2022-02-28 Method and device for managing database

Publications (2)

Publication Number Publication Date
CN114253950A (en) 2022-03-29
CN114253950B (en) 2022-06-03

Family

ID=80800062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210185977.XA Active CN114253950B (en) 2022-02-28 2022-02-28 Method and device for managing database

Country Status (1)

Country Link
CN (1) CN114253950B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032549A (en) * 2019-01-28 2019-07-19 阿里巴巴集团控股有限公司 Subregion splitting method, device, electronic equipment and readable storage medium storing program for executing
CN110059090A (en) * 2019-04-19 2019-07-26 阿里巴巴集团控股有限公司 A kind of write-in/dump/merging/the querying method and device of bitmap index
CN110457350A (en) * 2019-07-24 2019-11-15 阿里巴巴集团控股有限公司 For carrying out the method and device of aggregate query in inquiry database
US20210374157A1 (en) * 2020-05-29 2021-12-02 Nutanix, Inc. System and method for near-synchronous replication for object store

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
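樊秋实 et al.: "Distributed join algorithm under an architecture separating baseline and incremental data", Chinese Journal of Computers *
衣舞晨风: "A first look at the OceanBase architecture", CSDN blog, HTTPS://BLOG.CSDN.NET/JIANKUNKING/ARTICLE/DETAILS/84020030 *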

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116643300A (en) * 2023-07-25 2023-08-25 齐鲁空天信息研究院 Satellite navigation data distributed real-time processing method and system based on map mapping
CN116643300B (en) * 2023-07-25 2023-10-10 齐鲁空天信息研究院 Satellite navigation data distributed real-time processing method and system based on map mapping

Also Published As

Publication number Publication date
CN114253950B (en) 2022-06-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant