CN114510539A - Method for generating and applying consistency check point of distributed database - Google Patents

Method for generating and applying consistency check point of distributed database

Info

Publication number
CN114510539A
Authority
CN
China
Prior art keywords
data, consistency, transaction, check point, timestamp
Prior art date
Legal status: Granted
Application number
CN202210401171.XA
Other languages
Chinese (zh)
Other versions
CN114510539B (en)
Inventor
刘博
范振勇
李东卫
何振兴
莫荻
武新
Current Assignee
Beijing Yijingjie Information Technology Co ltd
Original Assignee
Beijing Yijingjie Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yijingjie Information Technology Co ltd
Priority to CN202210401171.XA
Publication of CN114510539A
Application granted
Publication of CN114510539B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2308Concurrency control
    • G06F16/2315Optimistic concurrency control
    • G06F16/2322Optimistic concurrency control using timestamps
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/465Distributed object oriented systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/466Transaction processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The invention discloses a method for generating and applying a consistency checkpoint of a distributed database, belonging to methods for checking the consistency of distributed databases. The method takes a primary key in the database as the key and fragments the data in the database to obtain a plurality of data fragments. Each data fragment assists in generating its own consistency timestamp through the transaction write-tag queue it maintains. When a transaction is committed and its data is successfully written, the data fragment transmits a logic instruction to the processor. An aggregator then generates a global timestamp as the consistency checkpoint of the entire distributed database. By adopting a clock mechanism to generate the global timestamp as the consistency checkpoint, the method avoids the network transceiving bottleneck of a global transaction manager, eliminates the performance cost of data synchronization in primary/standby mode, and improves the performance of the distributed database.

Description

Method for generating and applying consistency check point of distributed database
Technical Field
The invention relates to distributed database technology, and in particular to a method for generating and applying a consistency checkpoint of a distributed database.
Background
Consistency checkpointing is a very important technology in the database field. In traditional database products, it mainly takes the form of a global transaction ID, and works by flushing all data preceding a given ID to disk to guarantee transactional consistency. For a single-machine database, generating a global transaction ID is simple: the system maintains a self-incrementing sequence number, and concurrent transactions obtain their transaction IDs from a single thread, so no conflicts arise.
For distributed databases, the global transaction ID is typically provided by a global transaction manager. In a distributed database cluster, only one global transaction manager can provide service at any moment, and to avoid a single point of failure, the manager must be made highly available in primary/standby mode. Other nodes or services in the cluster obtain a global transaction ID from the global transaction manager over network requests, and the manager returns an ID for them to use. In both the single-machine and clustered versions of a database, the global transaction ID serves as the consistency checkpoint, which then supports other functions such as snapshots and data recovery. This approach, however, has unavoidable drawbacks. The first is a network transceiving bottleneck. A distributed database typically has many data nodes but only one global transaction manager. Under high concurrency, many data nodes simultaneously request global transaction IDs from the manager, so the manager's network becomes the bottleneck while the data nodes still have spare network and computing resources; increasing concurrency therefore cannot improve overall database performance. The second lies in the implementation of the global transaction manager itself. To avoid data loss from a single point of failure, the manager must be deployed in primary/standby mode: whenever the primary service assigns a transaction ID, it must synchronize that ID to the standby service so that the standby can continue providing consistent service if the primary fails.
The process of synchronizing transaction IDs from the primary to the standby also consumes system resources, further limiting database performance. Further research and improvement on the generation and application of consistency checkpoints for distributed databases is therefore needed.
Disclosure of Invention
One objective of the present invention is to provide a method for generating a consistency checkpoint of a distributed database based on a clock mechanism, so as to solve the technical problems of the prior art: a consistency-checkpoint method based on a global transaction ID is prone to a network transceiving bottleneck, and deploying the global transaction manager in primary/standby mode occupies too many system resources, further limiting database performance.
In order to solve the technical problems, the invention adopts the following technical scheme:
The invention provides a method for generating a consistency checkpoint of a distributed database, comprising the following steps.
Step A: taking a primary key in the distributed database as the key, fragment the storage units in the distributed database to obtain a plurality of data fragments.
Step B: each data fragment assists in generating its own consistency timestamp through the transaction write-tag queue it maintains; each entry in the queue's list comprises a transaction ID, a transaction timestamp, and a reference count.
Step C: when a transaction is committed and its data is successfully written, a logic instruction is generated for each data fragment, and the data fragment transmits the logic instruction to the processor.
Step D: the processor generates an event based on the current logic instruction and sends it to the transaction write-tag queue; the event consumes the logic instruction, pushing the consistency timestamp of the data fragment forward, and the resulting new timestamp is transmitted to the aggregator.
Step E: the aggregator deduplicates and sorts the new timestamps and then generates a global timestamp, which serves as the consistency checkpoint of the entire distributed database.
Preferably, a further technical scheme is as follows: in step A, the system monitors the data volume in each data fragment in real time; when the volume in one fragment becomes too large, the fragment is split into two new fragments, and when the data volumes of two range-adjacent fragments are too small, the two fragments are merged into one. What counts as too large or too small is determined by a preset threshold.
A further scheme: in step B, the transaction write-tag queue supports consuming a logic instruction and recalculating the timestamp. When a data update occurs in a transaction, the consume operation finds the corresponding element in the ordered transaction list and updates its reference count, allowing the consistency timestamp of the current fragment to advance. Recalculating the timestamp takes the element with the smallest timestamp from the queue's list, compares it with the current consistency timestamp, and updates the consistency timestamp if it can be advanced.
A further scheme: in step B, when consuming a logic instruction brings a transaction's reference count to 0, the corresponding transaction is deleted from the list.
A further scheme: in the timestamp-recalculation operation of step B, the natural timestamp is also used to advance the consistency timestamp.
A further scheme: in step D, the processor first waits after startup for data fragments to register; a data fragment sends a registration request to the processor, and only after registration completes does the processor process the logic instructions transmitted by that fragment.
In another aspect, the invention provides a method for applying a distributed database consistency checkpoint, comprising the following steps.
Step F: the obtained consistency checkpoint and its sequence number are transmitted to each database node through a majority-consensus protocol.
Step G: after a database node receives the consistency checkpoint and sequence number, it judges whether the sequence number is larger than that of the previously received checkpoint in the staging queue. If not, the current checkpoint and sequence number are staged in the receiving queue; if so, the current checkpoint is saved and the staging queue is emptied.
Preferably, saving the checkpoint in step G proceeds as follows: after judging that the sequence number is larger than that of the previously received checkpoint, the node further judges whether the number of data fragments covered by the checkpoint in the staging queue matches the number of fragments on the current node; if so, the current checkpoint is saved on the node.
If not, the node judges whether the mismatch is caused by some data fragments on the current node not yet being initialized; if so, the current checkpoint is saved on the node. If not, the node continues to judge whether the fragment counts differ because data fragments on the node are awaiting GC cleanup or were split while the checkpoint was in transit.
If so, the current checkpoint is saved on the node; if not, saving this checkpoint is abandoned.
Compared with the prior art, the invention has the following beneficial effects: by adopting a clock mechanism to generate the global timestamp as the consistency checkpoint of the entire distributed database, the network transceiving bottleneck of the global transaction manager is avoided, the performance cost of data synchronization in primary/standby mode is eliminated, and the performance of the distributed database is improved. Moreover, a symmetric cluster architecture is used to generate the distributed transaction timestamps; the generation process is transparent to the business, and users do not perceive the consistency-checkpoint work, which improves the experience the database product offers its users.
Drawings
FIG. 1 is a schematic diagram illustrating a transaction write tag queue in one embodiment of the invention.
FIG. 2 is a flow chart illustrating the generation of a consistency checkpoint in one embodiment of the present invention.
FIG. 3 is a flow chart illustrating the transmission of a consistency checkpoint in another embodiment of the present invention.
FIG. 4 is a flow chart illustrating the receipt of a consistency checkpoint by a database node in accordance with another embodiment of the present invention.
FIG. 5 is a flow chart illustrating the saving of a consistency checkpoint by a database node in accordance with another embodiment of the present invention.
Detailed Description
The terms used in the present invention are explained below.
MVCC: the MVCC is called Multi-Version Concurrency Control (Multi-Version Concurrency Control), i.e., information added with Version numbers during data storage, and controls visibility through the Version numbers during query. The version number is realized mainly in two ways, one is that the activity of the transaction is assisted through the global transaction ID; and the other is a time stamp, and time information is added to each data.
Writing a mark: representing a state in which the transaction is temporarily uncommitted. After creating the write tag, the database checks if there is a transaction conflict, and the conflict restarts the current transaction. If the transaction ends for other reasons (e.g., violations of constraints), the transaction is terminated.
Logic instructions: after the transaction is written, statistical information of the transaction, including a transaction ID, a transaction timestamp, a write flag, and the like, is generated.
And (3) fragment change feedback: the database storage data are partitioned according to the main key, and the partitions are independent. When a slice has data modifications thereon, a corresponding feedback action is generated, which is represented by generating a write tag corresponding to the data and recording it in a queue maintained within the slice change, which records a timestamp and a count of the write tag for each write transaction.
The invention is further elucidated with reference to the drawings.
In a distributed database based on a clock mechanism, a timestamp is used as the consistency checkpoint instead of a global transaction ID. Each node runs transactions according to its own clock, and in distributed transactions, the validity of transaction timestamps and the consistency of transactions are guaranteed through continual clock alignment between nodes. This avoids the performance problems caused by a single-point global service.
The method for generating the consistency check point of the distributed database is designed and realized aiming at the distributed database based on a clock mechanism, and the database uses a storage engine with a KV structure.
The ACID properties of database transactions are atomicity, consistency, isolation, and durability. A clock mechanism can guarantee atomicity and isolation. Atomicity means a transaction is complete: the data in the transaction is either written in full or not written at all. Isolation refers to the independence between transactions. Timestamps record the execution order of transactions and can be ordered by causal relationship; they also enable MVCC multi-version recording of data, allowing the same data to be read and written concurrently, which greatly improves performance in certain scenarios.
Newer distributed databases almost universally adopt storage engines with a KV structure; thanks to advantages such as elasticity, scalability, and simple deployment and management, their performance can be assured after extensive optimization. When the database stores a piece of data, the combination of the primary key and the timestamp is used as the key, and the value is the serialized data content.
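The key layout just described can be sketched as follows. This is a minimal illustration only: the function name `encode_mvcc_key`, the big-endian encoding, and the inverted timestamp are assumptions — the text states only that the key combines the primary key with the timestamp.

```python
import struct

def encode_mvcc_key(primary_key: bytes, timestamp: int) -> bytes:
    """Build a KV key from primary key + timestamp (hypothetical layout).

    Encoding the timestamp big-endian and inverted makes the newest
    version of a row sort first in an ascending key scan, a common MVCC
    convention; the patent does not fix the byte layout.
    """
    return primary_key + struct.pack(">Q", 2**64 - 1 - timestamp)
```

Because the inverted timestamp is appended after the primary key, all versions of one row stay contiguous, and a scan naturally visits newer versions before older ones.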
The design idea and architecture of the invention are as follows. A transaction has three states: Pending, Committed, and Aborted. When a transaction is created, its state is Pending and its reference count on data is increased by n; when it commits or aborts, its state is set to Committed or Aborted and the corresponding reference count is decreased by n, where n is the number of write tags for the KV pairs involved in the transaction. Because the database manages write tags asynchronously, the counters are maintained independently of the transactions themselves. The main idea of the method is therefore that the distributed database identifies the execution state of each transaction through its transaction-control mechanism and strictly partitions transactions by timestamp; the final result of this partitioning is the checkpoint. The timestamp is pushed forward gradually as transaction states change; on each push, the minimum value is taken out and stored as the consistency checkpoint.
Based on the design concept and the architecture, the consistency check point is generated mainly through data fragmentation, a transaction write mark queue and data fragmentation change feedback.
Step S1: data fragmentation.
The database uses the primary key as the key to partition its stored data. The primary key may be generated automatically by the system or designated by the user; in practice, automatic generation is friendlier and improves usability.
The key values in the KV store are stored in order, so data fragmentation is easily implemented by range partitioning. On the one hand this better supports business joins; on the other hand it scans data more efficiently than hash-based distribution.
In this step, it is preferable that data fragments can be split and merged automatically. When the data in one fragment grows too large, the system detects this and splits it into two new fragments, which keeps the data in each fragment balanced. Two fragments that are range-adjacent and both hold little data are merged automatically by the system. The bounds on fragment data volume are controlled by preset threshold parameters.
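A toy sketch of threshold-driven split and merge follows. The concrete thresholds, integer key ranges, midpoint split, and the `rebalance` name are all assumptions for illustration — the text specifies only that split and merge are governed by preset thresholds.

```python
# Hypothetical thresholds; the patent only says split/merge is
# driven by preconfigured size limits.
SPLIT_THRESHOLD = 96 * 1024 * 1024   # split a fragment above this size
MERGE_THRESHOLD = 16 * 1024 * 1024   # merge adjacent fragments below this

def rebalance(fragments):
    """fragments: range-ordered list of (start_key, end_key, size_bytes)."""
    out = []
    for start, end, size in fragments:
        if size > SPLIT_THRESHOLD:
            # oversized: split at the midpoint of the key range (simplified)
            mid = (start + end) // 2
            out.append((start, mid, size // 2))
            out.append((mid, end, size - size // 2))
        elif out and out[-1][1] == start and out[-1][2] + size < MERGE_THRESHOLD:
            # small and range-adjacent to the previous fragment: merge them
            s0, _, sz0 = out.pop()
            out.append((s0, end, sz0 + size))
        else:
            out.append((start, end, size))
    return out
```

A single pass like this would run periodically; a real system would also split at an actual data-derived key rather than the numeric midpoint.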
Data fragments can also be scheduled automatically according to load. If several data fragments on the same node become hot spots of business access, putting heavy pressure on that node, the high-load fragments must be migrated to other nodes to relieve it.
Data balancing and load balancing are important functions of data fragmentation. As the minimum unit of data access, a fragment supports the implementation of upper-layer complex transactions and also provides the environment in which consistency checkpoints are generated.
Step S2: the transaction write-tag queue.
As already mentioned above, the write tag represents a state where the transaction is temporarily uncommitted; in this step, each data slice maintains a transaction write tag queue, which is used to assist in generating a consistent timestamp for the corresponding slice, and the structure is shown in fig. 1. There is an ordered transaction list in the transaction write tag queue, and the transactions are sorted from small to large according to their timestamps, that is, the first element in the list is the transaction with the smallest timestamp. Each element in the list includes three important fields: transaction ID, transaction timestamp, and reference count.
In this step, the transaction write-tag queue preferably supports two main operations: consuming a logic instruction and recalculating the timestamp.
Consuming a logic instruction occurs when a data update happens in a transaction, such as a transaction writing a piece of data or an asynchronous cleanup of a write tag. On each data update, the corresponding element is found in the ordered transaction list and its reference count updated; if the reference count reaches 0, the corresponding transaction is deleted from the list. At this point the consistency timestamp of the data fragment advances.
The recalculate-timestamp operation computes each fragment's consistency timestamp by tracking the write-tag queue: the first element of the queue, i.e., the one with the smallest timestamp, is taken and compared with the old consistency timestamp, and if it advances the timestamp, the consistency timestamp is updated. If no transaction exists in the ordered transaction list, the natural timestamp is used instead to push the consistency timestamp forward, avoiding the problem that the timestamp cannot advance when no business is running.
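The two queue operations described above can be modeled in a few lines of Python. This is a hypothetical minimal sketch, not the patented implementation: the class and method names are invented, and advancing the checkpoint to one tick below the smallest pending transaction is one interpretation of "compare and update if advanced".

```python
import heapq

class WriteTagQueue:
    """Per-fragment transaction write-tag queue (illustrative sketch)."""

    def __init__(self, now_fn):
        self._heap = []          # (timestamp, txn_id), smallest timestamp first
        self._entries = {}       # txn_id -> [timestamp, reference_count]
        self._now = now_fn       # natural-clock fallback for an empty list
        self.consistency_ts = 0

    def register(self, txn_id, timestamp, n_write_tags):
        """A Pending transaction enters the list with its write-tag count."""
        self._entries[txn_id] = [timestamp, n_write_tags]
        heapq.heappush(self._heap, (timestamp, txn_id))

    def consume(self, txn_id, n=1):
        """Consume-logic-instruction: decrement the reference count and
        delete the transaction from the list when it reaches zero."""
        entry = self._entries[txn_id]
        entry[1] -= n
        if entry[1] == 0:
            del self._entries[txn_id]
        self.recalculate()

    def recalculate(self):
        """Recalculate-timestamp: advance the consistency timestamp to just
        below the smallest pending transaction, or to the natural clock
        when no transaction remains."""
        while self._heap and self._heap[0][1] not in self._entries:
            heapq.heappop(self._heap)    # lazily purge finished transactions
        candidate = self._heap[0][0] - 1 if self._heap else self._now()
        if candidate > self.consistency_ts:
            self.consistency_ts = candidate
```

With two pending transactions at timestamps 10 and 20, the fragment's consistency timestamp sits just below 10; once the first transaction's write tags are all consumed, it advances to just below 20, and once the list empties it follows the natural clock.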
Step S3: data fragment change feedback.
As mentioned above, fragment change feedback builds on the fact that stored data is fragmented by primary key and the fragments are independent of each other. In this step, when a transaction completes and its data is successfully written, a logic instruction is generated for each data fragment involved and transmitted to the fragment-change-feedback module.
Data fragment change feedback is logically divided into three modules: data fragment requests, the processor, and the aggregator.
As shown in fig. 2, the change-feedback module first starts the processor and waits for data fragments to register. Requests sent by registered data fragments are then received and processed. After a logic instruction reaches the processor, the transaction consistency timestamp of the corresponding fragment, i.e., the new timestamp, is produced through the steps of generating an event, consuming it, and pushing the timestamp. The new timestamp of each data fragment is sent to the aggregator, which deduplicates and sorts them uniformly and finally generates a global timestamp. The global timestamp is the consistency checkpoint of the current distributed database.
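The aggregator step can be sketched as follows, assuming — per the design idea stated earlier, which takes out the minimum value as the checkpoint — that after deduplication and sorting, the smallest per-fragment timestamp becomes the global checkpoint, since every fragment is consistent up to at least that point. The function name is invented for illustration.

```python
def aggregate_global_timestamp(fragment_timestamps):
    """Dedupe and sort the per-fragment consistency timestamps, then
    take the minimum as the global consistency checkpoint."""
    ordered = sorted(set(fragment_timestamps))
    return ordered[0] if ordered else None
```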
After the consistency check point of the distributed database is obtained in the above manner, another embodiment of the present invention is an application method of the consistency check point of the distributed database, where the method sends the generated consistency check point to each node database in the distributed database to be stored, and uses the generated consistency check point as a consistency check basis of each node database.
Step S4 is consistency-checkpoint transfer. A distributed database typically transfers user data using a majority-consensus protocol (e.g., Paxos or Raft). Such a protocol uses a majority algorithm: a write is considered successful once most nodes in the cluster reach agreement.
After the consistency checkpoint is generated, it must be transmitted to the individual nodes. If it were transmitted and stored over an ordinary network protocol, the minority of nodes that have not yet reached agreement could receive the consistency timestamp before the corresponding transaction state updates had been synchronized to them, producing inconsistent data after a failure. For this reason, the consistency checkpoint must also be transmitted using the majority-consensus protocol, as shown in FIG. 3.
Step S5 is consistency-checkpoint saving. Each node starts a consistency-point receiver to receive the consistency checkpoints and sequence numbers sent from the data fragments. Saving the consistency checkpoint in this step involves a reception flow and a saving flow.
The above-described reception flow is shown in fig. 4.
Step S511: the receiver receives a consistency checkpoint and its sequence number.
Step S512: if the sequence number is larger than the last received value, the batch may have ended, and the node tries to save the batch's consistency checkpoint; otherwise the received data is stored in the staging queue and reception continues.
Step S513: the node tries to save the batch's consistency checkpoint; the specific process is shown in the following section.
Step S514: the staging queue is emptied in preparation for receiving the next batch.
Step S515: the flow ends.
The saving flow begins when the receiver determines that reception of a batch's consistency checkpoint is complete and attempts to save it, as shown in FIG. 5:
Step S521: the consistency checkpoints in the staging queue were likewise received from the data fragments, so the node must first judge whether the number of fragments in the staging queue matches the number of fragments on the node.
Step S522: if they match completely, the consistency point can be saved directly to the local node.
Step S523: if not, judge whether the mismatch is caused entirely by some data fragments on the node not yet being initialized; if so, the consistency checkpoint may still be saved to the node.
Step S524: if the mismatch is not caused by uninitialized fragments, continue to judge whether the fragment counts differ because fragments on the node are awaiting GC cleanup or were split while the consistency checkpoint was in transit; if so, save the checkpoint to the node.
Step S525: if none of the above applies, abandon saving this batch's consistency checkpoint.
Step S526: the flow ends.
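The save decision in steps S521-S525 reduces to a cascade of checks, sketched below. The function and parameter names are invented, and the two benign mismatch causes are collapsed into boolean flags for illustration.

```python
def should_save_checkpoint(batch_fragment_count, node_fragment_count,
                           uninitialized, awaiting_gc_or_split):
    """Keep the checkpoint only when the fragment counts match or any
    mismatch has a benign explanation (steps S521-S525)."""
    if batch_fragment_count == node_fragment_count:
        return True               # S522: counts match, save directly
    if uninitialized:
        return True               # S523: fragments not yet initialized
    if awaiting_gc_or_split:
        return True               # S524: GC pending or split mid-transfer
    return False                  # S525: abandon this batch
```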
Based on the above embodiments, one of the greatest improvements of the present invention is the generation mode and principle of the consistency checkpoint: it is generated by maintaining, for each data fragment, a write-tag reference-count queue of transactions and pushing the transaction timestamp in that queue forward. After the consistency checkpoint is generated, it is transmitted to all node databases through a majority-consensus protocol. When a node database saves the checkpoint, it must judge whether the checkpoint is consistent with the state of the data fragments on the node, and benign inconsistencies can be ignored. The technical problem addressed by the invention is therefore the generation of consistency checkpoints for a distributed database under a clock mechanism, and the running mechanism by which they are saved consistently on each node.
In addition to the foregoing, it should be noted that reference throughout this specification to "one embodiment," "another embodiment," "an embodiment," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment described generally throughout this application. The appearances of the same phrase in various places in the specification are not necessarily all referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with any embodiment, it is submitted that it is within the scope of the invention to effect such feature, structure, or characteristic in connection with other embodiments.
Although the invention has been described herein with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More specifically, various variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the disclosure, the drawings and the appended claims. In addition to variations and modifications in the component parts and/or arrangements, other uses will also be apparent to those skilled in the art.

Claims (8)

1. A method for generating a consistency checkpoint of a distributed database, the method comprising:
using a primary key in the distributed database as the key, partitioning storage units in the distributed database to obtain a plurality of data fragments;
each data fragment generating its own consistency timestamp with the aid of a transaction write tag queue maintained by that fragment, wherein each entry in the list of the transaction write tag queue comprises a transaction ID, a transaction timestamp, and a reference count;
when a transaction is committed and its data is successfully written, generating a logic instruction for each data fragment, the data fragment transmitting the logic instruction to a processor;
the processor generating an event based on the current logic instruction and sending the event to the transaction write tag queue; the event consuming the logic instruction pushes the consistency timestamp of the data fragment forward, yielding a new timestamp, which is then transmitted to an aggregator;
and after the aggregator de-duplicates and sorts the new timestamps, generating a global timestamp, the global timestamp serving as the consistency check point of the entire distributed database.
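A minimal sketch of the generation pipeline in claim 1 might look as follows. The names (`Shard`, `aggregate_checkpoint`) are illustrative and not taken from the patent, and the aggregator is assumed to take the smallest of the de-duplicated, sorted fragment timestamps as the global check point, since that is the point every fragment has provably reached:

```python
class Shard:
    """One data fragment; holds the fragment's own consistency timestamp."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.consistency_ts = 0

    def consume_logic_instruction(self, txn_ts):
        # The event generated by the processor consumes the logic
        # instruction and pushes the fragment's timestamp forward.
        if txn_ts > self.consistency_ts:
            self.consistency_ts = txn_ts
        return self.consistency_ts


def aggregate_checkpoint(shards):
    # Aggregator step: de-duplicate and sort the new timestamps, then
    # take the smallest one as the global consistency check point.
    uniq = sorted({s.consistency_ts for s in shards})
    return uniq[0] if uniq else None


shards = [Shard(i) for i in range(3)]
for shard, ts in zip(shards, (120, 95, 130)):
    shard.consume_logic_instruction(ts)
cp = aggregate_checkpoint(shards)   # smallest fragment timestamp: 95
```

Taking the minimum is a conservative reading: a check point at 95 is consistent because every fragment has durably applied all writes at or before that timestamp.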
2. The method of generating a distributed database consistency checkpoint of claim 1, wherein: the system monitors the data volume in each data fragment in real time; when the volume in a data fragment becomes too large, the fragment is split into two new fragments; when the data volumes of two adjacent data fragments in the key range become too small, the two fragments are merged into one;
and the limits on the data volume in a data fragment are determined by preset thresholds.
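The split/merge policy of claim 2 could be sketched as a single rebalancing pass over the ordered fragment list. The threshold values and the midpoint split rule below are assumptions for illustration; the patent only requires preset thresholds:

```python
SPLIT_THRESHOLD = 64 * 2**20   # illustrative: split a fragment above 64 MiB
MERGE_THRESHOLD = 8 * 2**20    # illustrative: merge adjacent fragments below 8 MiB


def rebalance(fragments):
    """fragments: ordered list of (start_key, end_key, size_bytes).
    Returns a new fragment list after one split/merge pass."""
    out = []
    i = 0
    while i < len(fragments):
        start, end, size = fragments[i]
        nxt = fragments[i + 1] if i + 1 < len(fragments) else None
        if size > SPLIT_THRESHOLD:
            mid = (start + end) // 2   # split on the key-range midpoint
            out += [(start, mid, size // 2), (mid, end, size - size // 2)]
            i += 1
        elif nxt and size < MERGE_THRESHOLD and nxt[2] < MERGE_THRESHOLD:
            out.append((start, nxt[1], size + nxt[2]))  # merge adjacent pair
            i += 2
        else:
            out.append(fragments[i])
            i += 1
    return out
```

Note that a split changes the fragment count on a node, which is exactly the situation claim 8 later has to reconcile when a check point arrives mid-split.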
3. The method of generating a distributed database consistency checkpoint of claim 1, wherein: the transaction write tag queue supports a consume-logic-instruction operation and a recompute-timestamp operation;
when a data update occurs in a transaction, the consume-logic-instruction operation finds the corresponding element in the ordered transaction list and updates its reference count, thereby allowing the consistency timestamp of the current data fragment to advance;
and the recompute-timestamp operation takes the element with the smallest timestamp from the list of the transaction write tag queue and compares it with the current consistency timestamp; if the comparison shows a forward advance, the consistency timestamp is updated.
4. The method of generating a distributed database consistency checkpoint of claim 3 wherein: the consume logic instructions delete the corresponding transaction from the list when the reference count of the transaction becomes 0.
5. The method of generating a distributed database consistency checkpoint of claim 3 or 4, wherein: in the recompute-timestamp operation, the consistency timestamp may also be advanced using a natural (wall-clock) timestamp.
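Claims 3 to 5 together describe the queue mechanics. A sketch, under the assumption that the fragment's consistency timestamp sits just below the oldest still-referenced transaction timestamp, with claim 4's deletion on a zero reference count and claim 5's natural-timestamp advance for an empty queue; the method names are illustrative:

```python
class TransactionWriteTagQueue:
    """Per-fragment queue of in-flight transactions (claims 3-5).

    Each entry maps a transaction ID to [transaction timestamp, reference
    count], mirroring the list described in claim 1.
    """

    def __init__(self):
        self.entries = {}        # txn_id -> [txn_ts, ref_count]
        self.consistency_ts = 0

    def tag(self, txn_id, txn_ts, n_writes):
        # Register a transaction that will perform n_writes data updates.
        self.entries[txn_id] = [txn_ts, n_writes]

    def consume(self, txn_id):
        # Claim 3: one data update consumes one reference of its transaction.
        entry = self.entries[txn_id]
        entry[1] -= 1
        if entry[1] == 0:
            # Claim 4: fully-consumed transactions leave the list.
            del self.entries[txn_id]
        self.recompute()

    def recompute(self, natural_ts=None):
        # Claim 3: the candidate is just below the smallest still-referenced
        # transaction timestamp. Claim 5: when the queue is empty, a natural
        # (wall-clock) timestamp may push the consistency timestamp instead.
        if self.entries:
            candidate = min(ts for ts, _ in self.entries.values()) - 1
        elif natural_ts is not None:
            candidate = natural_ts
        else:
            return
        if candidate > self.consistency_ts:   # only ever move forward
            self.consistency_ts = candidate
```

The forward-only comparison in `recompute` is what keeps the fragment timestamp monotonic even when consumption events arrive out of order.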
6. The method of generating a distributed database consistency checkpoint of claim 1, wherein: after starting, the processor first waits for the data fragments to register; the data fragments send registration requests to the processor, and only after their registration is complete does the processor process the logic instructions transmitted by the data fragments.
7. A method for applying a distributed database consistency checkpoint, the method comprising:
transmitting the consistency check point obtained by the method of any one of claims 1 to 6, together with a sequence number, to each database node through a majority consensus protocol;
after receiving the consistency check point and the sequence number, the database node judges whether the sequence number is larger than the sequence number of the consistency check point received previously and held in a temporary storage queue; if not, the current consistency check point and sequence number are temporarily stored in the receiving queue; if so, the current consistency check point is saved and the temporary storage queue is requested to be emptied.
8. The method for applying a distributed database consistency check point according to claim 7, wherein saving the current consistency check point comprises: after the sequence number is judged to be larger than the sequence number of the previously received consistency check point, further judging whether the number of data fragments recorded in the consistency check point in the temporary storage queue matches the number of fragments on the current database node;
if so, the current consistency check point is saved on the current database node;
if not, further judging whether the fragment-count mismatch is caused by uninitialized data fragments on the current database node;
if so, the current consistency check point is saved on the current database node;
if not, further judging whether the fragment-count mismatch is caused by data fragments on the current database node that are awaiting GC (garbage collection) cleanup after a split, or that were split during transmission of the consistency check point;
if so, the current consistency check point is saved on the current database node;
and if not, saving of the current consistency check point is abandoned.
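The node-side decision chain of claims 7 and 8 can be sketched as follows. Attribute names such as `pending_split_or_gc` and `uninitialized_shards` are illustrative placeholders for the node state the claims describe, not terms from the patent:

```python
from types import SimpleNamespace


def receive_checkpoint(node, cp_seq, cp_shard_count):
    # Claim 7: a check point whose sequence number is not newer than the
    # last one seen is staged in the receiving queue; a newer one is
    # applied and the temporary storage queue is emptied.
    if cp_seq <= node.last_seq:
        node.staging.append((cp_seq, cp_shard_count))
        return "staged"
    node.last_seq = cp_seq
    node.staging.clear()
    return try_save(node, cp_shard_count)


def try_save(node, cp_shard_count):
    # Claim 8: reconcile the check point's fragment count with the node's.
    if cp_shard_count == node.shard_count:
        return "saved"
    if node.uninitialized_shards > 0:
        # Mismatch explained by fragments not yet initialized on this node.
        return "saved"
    if node.pending_split_or_gc:
        # Mismatch explained by an in-flight split or a split awaiting
        # GC (garbage collection) cleanup during check point transmission.
        return "saved"
    return "discarded"   # unexplained mismatch: abandon this check point


node = SimpleNamespace(last_seq=5, staging=[], shard_count=4,
                       uninitialized_shards=0, pending_split_or_gc=False)
```

Discarding an unexplained mismatch is safe because the next check point, generated after the fragment topology stabilizes, will carry a larger sequence number and supersede the abandoned one.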
CN202210401171.XA 2022-04-18 2022-04-18 Method for generating and applying consistency check point of distributed database Active CN114510539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210401171.XA CN114510539B (en) 2022-04-18 2022-04-18 Method for generating and applying consistency check point of distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210401171.XA CN114510539B (en) 2022-04-18 2022-04-18 Method for generating and applying consistency check point of distributed database

Publications (2)

Publication Number Publication Date
CN114510539A true CN114510539A (en) 2022-05-17
CN114510539B CN114510539B (en) 2022-06-24

Family

ID=81555472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210401171.XA Active CN114510539B (en) 2022-04-18 2022-04-18 Method for generating and applying consistency check point of distributed database

Country Status (1)

Country Link
CN (1) CN114510539B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120102006A1 (en) * 2010-10-20 2012-04-26 Microsoft Corporation Distributed transaction management for database systems with multiversioning
CN109977171A (en) * 2019-02-02 2019-07-05 中国人民大学 A kind of distributed system and method guaranteeing transaction consistency and linear consistency
CN112764888A (en) * 2021-01-21 2021-05-07 中信银行股份有限公司 Distributed transaction checking and judging method and system based on log analysis
CN113193947A (en) * 2021-04-23 2021-07-30 北京百度网讯科技有限公司 Method, apparatus, medium, and program product for implementing distributed global ordering
CN114079660A (en) * 2021-09-28 2022-02-22 中诚区块链研究院(南京)有限公司 High-performance distributed storage area data, timestamp, cross-link communication and data cooperation method


Also Published As

Publication number Publication date
CN114510539B (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN110209734B (en) Data copying method and device, computer equipment and storage medium
CN109739935B (en) Data reading method and device, electronic equipment and storage medium
US9959178B2 (en) Transactional and parallel log replay for asynchronous table replication
EP1704470B1 (en) Geographically distributed clusters
US8060714B1 (en) Initializing volumes in a replication system
US20130110767A1 (en) Online Transaction Processing
US7330860B2 (en) Fault tolerant mechanism to handle initial load of replicated object in live system
EP1704480B1 (en) Cluster database with remote data mirroring
US9672244B2 (en) Efficient undo-processing during data redistribution
US20100333094A1 (en) Job-processing nodes synchronizing job databases
CN111563102A (en) Cache updating method, server, system and storage medium
US20070233699A1 (en) Database system management method and database system
CN109710388A (en) Method for reading data, device, electronic equipment and storage medium
KR20060117505A (en) A recovery method using extendible hashing based cluster log in a shared-nothing spatial database cluster
CN110196856A (en) A kind of distributed data read method and device
CN109783578B (en) Data reading method and device, electronic equipment and storage medium
EP4276651A1 (en) Log execution method and apparatus, and computer device and storage medium
CN112334891B (en) Centralized storage for search servers
CN110417882B (en) Method and device for determining main node and storage medium
CN114510539B (en) Method for generating and applying consistency check point of distributed database
CN112800060A (en) Data processing method and device, computer readable storage medium and electronic equipment
CN115658245B (en) Transaction submitting system, method and device based on distributed database system
CN110196788B (en) Data reading method, device and system and storage medium
JP2018101217A (en) Database system and data processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant