WO2018014650A1

WO2018014650A1 - Distributed database data synchronisation method, related apparatus and system

Info

Publication number: WO2018014650A1
Application number: PCT/CN2017/085486
Authority: WO
Inventors: 陶维忠; 吴刚
Original assignee: 华为技术有限公司
Priority date: 2016-07-20
Filing date: 2017-05-23
Publication date: 2018-01-25
Also published as: CN107644030B; CN107644030A

Abstract

In the data synchronisation method provided by the present invention, a source data node processes the data operating commands sent by an application client as normal and generates a data replication log to send to a target data node. While upgrading, the target data node receives the data replication log sent by the source data node. When the version of the source data node is higher than the version of the target data node, the target data node first caches the replication log and, after completing its own version update, implements data replication on the basis of the cached replication log, thus implementing synchronisation between the source data node and the target data node. Compared to the prior art, the upgrading method provided in the embodiments of the present invention enables a source data node and a target data node to process data operating commands sent by an application client whilst upgrading, improving the reliability of the distributed database.

Description

Distributed database data synchronization method, related device and system

This application claims priority to Chinese Patent Application No. 201610578730.9, entitled "Distributed Database Data Synchronization Method, Related Devices and Systems" and filed on July 20, 2016, the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of database technologies, and in particular, to a distributed database data synchronization method, related device and system.

Background technique

In recent years, with the rapid growth of data volume, distributed database technology has also developed rapidly, and traditional relational databases have begun to evolve from centralized to distributed. Distributed database specifically refers to the use of high-speed computer networks to connect physically dispersed multiple data storage nodes to form a logically unified database. The basic idea of a distributed database is to distribute the data in the original centralized database to multiple data storage nodes connected through the network to obtain larger storage capacity and higher concurrent access.

The prior art provides a distributed database. As shown in FIG. 1, the same data is stored on data node 1 and data node 2, and both data nodes are in an active state. A data operation instruction may be sent to either data node 1 or data node 2 for execution. After a data node executes the data operation instruction, the execution result is synchronized to the other data node in the form of a replication log, and the other data node performs data replication according to the replication log, thereby achieving data synchronization between data nodes 1 and 2.

However, during data synchronization in a distributed database, there are scenarios in which the database needs to be upgraded, such as adding fields to or deleting fields from the data model in the database. The prior art provides a data node-based data synchronization method for this case. The method is described below, taking an upgrade of data node 2 as an example, and mainly includes the following steps:

S1. Node 1 suspends processing of data operation commands sent by the application client, and data replication log transmission between node 1 and node 2 is suspended.

S2. Node 2 receives the upgrade instruction and performs a version upgrade operation; node 2 then receives and executes the data operation commands sent by the application client, and records the data replication log.

S3. After the upgrade is completed, node 2 sends the upgrade instruction and the data replication log to node 1, and node 1 receives them.

S4. Node 1 first upgrades its version according to the upgrade instruction, and then performs data replication according to the replication log. After the replication is completed, node 1 resumes receiving the data operation instructions sent by the application client, and normal data synchronization between node 1 and node 2 is restored.

When the database is upgraded using the prior-art data synchronization method, data node 1 cannot process the data operation commands sent by the application client; only data node 2 can, which reduces the reliability of the distributed database.

Summary of the invention

Embodiments of the present invention provide a data synchronization method and apparatus for improving reliability of a distributed database.

In a first aspect, an embodiment of the present invention provides a distributed database data synchronization method, which is applied to a destination data node in the distributed database. While executing data operation instructions sent by an application client, the destination data node further performs the following:

Receiving a data replication log sent by the source data node, where the replication log carries data that needs to be synchronized to the destination node;

Obtaining a version of the source data node and, upon determining that the version of the source data node is higher than its own version, caching the replication log;

Receiving an upgrade instruction sent by the metadata server, where the upgrade instruction is used to instruct the destination data node to perform a version upgrade;

Performing the upgrade according to the upgrade instruction and, after the upgrade is completed, performing data replication according to the cached replication log. The upgrade instruction carries an instruction for modifying a field of the data model of the destination data node or for adding a field to the data model.

In this embodiment, the source data node can process the data operation instructions sent by the application client as normal, and generates a data replication log to be sent to the destination data node. While the destination data node is being upgraded, it can still receive the data replication log sent by the source data node. When the version of the source data node is higher than the version of the destination data node, the destination data node first caches the data replication log; after its version upgrade is completed, it performs data replication according to the cached replication log, which completes the synchronization between the source data node and the destination data node. Compared with the prior art, the upgrade method provided by the embodiment of the present invention enables both the source data node and the destination data node to process data operation commands sent by the application client while upgrading, thereby improving the reliability of the distributed database.

In combination with the first aspect, in a possible implementation, the obtaining, by the destination data node, of the version of the source data node includes:

Obtaining, according to the data replication log, a version of the source data node, where the data replication log carries a version of the source data node; or

Receiving a notification message sent by the source data node, where the notification message carries a version of the source data node, and obtains a version of the source data node according to the notification message.
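The two ways of obtaining the source version can be sketched as follows. This is only an illustration: the dictionary keys (`source_version`, `node_id`, `version`), the `VersionTracker` class, and the choice to prefer the version carried in the log over the last notification are all assumptions, since the text presents the two options as alternatives and does not define a message format.

```python
from typing import Dict, Optional


def source_version_from_log(log: dict) -> Optional[int]:
    """Option 1: the replication log itself carries the source version."""
    return log.get("source_version")


class VersionTracker:
    """Option 2: remember the version announced in a notification message."""

    def __init__(self) -> None:
        self.announced: Dict[str, int] = {}

    def on_notification(self, message: dict) -> None:
        # Hypothetical message layout: {"node_id": ..., "version": ...}
        self.announced[message["node_id"]] = message["version"]

    def version_of(self, node_id: str) -> Optional[int]:
        return self.announced.get(node_id)


def resolve_source_version(log: dict, tracker: VersionTracker) -> Optional[int]:
    """Assumed policy: use the version in the log, else the last notification."""
    version = source_version_from_log(log)
    if version is None:
        version = tracker.version_of(log["node_id"])
    return version


tracker = VersionTracker()
tracker.on_notification({"node_id": "node1", "version": 2})
v_from_log = resolve_source_version({"node_id": "node1", "source_version": 3}, tracker)
v_from_msg = resolve_source_version({"node_id": "node1"}, tracker)
```

Carrying the version in every log keeps each log self-describing, while a separate notification avoids repeating the version in every log; the text leaves this choice open.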

With reference to the first aspect, in a possible implementation, the method further includes:

The destination data node generates a data replication log when the data operation instruction is executed, where the data replication log carries data that needs to be synchronized to a data node corresponding to the destination data node;

The data replication log is sent to the source data node to enable data synchronization between the destination data node and the source data node.

With reference to the first aspect, in a possible implementation, when the destination data node determines that the version of the source data node is lower than its own version, this indicates that the destination data node can process the replication log sent by the source data node. The destination data node then performs data replication according to the replication log, thereby implementing data synchronization from a low version to a high version.
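The version comparison that decides what happens to an incoming replication log can be written as a small function. The sketch below assumes that versions are comparable integers and uses the hypothetical names `Action` and `decide`; neither appears in the application.

```python
from enum import Enum


class Action(Enum):
    CACHE = "cache"  # hold the log until the local upgrade completes
    APPLY = "apply"  # replicate data from the log immediately


def decide(source_version: int, own_version: int) -> Action:
    """Cache logs from a newer source; apply logs from an older or equal one."""
    if source_version > own_version:
        return Action.CACHE
    return Action.APPLY
```

For example, `decide(2, 1)` yields `Action.CACHE`, while `decide(1, 2)` and `decide(2, 2)` both yield `Action.APPLY`.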

With reference to the first aspect, in a possible implementation, the sending, by the destination data node, the data replication log to the source data node includes:

The replication log is placed in a local send queue and sent to the source data node. Compared with the synchronous mode (in which synchronization ends only after the peer has processed the replication log), passing the replication log asynchronously in this way can improve the availability of the database system.
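A minimal sketch of such a local send queue is given below, using Python's standard `queue` and `threading` modules. The class name `AsyncLogSender`, the `deliver` callable that stands in for the network transfer, and the single worker thread are assumptions made for illustration only.

```python
import queue
import threading


class AsyncLogSender:
    """Puts replication logs on a local queue; a worker thread delivers them."""

    def __init__(self, deliver) -> None:
        self._queue = queue.Queue()
        self._deliver = deliver  # hypothetical callable that ships a log to the peer
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, log: dict) -> None:
        """Return as soon as the log is queued; do not wait for the peer."""
        self._queue.put(log)

    def _run(self) -> None:
        while True:
            log = self._queue.get()
            try:
                self._deliver(log)
            finally:
                self._queue.task_done()

    def wait_until_drained(self) -> None:
        """Block until every queued log has been handed to the peer."""
        self._queue.join()


received = []
sender = AsyncLogSender(deliver=received.append)
sender.send({"key": "user:1", "value": {"name": "a"}})
sender.send({"key": "user:2", "value": {"name": "b"}})
sender.wait_until_drained()
```

Because `send` returns once the log is queued, the local data operation finishes without waiting for the peer; `wait_until_drained` exists here only so that the example can observe the delivered logs.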

In a second aspect, an embodiment of the present invention provides a destination data node of a distributed database, including:

a data operation instruction processing unit, configured to execute a data operation instruction sent by the application client;

a receiving unit, configured to receive a data replication log sent by the source data node, where the replication log carries data that needs to be synchronized to the destination node;

a log cache unit, configured to obtain the version of the source data node and, upon determining that the version of the source data node is higher than the version of the destination data node, cache the replication log;

The receiving unit is further configured to receive an upgrade instruction sent by a metadata server, where the upgrade instruction is used to instruct the destination data node to perform a version upgrade;

An upgrade unit, configured to perform an upgrade according to the upgrade instruction;

The log synchronization unit is configured to perform data replication according to the cached replication log after the upgrade is completed.

With reference to the second aspect, in a possible implementation, the obtaining, by the log cache unit, the version of the source data node includes:

The log cache unit acquires a version of the source data node according to the data replication log, where the data replication log carries a version of the source data node; or

The log cache unit receives the notification message sent by the source data node, where the notification message carries the version of the source data node, and obtains the version of the source data node according to the notification message.

With reference to the second aspect, in a possible implementation, the destination data node further includes:

a log generating unit, configured to: when the data operation instruction is executed, generate a data replication log, where the data replication log carries data that needs to be synchronized to a data node corresponding to the destination data node;

And a sending unit, configured to send the data replication log to the source data node.

With reference to the second aspect, in a possible implementation, the log synchronization unit of the destination data node is further configured to perform data replication according to the replication log when determining that the version of the source data node is lower than its own version.

With reference to the second aspect, in a possible implementation, the sending, by the sending unit of the destination data node, the data replication log to the source data node includes:

The sending unit places the replication log in a local send queue and sends it to the source data node.

In a third aspect, an embodiment of the present invention provides a distributed database data synchronization method, which is applied to a destination data node in the distributed database. While executing data operation instructions sent by the application client, the destination data node further performs the following:

Receiving an upgrade pre-notification instruction sent by the metadata server, where the upgrade pre-notification instruction carries an entry that needs to be deleted;

Receiving a data replication log sent by the source data node, where the data replication log carries data that needs to be synchronized to the destination node;

When it is determined that the replication log includes data corresponding to the entry to be deleted, filtering out that data when performing data replication according to the replication log, and handling the entries that already exist in the destination data node according to default values, thereby achieving data synchronization between the source data node and the destination data node.
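The filtering step can be sketched as follows. The table and log layouts, the field name `fax`, and the function name `apply_log_with_filter` are hypothetical. Handling existing entries "according to default values" is read here as leaving the stored value of a to-be-deleted entry untouched, which follows the later description of step 403.

```python
def apply_log_with_filter(table: dict, log: dict, fields_to_delete: set) -> None:
    """Apply one replication log entry, skipping fields announced for deletion."""
    filtered = {name: value for name, value in log["row"].items()
                if name not in fields_to_delete}
    row = table.setdefault(log["key"], {})
    row.update(filtered)  # fields announced for deletion keep their stored values


fields_to_delete = {"fax"}  # taken from the (hypothetical) pre-notification payload
table = {"user:1": {"name": "a", "fax": "000"}}
log = {"key": "user:1", "row": {"name": "b", "fax": "111"}}
apply_log_with_filter(table, log, fields_to_delete)
```

After the call, the `name` field has been replicated, while the `fax` value is still the one that was stored locally; it is removed only when the upgrade instruction itself is executed.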

In conjunction with the third aspect, in a possible implementation, the method further includes:

Receiving an upgrade instruction sent by the metadata server, where the upgrade instruction is used to delete an entry of the data node, and deleting the entry of the data node according to the upgrade instruction. In this embodiment, because the upgrade pre-notification instruction, which carries the entry to be deleted, has been received in advance, the destination data node knows beforehand which entry's data will be deleted, and can therefore handle that data in advance when it receives a replication log or sends a replication log to the source data node.

In combination with the third aspect, in a possible implementation, the method further includes:

When the data replication log is generated according to the execution result of the data operation instruction, the data corresponding to the entry to be deleted is filtered, and the data replication log is sent to the source data node.

In a fourth aspect, the embodiment of the present invention provides a distributed database destination data node, where the destination data node specifically includes:

a receiving unit, configured to receive an upgrade pre-notification instruction sent by the metadata server, where the upgrade pre-notification instruction carries the entry that needs to be deleted;

The receiving unit is further configured to receive a data replication log sent by the source data node, where the data replication log carries data that needs to be synchronized to the destination node;

a log processing unit, configured to: when the replication log includes data corresponding to the entry to be deleted, filter out that data when performing data replication according to the replication log, and handle the entries that already exist in the destination data node according to default values, thereby implementing data synchronization between the source data node and the destination data node.

With reference to the fourth aspect, in a possible implementation, the destination data node further includes:

an upgrade unit, configured to receive an upgrade instruction sent by the metadata server, where the upgrade instruction is used to delete an entry of the data node, and to delete the entry of the data node according to the upgrade instruction. In this embodiment, because the upgrade pre-notification instruction, which carries the entry to be deleted, has been received in advance, the destination data node knows beforehand which entry's data will be deleted, and can therefore handle that data in advance when it receives a replication log or sends a replication log to the source data node.

And a log sending unit, configured to: when the data replication log is generated according to the execution result of the data operation instruction, filter the data corresponding to the entry to be deleted, and send the data replication log to the source data node.

With reference to the third aspect or the fourth aspect, in a possible implementation, the sending, by the destination data node, the data replication log to the source data node includes:

The replication log is placed in the local send queue and sent to the source data node.

In a fifth aspect, an embodiment of the present invention provides a distributed database, including the destination data node according to the second aspect or the fourth aspect, and a source data node corresponding to the destination data node.

The distributed database provided by the embodiment of the present invention can complete data synchronization between the source data node and the destination data node while the source data node and the destination data node are being upgraded, thereby further improving the reliability of the distributed database.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort. In the drawings:

FIG. 1 is a schematic structural diagram of a distributed database provided by the prior art;

FIG. 2 is a schematic structural diagram of a distributed database according to Embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of data storage of a data node according to an embodiment of the present invention;

FIG. 4 is a flowchart of a data synchronization method according to Embodiment 2 of the present invention;

FIG. 5 is a flowchart of a data synchronization method according to Embodiment 3 of the present invention;

FIG. 6 is a schematic diagram of data storage of a data node according to an embodiment of the present invention;

FIG. 7 shows a data synchronization method of a distributed database according to Embodiment 4 of the present invention;

FIG. 8 is a schematic structural diagram of a destination data node according to Embodiment 5 of the present invention;

FIG. 9 is a schematic structural diagram of a destination data node according to Embodiment 6 of the present invention;

FIG. 10 is a schematic structural diagram of a destination data node in a distributed database according to Embodiment 7 of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.

The present invention provides a data synchronization method, a related device, and a database system for a distributed database. Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a database system according to Embodiment 1 of the present invention.

As shown in FIG. 2, the data processing system of the present invention includes an application client, a metadata server, and a data node server from top to bottom.

The application client is an application that uses a database, such as a billing application. The application client can access the data stored in the data node server.

The metadata server is responsible for the distributed management capabilities of the database system. The metadata server can be deployed independently or together with the data node server. The metadata server can be deployed on a minicomputer, an X86 computer, or a personal computer server (PC server).

The application client communicates with the metadata server and the data node server respectively through the IP network, wherein the communication interface between the client and the metadata server or the data node server may be a Transmission Control Protocol (TCP) interface or User Datagram Protocol (UDP) interface.

The data node server includes a plurality of data nodes (also referred to as physical nodes). A data node may be a minicomputer, an X86 computer, a personal computer server (PC server), or the like, and the data of a data node may be stored in a storage medium located in the storage network. Data is read and written between the data nodes and the storage network by block I/O, that is, the storage medium is read and written in units of blocks. The storage medium may be a hard disk drive (HDD), a solid-state drive (SSD), or the like.

In addition, a driver is deployed on the client, and the driver caches routing information. In this way, when the client sends a data operation instruction to a physical node, the client can complete the routing decision using the cached routing information and access the corresponding physical node.

Referring to FIG. 3, FIG. 3 shows a schematic diagram of data storage on a data node. Data node 1 and data node 2 are backups of each other: slice 1 located on data node 1 and slice 2 located on data node 2 are primary slices (a slice may also be referred to as a virtual node), while slice 2 located on data node 1 and slice 1 located on data node 2 are backup slices. Data node 1 and data node 2 may be referred to as a source data node and a destination data node, respectively. The names "source data node" and "destination data node" are relative.

The metadata server stores the mapping relationship between the virtual nodes and the physical nodes, and the definition of the primary slice and the backup slice of each virtual node. The metadata server is also used to control data synchronization between data nodes.
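The mapping kept by the metadata server for the layout of FIG. 3 can be pictured as a small lookup table. The sketch below is illustrative only: the identifiers `slice1`, `node1`, `SliceRoute`, and the two helper functions are assumed names, and the application does not specify how the mapping is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceRoute:
    primary: str  # physical node holding the primary slice
    backup: str   # physical node holding the backup slice


# Layout described for FIG. 3: the two data nodes back each other up.
ROUTES = {
    "slice1": SliceRoute(primary="node1", backup="node2"),
    "slice2": SliceRoute(primary="node2", backup="node1"),
}


def primary_node(slice_id: str) -> str:
    """Node to which operations on the primary slice are routed."""
    return ROUTES[slice_id].primary


def replication_target(slice_id: str) -> str:
    """Node that receives the replication logs generated for this slice."""
    return ROUTES[slice_id].backup
```

With this table, an operation on slice 1 is routed to `node1`, and the replication log it produces is sent to `node2`; for slice 2 the roles are reversed, which is why the terms source and destination data node are relative.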

With continued reference to FIG. 4, FIG. 4 is a flowchart of a data synchronization method according to Embodiment 2 of the present invention. The data node 1 is a source data node, and the data node 2 is a destination data node. The data synchronization method of the database provided by this embodiment can be applied to the scenario of database upgrade, and the method mainly includes the following steps:

Step 301: The data node 1 receives an upgrade instruction sent by the metadata server, where the upgrade instruction is used to indicate that the source data node performs version upgrade.

Data nodes 1 and 2 receive, one after the other, the upgrade instruction sent by the metadata server, which notifies them to upgrade from version V1 to version V2. The upgrade instruction may modify a field of the data model (for example, a table structure) or add a field to the data model. Because the two nodes receive the upgrade instruction at different times, the data models of data nodes 1 and 2 are temporarily inconsistent. In the embodiment of the present invention, the case in which data node 1 receives the upgrade instruction first is used as an example.

Step 302: The data node 1 performs an upgrade according to the upgrade instruction.

Step 303: After the upgrade succeeds, the data node 1 receives the data operation command sent by the application client, performs a data operation according to the data operation command, and generates a replication log.

In this embodiment, data node 1 can receive data operation instructions sent by the application client, such as instructions to query or modify the data of primary slice 1, while performing the upgrade. After performing the data operation, data node 1 generates a replication log, which carries the data that needs to be synchronized to the data node holding backup slice 1 (that is, data node 2).

Step 304, the data node 1 sends the generated replication log to the data node 2.

Before the upgrade starts or before the upgrade is completed, Data Node 1 can also receive and process the data operation instructions sent by the client and generate a replication log. At this time, the versions of the data node 1 and the data node 2 are the same, and the data node 1 directly transmits the copy log to the data node 2 to realize data synchronization between the data node 1 and the data node 2.

After the upgrade is completed, data node 1 can also receive and process the data operation instructions sent by the client and generate a replication log. At this time, the version of data node 1 is higher than the version of data node 2, that is, the data models of data node 1 and data node 2 are out of sync, and in the prior art data node 2 cannot process the replication log received from data node 1. How this case is handled is described in the following steps of this embodiment.

In this embodiment, the replication log can be transmitted asynchronously between data node 1 and data node 2; that is, once data node 1 has put the replication log into its local send queue, its local processing ends. Compared with the synchronous mode (in which processing ends only after the destination data node has successfully handled the replication log), passing the replication log asynchronously can improve the availability and partition tolerance of the database system.

Step 305: The data node 2 receives the data replication log sent by the data node 1, and the replication log carries data that needs to be synchronized to the data node 2.

Step 306: The data node 2 acquires the version of the data node 1 and, upon determining that the version of the data node 1 is higher than its own version, caches the replication log.

In this embodiment, the replication log may carry the version of data node 1, so data node 2 may obtain the version of data node 1 from the replication log and compare it with its own version. Alternatively, data node 2 can receive a notification message sent by data node 1, where the notification message carries the version of the source data node, and data node 2 acquires the version of data node 1 according to the notification message.

In this embodiment, if the version of data node 1 is higher than the version of data node 2, this indicates that the replication log was generated according to the new data model; data node 2 does not process the replication log for the time being and caches it instead.

In another embodiment, if the version of data node 1 is lower than the version of data node 2, that is, the data model on data node 2 is newer, data node 2 can process the replication log sent by data node 1, that is, synchronize data according to the replication log. If the version of data node 1 is the same as the version of data node 2, data node 2 can likewise process the replication log sent by data node 1 and synchronize data according to it.

Step 307: The data node 2 receives an upgrade instruction sent by the metadata server, where the upgrade instruction is used to instruct the data node 2 to perform a version upgrade.

In this embodiment, data nodes 1 and 2 receive the upgrade instruction sent by the metadata server one after the other.

Step 308, the data node 2 is upgraded according to the upgrade instruction.

Step 309: After the upgrade is completed, the data node 2 performs data replication according to the cached replication log.

In the embodiment of the present invention, after the upgrade is completed, the version of data node 2 is the same as the version of data node 1, and data node 2 starts to process the replication log, that is, it performs data replication (also referred to as data redo) according to the replication log cached in step 306, to ensure data synchronization between data node 1 and data node 2.

In this embodiment, the source data node (data node 1) can process the data operation instructions sent by the application client as normal, and generates a data replication log to be sent to the destination data node (data node 2). While the destination data node is being upgraded, it can still receive the data replication log sent by the source data node. When the version of the source data node is higher than the version of the destination data node, the destination data node first caches the data replication log; after its version upgrade is completed, it performs data replication according to the cached replication log, which completes the synchronization between the source data node and the destination data node. Compared with the prior art, the upgrade method provided by the embodiment of the present invention enables both the source data node and the destination data node to process data operation commands sent by the application client while upgrading, thereby improving the reliability of the distributed database.
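Steps 305 to 309 on the destination side can be sketched end to end as follows, for the case in which the upgrade adds a field to the data model. This is a simplified illustration under stated assumptions, not the implementation of the embodiment: the class `TableNode`, the log and instruction layouts, the integer versions, and the rule that a log with unknown fields cannot be redone are all hypothetical.

```python
class TableNode:
    """A data node holding one table whose columns depend on the node version."""

    def __init__(self, version: int, columns: list) -> None:
        self.version = version
        self.columns = list(columns)
        self.rows = {}
        self.cached_logs = []

    def on_replication_log(self, log: dict) -> None:
        if log["source_version"] > self.version:
            # Generated under a newer data model: hold it until the upgrade.
            self.cached_logs.append(log)
        else:
            self._redo(log)

    def on_upgrade(self, instruction: dict) -> None:
        """Apply an 'add field' upgrade, then redo the cached logs in order."""
        for name, default in instruction["add_fields"].items():
            self.columns.append(name)
            for row in self.rows.values():
                row[name] = default
        self.version = instruction["to_version"]
        pending, self.cached_logs = self.cached_logs, []
        for log in pending:
            self._redo(log)

    def _redo(self, log: dict) -> None:
        unknown = set(log["row"]) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        self.rows.setdefault(log["key"], {}).update(log["row"])


node2 = TableNode(version=1, columns=["name"])
node2.on_replication_log({"source_version": 1, "key": "u1", "row": {"name": "a"}})
node2.on_replication_log({"source_version": 2, "key": "u1", "row": {"email": "a@x"}})
rows_before_upgrade = {key: dict(row) for key, row in node2.rows.items()}
node2.on_upgrade({"to_version": 2, "add_fields": {"email": ""}})
```

The second log refers to a field that does not yet exist on data node 2, so it is held in the cache; once the upgrade has added the field, the cached log is redone and both nodes hold the same row.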

In the embodiment of the present invention, the data node 2 may perform the data operation instruction sent by the application client while performing the foregoing upgrade, and the data node 2 may further perform the following steps:

Step 310: When executing the data operation instruction, the data node 2 generates a data replication log, where the data replication log carries data that needs to be synchronized to the data node corresponding to the data node 2.

The data node corresponding to data node 2 is data node 1. For example, if the data of primary slice 2 on data node 2 is modified, the replication log carries the modified data, which needs to be synchronized to backup slice 2 on data node 1.

Step 311: The data node 2 sends the data replication log to the data node 1.

In this embodiment, the replication log can be transmitted asynchronously between data node 2 and data node 1; that is, once data node 2 has put the replication log into its local send queue, its local processing ends. Compared with the synchronous mode, passing replication logs asynchronously can improve the availability and partition tolerance of the database system.

It should be noted that steps 310-311 and the preceding steps 305-309 have no fixed order relative to each other; that is, data node 1 and data node 2 receive data operation instructions independently of each other in time.

Referring to FIG. 5, FIG. 5 is a flowchart of a data synchronization method according to Embodiment 3 of the present invention. The data node 1 is a source data node, and the data node 2 is a destination data node. The database data synchronization method provided in this embodiment is mainly described by taking the data node 2 as an example. When the data node 2 executes the data operation instruction sent by the application client, the method further includes the following steps:

Step 401: The data node 2 receives an upgrade pre-notification command sent by the metadata server, where the upgrade pre-notification command carries an entry that needs to be deleted.

In this embodiment, the metadata server sends an upgrade pre-notification instruction to the data nodes 1 and 2 in advance before sending the upgrade instruction, where the upgrade pre-notification instruction is used to notify the data node that the data model needs to be modified in advance. The notification instruction carries an entry (also referred to as a field) that needs to be deleted in the data model (for example, a table structure), indicating that the entry in the data model needs to be deleted. In this embodiment, the data node 2 (destination data node) is taken as an example for description.

Step 402: The data node 2 receives a data replication log sent by the data node 1, where the data replication log carries data that needs to be synchronized to the data node 2.

Data synchronization between data node 1 and data node 2 does not stop. After receiving a data operation instruction sent by the application client, data node 1 performs the corresponding data operation, generates a data replication log, and synchronizes it to data node 2. The data replication log carries the data that needs to be synchronized to data node 2.

Step 403: When it is determined that the replication log includes data corresponding to the entry to be deleted, the data node 2 filters out that data when performing data replication according to the replication log, and handles the entries that already exist in the destination data node according to default values.

Specifically, since the data node 2 has received the upgrade pre-notification instruction in advance, and that instruction carries the entry that needs to be deleted, the data node 2 can determine whether the replication log includes data corresponding to the entry to be deleted. If so, the data node 2 filters out the data corresponding to the entry to be deleted when performing data replication according to the received replication log, and processes the entry already existing in the data node 2 according to its default value. That is, the data corresponding to the entry to be deleted in the data node 2 is not modified (which avoids modifying the data first and then deleting it during the upgrade). Data synchronization between the data nodes 1 and 2 is thereby achieved, and the efficiency of data synchronization is improved.
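The filtering of step 403 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the row layout (a dict of field names to values), the set `pending_deletes`, and the function name `apply_replication_record` are assumptions introduced only for this example.

```python
def apply_replication_record(table, record, pending_deletes, key="Cust_id"):
    """Apply one replicated row to a local table (hypothetical layout:
    dict keyed by primary key), skipping fields marked for deletion."""
    # Filter out data for entries that the pre-notification marked for deletion.
    filtered = {f: v for f, v in record.items() if f not in pending_deletes}
    row = table.setdefault(record[key], {})
    row.update(filtered)
    # The entry still exists locally until the upgrade runs: a missing value
    # gets the default (None), and an existing value is left unmodified.
    for field in pending_deletes:
        row.setdefault(field, None)
    return row
```

With this sketch, a replicated update to an existing row leaves the local value of the to-be-deleted field untouched, while a newly replicated row receives the default value for that field.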

Step 404: The data node 2 receives the upgrade command sent by the metadata server, where the upgrade command is used to delete the entry of the data node, and delete the entry of the data node according to the upgrade instruction.

In this embodiment, the data node 2 can execute the upgrade instruction sent by the metadata server while processing the replication log. That is, step 404 and steps 401-403 are independent in time, and step 404 can also be performed before step 401.

In this embodiment, the metadata server sends an upgrade instruction for deleting the entry of the data node to the data nodes 1 and 2. After receiving the upgrade command, the data node 1, 2 deletes the entry in the data node 1, 2 according to the upgrade instruction.

Step 405: The data node 2, when generating the data replication log according to the execution result of the data operation instruction, filters the data corresponding to the entry to be deleted, and sends the data replication log to the source data node.

In this embodiment, after executing the data operation instruction sent by the application client, the data node 2 itself generates a data replication log according to the execution result of the data operation instruction. The data node 2 can filter the data corresponding to the entry to be deleted when the data replication log is generated, so that the data corresponding to the entry to be deleted is not carried in the replication log, and the transmission efficiency of the data replication log is improved.
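As a minimal sketch of the sender-side filtering in step 405, the following Python function builds a replication log record without the data of entries marked for deletion. The record layout (`version`, `table`, `data`) and the use of qualified names such as `Table_A.Cust_bank` in `pending_deletes` are assumptions made for illustration.

```python
def build_replication_record(version, table_name, row, pending_deletes):
    """Build a replication log record for one written row, omitting fields
    whose qualified name (table.field) is marked for deletion."""
    payload = {
        field: value
        for field, value in row.items()
        if f"{table_name}.{field}" not in pending_deletes
    }
    # The sender's version travels with the record so that the receiver
    # can compare it with its own version.
    return {"version": version, "table": table_name, "data": payload}
```

The local row is not changed by this function; only the transmitted record is reduced.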

It should be noted that step 405 and steps 402-404 are time-independent, for example, step 405 can be performed before step 402.

Further, in this embodiment the replication log can be transmitted asynchronously between the data node 1 and the data node 2. That is, the data node 2 can place the replication log in a local send queue, and local processing is regarded as complete at that point. Transmitting replication logs asynchronously can improve the availability and partition tolerance of the database system.
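The local send queue can be sketched as follows, assuming a single background worker thread and a caller-supplied `transmit` callable that stands in for the network send to the peer node. The class name `AsyncLogSender` and its methods are hypothetical.

```python
import queue
import threading


class AsyncLogSender:
    """Queue replication logs locally and transmit them in the background."""

    def __init__(self, transmit):
        self._queue = queue.Queue()
        self._transmit = transmit  # assumed stand-in for the network send
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def submit(self, log):
        # Returns as soon as the log is queued; the write path is not
        # blocked by the transmission to the peer node.
        self._queue.put(log)

    def _drain(self):
        while True:
            log = self._queue.get()
            try:
                self._transmit(log)
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until every queued log has been handed to transmit().
        self._queue.join()
```

Because a single worker drains a FIFO queue, logs are transmitted in the order in which they were submitted.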

For a more detailed understanding of the data synchronization method provided by the embodiment of the present invention, a specific application scenario of the embodiment of the present invention is given below. As shown in FIG. 6, FIG. 6 is a schematic diagram of data storage of a data node according to an embodiment of the present invention.

The distributed database consists of four physical nodes 1, 2, 3 and 4. Each node has 3 primary shards (the parts filled with slashes in the figure) and 3 backup shards (the unfilled parts in the figure).

For example, shard 3 is replicated from node 1 to node 2, and shard 6 is replicated from node 2 to node 1, so replication between node 1 and node 2 is bidirectional.

Each shard in the distributed database consists of two table structures, Table_A and Table_B, which are currently in V1 version. The table structure is as follows:

Table_A table structure is

{

Cust_id int,

Cust_Name varchar(128),

Cust_bank varchar(128),

}

Table_A uses cust_id as the primary key.

Table_B table structure is

{

Product_id int,

product_Name varchar(128),

Product_price int,

}

Table_B uses product_id as the primary key.

After the upgrade to the target version V2, the table structures change as follows:

The field cust_bank is deleted from Table_A, and the field Product_discount int is added to Table_B.
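The V1 table structures and the change to V2 can be written down as data, as in the following Python sketch. The dict layout and the helper `apply_upgrade` are assumptions for illustration; field names follow the table structures listed above.

```python
SCHEMA_V1 = {
    "Table_A": {"Cust_id": "int", "Cust_Name": "varchar(128)", "Cust_bank": "varchar(128)"},
    "Table_B": {"Product_id": "int", "product_Name": "varchar(128)", "Product_price": "int"},
}

# Hypothetical description of the V1 -> V2 upgrade content.
UPGRADE_V1_TO_V2 = {
    "drop": {"Table_A": ["Cust_bank"]},
    "add": {"Table_B": {"Product_discount": "int"}},
}


def apply_upgrade(schema, upgrade):
    """Return a new schema with the upgrade applied; the input is not modified."""
    new_schema = {table: dict(columns) for table, columns in schema.items()}
    for table, columns in upgrade.get("drop", {}).items():
        for column in columns:
            new_schema[table].pop(column, None)
    for table, columns in upgrade.get("add", {}).items():
        new_schema[table].update(columns)
    return new_schema


SCHEMA_V2 = apply_upgrade(SCHEMA_V1, UPGRADE_V1_TO_V2)
```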

Taking the bidirectional replication between Node 1 (DataNode 1) and Node 2 (DataNode 2) as an example, the data synchronization during the online upgrade process from V1 to V2 is introduced.

Referring to FIG. 7, FIG. 7 shows a data synchronization method of a distributed database according to Embodiment 4 of the present invention. The synchronization method mainly includes:

Step 601: The metadata server receives a table structure upgrade instruction sent by the application client.

The table structure upgrade instruction includes a Table_A delete field and a Table_B add field.

Step 602: The metadata server sends an upgrade pre-notification command to all the data nodes, where the upgrade pre-notification command carries the entry to be deleted.

The upgrade pre-notification command informs the data nodes that they are to be upgraded from version V1 to V2, and that the field cust_bank of Table_A (also referred to as a heterogeneous field) is to be deleted. In this embodiment, the data node 1 and the data node 2 are taken as an example for illustration.

Step 603: The data nodes 1 and 2 demote synchronous replication to the asynchronous replication mode.

When processing the upgrade pre-notification command, the data nodes 1 and 2 demote synchronous replication to the asynchronous replication mode. Replication is demoted so that changes to the log replication process during the upgrade do not affect or block online services, which ensures high availability of the system.

At the same time, each data node also identifies the field Table_A.cust_bank that needs to be deleted during the upgrade.

Step 604: The data node 1, 2 receives the data operation instruction sent by the application client, performs a corresponding data operation, and generates a replication log.

Online services are not interrupted at any point during the upgrade. Write operations of the online services on the data nodes 1 and 2 generate replication logs. At this time, a data node can directly filter out the field Table_A.cust_bank, according to the field marked for deletion, when generating the replication log; this manner of generating the replication log can be referred to as heterogeneous replication.

Step 605: The data node 1 asynchronously transmits the generated replication log to the data node 2.

In this embodiment, the data of the primary shard 3 of the data node 1 is modified, and the replication log generated by the data node 1 carries the data that needs to be synchronized to the backup shard 3 on the data node 2.

Step 606: The data node 2 performs data synchronization according to the replication log.

When the heterogeneous field Table_A.cust_bank exists in the replication log, the data node 2 filters it out when performing data synchronization (also referred to as data redo processing), and the heterogeneous field Table_A.cust_bank is processed according to the default value (null).

For shard 6, the data node 1 likewise receives the replication log sent by the data node 2, and performs data synchronization according to the replication log.

Step 607: The metadata server sends an upgrade instruction to the data nodes 1 and 2 to instruct to perform a version upgrade.

In this embodiment, the metadata server sends the upgrade instruction through a Data Definition Language (DDL) operation. The upgrade content includes deleting the field cust_bank from Table_A and adding the field Product_discount int to Table_B.

Because the data nodes 1 and 2 may receive the upgrade instruction at different times, this embodiment takes as an example the case in which the data node 1 receives the upgrade instruction first.

Step 608: The data node 1 first receives and processes the DDL upgrade instruction.

After the data node 1 executes the DDL upgrade instruction, its version V2 is temporarily newer than the version V1 of the data node 2.

Step 609: The data node 1 sends the generated replication log to the data node 2.

During the upgrade, replication of shard 3 is processed as follows:

Because online services are not interrupted during the upgrade, data operations continue. The replication log generated by write operations of the online services on the primary shard 3 of the data node 1 is synchronized to the data node 2, and this replication log matches the new version V2 (the field Table_B.Product_discount is added to the replication log, and the field Table_A.cust_bank is deleted from it).

Step 610: The data node 2 performs data synchronization according to the replication log.

Data node 2 obtains the version V2 of data node 1 from the replication log. Since the version V1 of data node 2 is lower (it has no field Table_B.Product_discount), data node 2 cannot process the V2 replication log (it cannot recognize the log information of the added field Table_B.Product_discount). At this point, data node 2 first caches this portion of the replication log.
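The decision made in step 610 can be sketched in Python as follows. The sketch assumes that versions are comparable integers (1 for V1, 2 for V2) and that each replication log is a dict carrying a `version` key; `receive_log`, `cache`, and `apply` are hypothetical names.

```python
from collections import deque


def receive_log(local_version, log, cache, apply):
    """Cache a replication log that comes from a newer version;
    otherwise apply it immediately. Returns which action was taken."""
    if log["version"] > local_version:
        # The local table structure cannot interpret this log yet.
        cache.append(log)
        return "cached"
    apply(log)
    return "applied"
```

A `deque` is a natural choice for `cache`, since cached logs are later replayed in arrival order.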

Step 611: The data node 2 sends the generated replication log to the data node 1.

During the upgrade, replication of shard 6 is processed as follows:

Before the data node 2 receives the DDL upgrade instruction, write operations on its primary shard 6 generate a replication log that is synchronized to the data node 1 in an asynchronous manner. This replication log matches the old version V1.

Step 612: The data node 1 performs data synchronization according to the replication log.

Since the table structure V2 of the data node 1 is newer, it can process the V1 replication log of shard 6. At this time, the data node 1 processes the shard 6 replication log sent by the data node 2 normally. The new field Table_B.Product_discount, which does not exist in version V1 of the data node 2, is processed according to the default value null during replication. The heterogeneous field Table_A.cust_bank is processed according to the logic of step 606.
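How a node on the newer table structure might redo a record produced under the older one is sketched below. The function `redo_on_newer_schema` and the representation of the local table structure as a list of column names are assumptions for illustration only.

```python
def redo_on_newer_schema(record, local_columns, defaults=None):
    """Map a record from an older version onto the local (newer) columns.

    Columns missing from the record (fields added by the upgrade) receive
    their default value, None unless overridden; fields that no longer
    exist locally (fields deleted by the upgrade) are dropped.
    """
    defaults = defaults or {}
    return {column: record.get(column, defaults.get(column)) for column in local_columns}
```

For a V1 record of Table_B, the result contains Product_discount with the value None; for a V1 record of Table_A, the Cust_bank value is not carried over.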

Step 613: The data node 2 processes the DDL upgrade instruction.

In this embodiment, the data node 2 receives the DDL upgrade instruction sent by the metadata server later than the data node 1 does. The data node 2 executes the DDL upgrade instruction to complete the version upgrade. In the upgraded version V2, the field cust_bank is deleted from Table_A and the field Product_discount int is added to Table_B.

Step 614: The data node 2 performs data synchronization according to the cached replication log.

The upgraded data node 2 has version V2 and can process the previously cached replication logs, completing data synchronization between the data nodes 1 and 2.
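Replaying the cached replication logs after the upgrade (step 614) can be sketched as follows, again assuming integer versions and logs represented as dicts with a `version` key. The function name `replay_cached_logs` is hypothetical.

```python
from collections import deque


def replay_cached_logs(cache, local_version, apply):
    """Apply cached logs in arrival order for as long as the local version
    can process them. Returns the number of logs replayed."""
    replayed = 0
    while cache and cache[0]["version"] <= local_version:
        apply(cache.popleft())
        replayed += 1
    return replayed
```

Stopping at the first log that is still too new preserves the original order of the replication stream.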

Step 615: After the upgrade is completed, the data nodes 1, 2 send a notification message of successful upgrade to the metadata server.

After identifying that all nodes have completed the upgrade, the metadata server enters the post-upgrade processing, for example, notifying each data node to restore the replication level to synchronous replication. At the same time, heterogeneous replication is also cancelled, and the replication log is generated normally.
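The post-upgrade processing on the metadata server side can be sketched as a small state holder. The class `UpgradeCoordinator`, its attribute names, and the string values `"async"` and `"sync"` are assumptions introduced for this example.

```python
class UpgradeCoordinator:
    """Track upgrade-success notifications and restore normal replication
    once every data node has reported success."""

    def __init__(self, node_ids):
        self.pending = set(node_ids)
        self.replication_mode = "async"        # demoted during the upgrade
        self.heterogeneous_replication = True  # field filtering is active

    def on_upgrade_success(self, node_id):
        """Record one success notification; return True when all nodes are done."""
        self.pending.discard(node_id)
        if not self.pending:
            # All nodes are upgraded: restore synchronous replication and
            # generate replication logs normally again.
            self.replication_mode = "sync"
            self.heterogeneous_replication = False
        return not self.pending
```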

Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a destination data node according to Embodiment 5 of the present invention.

The destination data node may be the data node 2 shown in Figure 3-5. The destination data node employs general purpose computer hardware including a processor 101, a memory 102, a bus 103, an input device 104, an output device 105, and a network interface 106.

In particular, the memory 102 can include computer storage media in the form of volatile and/or nonvolatile memory, such as read-only memory and/or random access memory. The memory 102 can store an operating system, applications, other program modules, executable code, and program data.

The input device 104 can be used to input commands and information to the destination data node. Examples include a keyboard or a pointing device such as a mouse, trackball, touchpad, microphone, joystick, game pad, satellite dish, scanner, or similar device. These input devices can be connected to the processor 101 via the bus 103.

The output device 105 can be used by the destination data node to output information. In addition to a monitor, the output device 105 can also be another peripheral output device, such as a speaker and/or a printing device, which can likewise be connected to the processor 101 via the bus 103.

The destination data node can be connected to the network through the network interface 106, for example to a local area network (LAN). In a networked environment, computer-executed instructions stored in a destination data node may be stored in a remote storage device, and are not limited to being stored locally.

When the processor 101 in the destination data node executes the executable code or application stored in the memory 102, the destination data node may perform the method steps on the destination data node side in the second, third, and fourth embodiments above, for example, performing steps 305-311, 401-405, 603, 606, 610, and the like. For details, refer to the second, third, and fourth embodiments, and details are not described herein again.

Referring to FIG. 9, FIG. 9 is a schematic structural diagram of a destination data node according to Embodiment 6 of the present invention.

As shown in the figure, the destination data node provided by the embodiment of the present invention includes:

The data operation instruction processing unit 710 is configured to execute a data operation instruction sent by the application client;

The receiving unit 720 is configured to receive a data replication log sent by the source data node, where the replication log carries data that needs to be synchronized to the destination node.

The log cache unit 730 is configured to obtain a version of the source data node and, when determining that the version of the source data node is higher than its own version, cache the replication log.

The receiving unit 720 is further configured to receive an upgrade instruction sent by the metadata server, where the upgrade instruction is used to indicate that the destination data node performs a version upgrade.

The upgrading unit 740 is configured to perform an upgrade according to the upgrade instruction;

The log synchronization unit 750 is configured to perform data replication according to the cached replication log after the upgrade is completed.

The destination data node provided by this embodiment of the present invention may be used in the foregoing method embodiments 2 and 4. Through cooperation among the data operation instruction processing unit 710, the receiving unit 720, the log cache unit 730, the upgrading unit 740, and the log synchronization unit 750, it completes the method steps on the data node side in embodiments 2 and 4. Compared with the destination data node in the prior art, the destination data node provided by this embodiment has the same beneficial effects as the foregoing method embodiment when performing data synchronization.

Specifically, the version of the source data node obtained by the log cache unit 730 in the destination data node includes:

The log cache unit 730 obtains a version of the source data node according to the data replication log, where the data replication log carries a version of the source data node; or

The log cache unit 730 receives a notification message sent by the source data node, where the notification message carries the version of the source data node, and obtains the version of the source data node according to the notification message.

Further, the destination data node shown in FIG. 9 further includes:

The log generating unit 760 is configured to generate a data replication log when the data operation instruction is executed, where the data replication log carries data that needs to be synchronized to a data node corresponding to the destination data node;

The sending unit 770 is configured to send the data replication log to the source data node. The sending unit 770 can send the replication log to the source data node by placing it in a local send queue.

In the embodiment, the specific process of the log generation unit 760 generating the replication log and the sending unit 770 sending the replication log to the source data node may refer to the description of steps 310-311 in the foregoing method embodiment.

In this embodiment, the log synchronization unit 750 of the destination data node is further configured to perform data replication according to the replication log when determining that the version of the source data node is lower than its own version.

Referring to FIG. 10, FIG. 10 is a schematic structural diagram of a destination data node in a distributed database according to Embodiment 7 of the present invention.

As shown in the figure, the destination data node specifically includes:

The data operation instruction processing unit 810 is configured to execute a data operation instruction sent by the application client;

The receiving unit 820 is configured to receive an upgrade pre-notification command sent by the metadata server, where the upgrade pre-notification command carries an entry that needs to be deleted;

The receiving unit 820 is further configured to receive a data replication log sent by the source data node, where the data replication log carries data that needs to be synchronized to the destination node;

The log processing unit 830 is configured to: when it is determined that the replication log includes data corresponding to the entry to be deleted, filter out that data when performing data replication according to the replication log, and process the entry already existing in the destination data node according to its default value, thereby implementing data synchronization between the source data node and the destination data node.

The destination data node provided by this embodiment of the present invention may be used in the foregoing method embodiments 3 and 4. Through cooperation among the data operation instruction processing unit 810, the receiving unit 820, and the log processing unit 830, it completes the method steps on the data node side in embodiments 3 and 4. Compared with the destination data node in the prior art, the destination data node provided by this embodiment has the same beneficial effects as the foregoing method embodiment when performing data synchronization.

With further reference to FIG. 10, the destination data node provided by the embodiment of the present invention further includes:

The upgrading unit 840 is configured to receive an upgrade instruction sent by the metadata server, where the upgrade instruction is used to delete an entry of the data node, and to delete the entry of the data node according to the upgrade instruction. In this embodiment, since the upgrade pre-notification command has been received in advance and carries the entry to be deleted, the destination data node knows in advance which entry's data needs to be deleted, and can therefore handle that data ahead of the upgrade when it receives a replication log or sends a replication log to the source data node.

The log sending unit 850 is configured to filter out data corresponding to the entry to be deleted when generating the data replication log according to the execution result of the data operation instruction, and to send the data replication log to the source data node. The log sending unit 850 can send the replication log to the source data node by placing it in a local send queue.

In this embodiment, for the specific process by which the upgrading unit 840 upgrades the data node itself and the log sending unit 850 sends the replication log to the source data node, refer to the description of steps 404-405 in the foregoing method embodiment.

In this embodiment, the destination data node is presented in the form of functional units. A "unit" herein may refer to an application-specific integrated circuit (ASIC), circuitry, a processor and memory that execute one or more software or firmware programs, integrated logic circuitry, and/or another device that provides the functionality described above. In a simple embodiment, those skilled in the art will appreciate that the destination data node can also take the general-purpose computer hardware form described in Embodiment 5. For example, the functions implemented by the data operation instruction processing unit 710, the receiving unit 720, the log cache unit 730, the upgrading unit 740, and the log synchronization unit 750 can all be implemented by the processor 101 and the memory 102. For example, the execution by the data operation instruction processing unit 710 of the data operation instruction sent by the application client may be implemented by the processor 101 executing code stored in the memory 102.

The processor for implementing the above-mentioned data node of the present invention may be a central processing unit (CPU), a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor can implement or carry out the various illustrative logical blocks, modules and circuits described in connection with the present disclosure. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.

Those of ordinary skill in the art will appreciate that various aspects of the present invention, or possible implementations of various aspects, may be embodied as a system, method, or computer program product. Thus, aspects of the invention, or possible implementations of various aspects, may be in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, etc.), or a combination of software and hardware aspects, They are collectively referred to herein as "circuits," "modules," or "systems." Furthermore, aspects of the invention, or possible implementations of various aspects, may take the form of a computer program product, which is a computer readable program code stored in a computer readable medium.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims

A distributed database data synchronization method, applied to a destination data node in a distributed database, wherein, while executing a data operation instruction sent by an application client, the destination data node further performs:

Receiving a data replication log sent by the source data node, where the replication log carries data that needs to be synchronized to the destination node;

Obtaining a version of the source data node and, when determining that the version of the source data node is higher than its own version, caching the replication log;

Receiving an upgrade instruction sent by the metadata server, where the upgrade instruction is used to instruct the destination data node to perform a version upgrade;

The upgrade is performed according to the upgrade instruction, and after the upgrade is completed, data replication is performed according to the cached replication log.
The method according to claim 1, wherein the obtaining the version of the source data node comprises:

Obtaining, according to the data replication log, a version of the source data node, where the data replication log carries a version of the source data node; or

Receiving a notification message sent by the source data node, where the notification message carries a version of the source data node, and obtaining the version of the source data node according to the notification message.
The method according to claim 2, wherein the method further comprises: [data log synchronization to the source data node]

When the data operation instruction is executed, generating a data replication log, where the data replication log carries data that needs to be synchronized to a data node corresponding to the destination data node;

Sending the data replication log to the source data node.
The method according to claim 1, further comprising: [the source node version is low]

When it is determined that the version of the source data node is lower than its own version, data replication is performed according to the replication log.
The method according to claim 3, wherein the sending the data replication log to the source data node comprises:

The replication log is placed in a local transmit queue and sent to the source data node.
A destination data node of a distributed database, comprising:

a data operation instruction processing unit, configured to execute a data operation instruction sent by the application client;

a receiving unit, configured to receive a data replication log sent by the source data node, where the replication log carries data that needs to be synchronized to the destination node;

a log cache unit, configured to obtain a version of the source data node and, when determining that the version of the source data node is higher than its own version, cache the replication log;

the receiving unit is further configured to receive an upgrade instruction sent by the metadata server, where the upgrade instruction is used to instruct the destination data node to perform a version upgrade;

An upgrade unit, configured to perform an upgrade according to the upgrade instruction;

The log synchronization unit is configured to perform data replication according to the cached replication log after the upgrade is completed.
The destination data node according to claim 6, wherein the obtaining, by the log cache unit, the version of the source data node comprises:

The log cache unit acquires a version of the source data node according to the data replication log, where the data replication log carries a version of the source data node; or

The log cache unit receives a notification message sent by the source data node, where the notification message carries the version of the source data node, and obtains the version of the source data node according to the notification message.
The destination data node according to claim 7, wherein the data node further comprises:

a log generating unit, configured to: when the data operation instruction is executed, generate a data replication log, where the data replication log carries data that needs to be synchronized to a data node corresponding to the destination data node;

And a sending unit, configured to send the data replication log to the source data node.
The destination data node according to claim 6, wherein the log synchronization unit is further configured to perform data replication according to the replication log when determining that the version of the source data node is lower than its own version.
The destination data node according to claim 8, wherein the sending, by the sending unit, the data replication log to the source data node comprises:

The sending unit places the replication log in a local send queue and sends it to the source data node.