CN115396454A - Data copying method and device, storage node and readable storage medium

Info

Publication number: CN115396454A
Application number: CN202211166317.3A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 林杰 (Lin Jie)
Applicant and current assignee: Chongqing Unisinsight Technology Co Ltd
Legal status: Pending
Prior art keywords: node, slave node, request, data, slave

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 — Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 — Protocols
    • H04L 67/10 — Protocols in which an application is distributed across nodes in the network
    • H04L 67/1095 — Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H04L 67/1097 — Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The invention relates to the technical field of distributed storage and provides a data replication method, an apparatus, a storage node and a readable storage medium. The method comprises the following steps: receiving a data synchronization request sent by a first slave node, wherein the data to be synchronized by the data synchronization request is synchronized to the first slave node by a master node based on an update request sent by a client; recording a log of the data synchronization request; and updating the data to be synchronized by the data synchronization request to the local node. The invention can provide high-throughput and highly available service on the premise of ensuring strong consistency, and is well suited to application scenarios with high replica redundancy and data distribution across data centers and regions.

Description

Data copying method and device, storage node and readable storage medium
Technical Field
The invention relates to the technical field of distributed storage, in particular to a data replication method, a data replication device, a storage node and a readable storage medium.
Background
To improve the disaster-recovery redundancy of data, expand the overall throughput of the system, or reduce data access latency, distributed storage systems are usually designed to replicate and store data across nodes, data centers, or even regions.
Existing data replication schemes guarantee strong consistency at the cost of some performance and availability, which makes them poorly suited to application scenarios with high replica redundancy and data distributed across data centers and regions.
Disclosure of Invention
The invention aims to provide a data replication method, a data replication apparatus, a storage node and a readable storage medium, which can provide high-throughput and highly available service on the premise of ensuring strong consistency and are well suited to application scenarios with high replica redundancy and data distribution across data centers and regions.
To achieve this, the invention adopts the following technical solution:
in a first aspect, the present invention provides a data replication method, applied to a second slave node in a distributed storage system, where the second slave node is communicatively connected to both a first slave node and a master node, and the master node is communicatively connected to a client, the method comprising: receiving a data synchronization request sent by the first slave node, wherein the data to be synchronized by the data synchronization request is synchronized to the first slave node by the master node based on an update request sent by the client, synchronous replication is performed between the master node and the first slave node, and asynchronous replication is performed between the first slave node and the second slave node; recording a log of the data synchronization request; and updating the data to be synchronized by the data synchronization request to the local node.
Optionally, the data synchronization request sent by the first slave node includes a plurality of requests sent in a sending order, and the step of recording a log of the data synchronization request includes:
judging whether a normal request or an abnormal request exists according to the receiving sequence and the sending sequence of the received requests;
if the normal request exists, recording a log of the normal request;
if the abnormal request exists, sending a retransmission request to the first slave node instructing it to resend the abnormal request, so that a log of that request is recorded once the second slave node receives it as a normal request.
Optionally, the step of determining whether there is a normal request or an abnormal request according to the receiving order and the sending order of the received multiple requests includes:
determining, among the received requests, a request whose receiving order is consistent with its sending order as a normal request;
and determining the remaining sent requests, other than the normal requests, as abnormal requests.
Optionally, the master node stores a history identifier of a data synchronization request last processed in a last cycle of the second slave node; the method further comprises the following steps:
acquiring the current identifier of the data synchronization request processed at the last in the period;
and sending the current identification to the master node so that the master node deletes the locally stored log of the data synchronization request between the current identification and the historical identification.
Optionally, the second slave node is a target second slave node among a plurality of second slave nodes in the distributed storage system, the target second slave node is further communicatively connected to a monitoring node, and the data synchronization request most recently processed by the target second slave node is the latest among the second slave nodes; the method further comprises:
receiving a switching request sent by the monitoring node, wherein the switching request is initiated when the monitoring node detects that a fault node exists in the distributed storage system;
and establishing a synchronization relationship between the target second slave node and a non-failure node in the distributed storage system based on the switching request.
Optionally, the failed node is the master node, and the step of establishing a synchronization relationship between the target second slave node and a non-failed node in the distributed storage system based on the handover request includes:
establishing a synchronous replication relationship with the first slave node based on the switching request;
taking over the first slave node to cause the first slave node to take over the master node.
Optionally, the failed node is the first slave node, and the step of establishing a synchronization relationship between the target second slave node and a non-failed node in the distributed storage system based on the handover request includes:
establishing a synchronous replication relationship with the main node based on the switching request;
taking over the first slave node.
Optionally, the failed node is the master node and the first slave node, and the step of establishing a synchronization relationship between the target second slave node and a non-failed node in the distributed storage system based on the handover request includes:
taking over the master node based on the handover request;
establishing a synchronous replication relationship with a proxy second slave node among the second slave nodes other than the target second slave node, so that the proxy second slave node takes over the first slave node, wherein the proxy second slave node is the second slave node whose most recently processed data synchronization request is the latest among the second slave nodes other than the target second slave node.
Optionally, the method further comprises:
receiving node information of newly added nodes in the distributed storage system, which is sent by the monitoring node;
and synchronizing the local data to the newly added node in a full synchronization mode according to the node information.
In a second aspect, the present invention provides a data replication apparatus, applied to a second slave node in a distributed storage system, where the second slave node is communicatively connected to both a first slave node and a master node, and the master node is communicatively connected to a client, the apparatus including:
a receiving module, configured to receive a data synchronization request sent by the first slave node, where data to be synchronized by the data synchronization request is synchronized to the first slave node by the master node based on an update request sent by the client, a synchronous copy is performed between the master node and the first slave node, and an asynchronous copy is performed between the first slave node and the second slave node;
the recording module is used for recording the log of the data synchronization request;
and the updating module is used for updating the data required to be synchronized by the data synchronization request to the local.
In a third aspect, the present invention provides a storage node, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the data replication method as described above when executing the computer program.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data replication method as described above.
Compared with the prior art, in the invention the storage nodes in the distributed storage system are divided into a master node, a first slave node and a second slave node. The client sends an update request to the master node, and the master node synchronizes the update to the first slave node; after the first slave node returns a synchronization success to the master node, the master node returns a response to the update request to the client. The first slave node then sends a data synchronization request to the second slave node in an asynchronous manner, and the second slave node updates the data to be synchronized to the local node. In this way, foreground request processing is separated from background data replication, so that high-throughput and highly available service can be provided on the premise of ensuring strong consistency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic diagram of a Primary/Backup replication protocol according to an embodiment of the present invention.
Fig. 2 is a schematic diagram illustrating a chain replication protocol according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a ring replication protocol according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an extended ring replication protocol according to an embodiment of the present invention.
Fig. 5 is a schematic block diagram of a storage node according to an embodiment of the present invention.
Fig. 6 is a first flowchart illustrating a data replication method according to an embodiment of the present invention.
Fig. 7 is a second flowchart illustrating a data replication method according to an embodiment of the present invention.
Fig. 8 is a third flowchart illustrating a data replication method according to an embodiment of the present invention.
Fig. 9 is a diagram illustrating interaction in processing an Update request according to an embodiment of the present invention.
Fig. 10 is a diagram illustrating interaction in processing a Query request according to an embodiment of the present invention.
Fig. 11 is a fourth flowchart illustrating a data replication method according to an embodiment of the present invention.
Fig. 12 is a block diagram illustrating a data replication apparatus according to an embodiment of the present invention.
Icon: 10-a second slave node; 20-a first slave node; 30-a master node; 40-a client; 50-a storage node; 51-a processor; 52-a memory; 53-bus; 54-a communication interface; 100-a data replication device; 110-a receiving module; 120-a recording module; 130-update module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that if the terms "upper", "lower", "inside", "outside", etc. indicate an orientation or a positional relationship based on that shown in the drawings or that the product of the present invention is used as it is, this is only for convenience of description and simplification of the description, and it does not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
In a distributed storage system, in order to improve the disaster-recovery redundancy of data, expand the overall throughput of the system, or reduce data access latency, the commonly used replication schemes include leader-based replication protocols and leaderless replication protocols: the former divide the storage nodes into a master node and slave nodes, while the latter do not.
In the Primary/Backup replication protocol, a Primary node is responsible for determining the execution order of Update/Query requests and maintains data synchronization with the Backup nodes in a synchronous/asynchronous replication manner. Referring to fig. 1, fig. 1 is a schematic diagram of the Primary/Backup replication protocol provided by an embodiment of the present invention. In fig. 1, one replica storage node is selected as the Primary node and the remaining replica storage nodes serve as Backup nodes. The Primary node is mainly responsible for the following (an illustrative sketch follows this list):
I) Receiving Update/Query requests sent by the Client and determining their execution order;
II) Distributing Update requests to the Backup nodes and waiting for the responses of the normal Backup nodes;
III) Responding to the Client request.
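For background comparison only, the following Python sketch illustrates the three Primary responsibilities listed above. It is a minimal illustration, not actual product code: the backups handles, their send() method, and the dict-like store are all assumptions made for this sketch.

    # Illustrative sketch of a Primary node in the Primary/Backup protocol.
    # `backups` is a list of hypothetical transport handles; send() is assumed
    # to return True when the Backup acknowledges the update.

    class PrimaryNode:
        def __init__(self, backups, store):
            self.backups = backups          # connections to Backup nodes (assumed)
            self.store = store              # local key-value store (assumed dict-like)
            self.next_seq = 0               # the Primary decides the execution order

        def handle_update(self, key, value):
            seq = self.next_seq             # I) fix the execution order of the Update
            self.next_seq += 1
            acks = 0
            for backup in self.backups:     # II) distribute the Update to Backup nodes
                if backup.send({"seq": seq, "key": key, "value": value}):
                    acks += 1               # wait for responses of the normal Backups
            self.store[key] = value         # apply the modification locally
            return {"seq": seq, "acks": acks}   # III) respond to the Client request

        def handle_query(self, key):
            return self.store.get(key)      # Queries are also served by the Primary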
Service failure recovery in the Primary/Backup replication protocol is classified into the following two categories:
1) Primary node failure recovery
A Primary node failure interrupts Client foreground service until the monitoring node reselects a Backup node and notifies it to be promoted to the new Primary node; the service interruption typically lasts up to tens of seconds, the main overhead being service failure detection and message interaction. Moreover, in the case of asynchronous replication between Primary and Backup, Update requests that were received but not yet distributed before the Primary node failure are lost, resulting in inconsistent data.
2) Backup node fault recovery
A failure of any Backup node increases the latency of Update/Query requests until the monitoring node detects the Backup node failure and notifies the Primary node. Query requests are delayed because they must wait for the preceding Update requests to complete. The latency of Update/Query requests increases by several seconds, the main overhead being service failure detection and message interaction. In particular, in the case of synchronous replication between Primary and Backup, the failure of any Backup node interrupts foreground traffic; therefore, in practical applications, a semi-synchronous policy is usually configured.
The Primary/Backup replication protocol has the advantage of low replication latency, since replication latency depends only on the slowest-responding Backup node; however, Query request latency is relatively high, and there is a risk of service interruption and data loss when the Primary node goes down.
The Chain Replication protocol belongs to the leaderless replication protocols. In the Chain Replication protocol, the storage nodes of the multiple data copies form a chain in sequence; an Update request is processed by the HEAD storage node and propagated serially along a First-In First-Out (FIFO) chain, while Query requests and Reply responses are handled by the TAIL storage node. Referring to fig. 2, fig. 2 is a schematic diagram illustrating the Chain Replication protocol according to an embodiment of the present invention. In fig. 2, all the replica storage nodes form a FIFO chain in sequence; the head node of the chain is referred to as the HEAD node and the tail node as the TAIL node. The main responsibilities are as follows (an illustrative sketch follows this list):
I) The TAIL node is responsible for all Reply responses;
II) The HEAD node is responsible for receiving Update requests sent by the Client, confirming their execution order, and passing the Update requests along the FIFO chain until they reach the TAIL node;
III) The TAIL node is responsible for receiving Query requests sent by the Client and processing Update/Query requests in a defined order. After an Update request is processed, an ACK message is sent to its upstream node, and the ACK is passed along the reverse FIFO chain until it reaches the HEAD node.
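Likewise for background only, the following is a minimal Python sketch of the FIFO propagation and reverse-ACK behaviour described above; the ChainNode class, its methods and the build_chain() helper are illustrative assumptions, not an existing library.

    # Illustrative sketch of Chain Replication propagation (not the patented method).
    # Each node holds a reference to its downstream neighbour; the TAIL replies
    # and ACKs travel back up the reverse FIFO chain.

    class ChainNode:
        def __init__(self, name):
            self.name = name
            self.downstream = None   # next node in the FIFO chain (None for TAIL)
            self.upstream = None     # previous node (None for HEAD)
            self.store = {}

        def handle_update(self, key, value):
            self.store[key] = value                          # process locally
            if self.downstream is not None:
                self.downstream.handle_update(key, value)    # pass along the FIFO chain
            else:
                self._ack(key)                               # TAIL: start the reverse ACK chain

        def _ack(self, key):
            if self.upstream is not None:
                self.upstream._ack(key)                      # ACK travels back to the HEAD

        def handle_query(self, key):
            # In Chain Replication only the TAIL serves Queries.
            assert self.downstream is None, "Query must be sent to the TAIL node"
            return self.store.get(key)


    def build_chain(names):
        nodes = [ChainNode(n) for n in names]
        for a, b in zip(nodes, nodes[1:]):
            a.downstream, b.upstream = b, a
        return nodes   # nodes[0] is the HEAD, nodes[-1] is the TAIL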
The service failure recovery of the Chain Replication protocol is divided into the following three types:
1) HEAD node failure recovery
When the HEAD node fails, Query requests are not affected, but Update request service is interrupted until the monitoring node detects the HEAD node failure and notifies the HEAD node's downstream node to be promoted to the new HEAD.
2) MIDDLE node (nodes other than the HEAD and TAIL nodes) failure recovery
If any MIDDLE node fails, Query requests are not affected, but the latency of Update requests increases until the monitoring node detects the MIDDLE node failure and notifies that node's downstream and upstream nodes. To ensure strong consistency, Update requests that were received but not yet passed on before the MIDDLE node failure must be able to continue propagating along the FIFO chain, because those requests have already been processed by all nodes ahead of the failed node in the chain.
3) TAIL node failure recovery
A failure of the TAIL node interrupts foreground traffic until the monitoring node notifies its upstream node to be promoted to the new TAIL node. Update request service is interrupted because the new TAIL node cannot determine which requests still require a Reply response.
The Chain Replication protocol has the advantages of strong consistency and low Query request latency, and allows N-1 nodes to go down simultaneously without affecting data availability; however, its replication latency is high, equal to the sum of the replication latencies of all replica storage nodes.
In view of the above, this embodiment provides a data replication method, apparatus, storage node, and readable storage medium, which can provide high-throughput and highly available service on the premise of ensuring strong consistency, and are particularly suitable for application scenarios with high replica redundancy and data distribution across data centers and regions; details are described below.
Different from the Primary/Backup replication protocol and the Chain Replication protocol described above, the data replication method, apparatus, storage node, and readable storage medium proposed in this embodiment are based on a Ring Replication protocol proposed in this embodiment. Its main innovation is that the foreground Update/Query request processing logic is separated from the background data replication logic, so as to offload the primary replica storage node; as a result, high throughput and highly available service can be provided on the premise of ensuring strong consistency. Referring to fig. 3, fig. 3 is a schematic diagram illustrating the ring replication protocol according to an embodiment of the present invention. In fig. 3, two of the replica storage nodes are selected as the Master node (master node 30) and the First Slave node (first slave node 20), respectively. In a preferred implementation, the Master node and the First Slave node may be located in the same machine room or the same data center. The Master node and the First Slave node are communicatively connected and keep their data synchronized by synchronous replication; the storage nodes of the other data copies all serve as Second Slave nodes (second slave node 10), which may be located in different data centers and different regions. The First Slave node and the Second Slave nodes are communicatively connected and keep their data synchronized by asynchronous replication.
All Update requests are handled by the Master node. The Master node receives an Update request sent by the Client (client 40), first records a replication log, executes the modification operation in the background, and synchronously replicates the replication log to the First Slave node. The Master node replies to the Client only after receiving the response of the First Slave node.
The First Slave node receives the Update request synchronized by the Master node, first records the replication log, and replies a synchronization response to the Master node once the replication log has been recorded. A background task then performs the modification operation and synchronizes all Second Slave nodes by asynchronous replication.
And the Second Slave node receives the Update request synchronized with the First Slave node, records the copy log and executes modification operation in the background.
Query requests are handled jointly by the Master node and the First Slave node; either node can process a Query request, as selected by the Client. Because the Master node and the First Slave node hold completely consistent data copies, the Ring Replication protocol provides strong consistency. In scenarios where strong consistency is less important and overall system throughput matters more, Query requests can also be distributed to the Second Slave nodes, but strong consistency is then sacrificed and only monotonic read consistency is guaranteed.
Each Second Slave node periodically sends a Log Trim (log deletion) request to the Master node, carrying the identifier of the Update request it has most recently completed. The Master node records the latest replication-log identifier reported by all Second Slave nodes, deletes the locally expired replication logs, and carries the replication-log identifier in the next synchronization request message sent to the First Slave node.
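As an illustration of the Log Trim exchange just described, the following Python sketch shows how a Master node might track the latest confirmed replication-log ID of each Second Slave node and discard expired logs; the class name, field names and trim() method are assumptions made for this sketch only.

    # Illustrative sketch of Master-side Log Trim bookkeeping.

    class MasterTrimState:
        def __init__(self, second_slave_names):
            # last replication-log ID confirmed by every Second Slave node
            self.confirmed = {name: 0 for name in second_slave_names}
            self.logs = {}   # log_id -> replication-log entry

        def trim(self, slave_name, current_id):
            """Handle a Log Trim request carrying the latest completed log ID."""
            self.confirmed[slave_name] = current_id
            # logs up to the smallest confirmed ID are no longer needed anywhere
            safe_id = min(self.confirmed.values())
            for log_id in [i for i in self.logs if i <= safe_id]:
                del self.logs[log_id]
            # safe_id is also piggybacked on the next sync request to the First Slave node
            return safe_id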
Referring to fig. 4, fig. 4 is a schematic diagram of an extended ring replication protocol according to an embodiment of the present invention. In fig. 4, one of the Second Slave nodes is further extended: that Second Slave node simultaneously serves as the Master node of an extended ring, with synchronous replication between it and the extended First Slave node, and asynchronous replication between the extended First Slave node and the extended Second Slave nodes.
It should be noted that fig. 4 is only one example of an extension; in fact, the extension can be adapted to the scenario requirements, for example a Second Slave node may simultaneously serve as an extended Master node or an extended First Slave node.
It should be further noted that the storage nodes used for data backup in the above three replication protocols do not necessarily include all storage nodes in the distributed storage system; they may include only some of them. When data copies exist on all storage nodes in the distributed storage system, the storage nodes used for data backup are all storage nodes in the system; when data copies exist on only some storage nodes, the storage nodes used for data backup are those storing a data copy.
Fig. 5 is a schematic block diagram of a storage node 50 according to an embodiment of the present invention, where the storage node 50 may be the second slave node in fig. 3 and 4. The storage node 50 comprises a processor 51, a memory 52, a bus 53, a communication interface 54. The processor 51, the memory 52 are connected by a bus 53, and the processor 51 communicates with the first slave node 20 or the master node 30 via a communication interface 54.
The processor 51 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 51. The processor 51 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The memory 52 is used for storing programs, such as the data copying apparatus 100 in the embodiment of the present invention; the data copying apparatus 100 includes at least one software functional module that can be stored in the memory 52 in the form of software or firmware, and the processor 51 executes the program after receiving an execution instruction, so as to implement the data replication method in the embodiment of the present invention.
The Memory 52 may include a Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory). Alternatively, the memory 52 may be a storage device built in the processor 51, or may be a storage device independent of the processor 51.
The bus 53 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus is indicated in fig. 5 by only one double-headed arrow, but this does not mean that there is only one bus or only one type of bus.
Based on the principle of the ring replication protocol in fig. 3 and fig. 4, this embodiment further provides a data replication method implemented by using the principle of the ring replication protocol, where the data replication method is applied to the second slave node in fig. 3 and fig. 4, please refer to fig. 6, and fig. 6 is a first flowchart of the data replication method provided by the embodiment of the present invention, where the method includes the following steps:
step S100, receiving a data synchronization request sent by a first slave node, where data to be synchronized by the data synchronization request is synchronized to the first slave node by a master node based on an update request sent by a client, the master node and the first slave node are synchronously copied, and the first slave node and a second slave node are asynchronously copied.
In this embodiment, the client sends an update request to the master node, the master node records a log, and sends a data synchronization request to the first slave node, where the data to be synchronized by the data synchronization request is data that needs to be updated by the master node according to the update request, the first slave node sends a response message to the master node after receiving the data synchronization request and records the log, and the master node responds to the client after receiving the response message of the first slave node. At this time, the update request is processed completely, that is, the master node and the first slave node are synchronously copied, which is a processing process of the update request to ensure strong consistency of data.
In this embodiment, a background task of the first slave node asynchronously applies the data to be synchronized locally and then sends the data synchronization request to the second slave node to synchronize the data to the second slave node. That is, applying the data on the first slave node and synchronizing it from the first slave node to the second slave node are both asynchronous replication and belong to the data synchronization process. The processing of the update request is thus separated from the data synchronization process, avoiding the high latency that would result if the client could not be answered until the data had been synchronized to both the first slave node and the second slave node.
Step S101, recording a log of the data synchronization request.
In this embodiment, the purpose of logging is to correctly update the data to be synchronized according to the recorded log when a node failure occurs. As a specific implementation, an identification ID may be set for each update request, and the log ID of an update request and the log ID of the corresponding data synchronization request are the same. For example, the client sends an update request a to the master node, and the log ID corresponding to request a is 1; the master node sends a data synchronization request a' to the first slave node to synchronize the data that a needs to synchronize; the first slave node records a log after receiving a', and the corresponding log ID is also 1; the first slave node then sends the synchronization request a' to the second slave node, the second slave node records a log after receiving a', and the corresponding log ID is also 1.
Step S102, updating the data required to be synchronized by the data synchronization request to the local.
In this embodiment, after the second slave node updates the data to be synchronized to its local store, the master node, the first slave node, and the second slave node have all applied the data to be synchronized, and the data on the three nodes is consistent.
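A minimal Python sketch of steps S100 to S102 as seen by the second slave node is given below; the request field names and the dict-like local store are assumptions made for illustration, not the actual product code.

    # Minimal sketch of the second slave node handling a data synchronization request.

    class SecondSlaveNode:
        def __init__(self):
            self.replication_log = []   # stands in for the WAL described later
            self.local_data = {}

        def on_sync_request(self, request):
            # Step S100: request forwarded asynchronously by the first slave node
            log_entry = {"id": request["log_id"], "key": request["key"],
                         "value": request["value"]}
            # Step S101: record the replication log before touching the data
            self.replication_log.append(log_entry)
            # Step S102: update the data to be synchronized to the local store
            self.local_data[request["key"]] = request["value"]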
The method provided by this embodiment separates update request processing from data synchronization, so that update requests can be responded to in time and high throughput is ensured, while data replication among the master node, the first slave node and the second slave node still guarantees strong consistency and high availability.
In this embodiment, the first slave node may send requests to the second slave node in batches for data synchronization. To avoid a failure during this batched sending causing the second slave node to update the data incorrectly and leaving the data on the first slave node and the second slave node inconsistent, on the basis of fig. 6 this embodiment further provides a specific implementation for recording the log of the data synchronization request. Referring to fig. 7, fig. 7 is a second flowchart of the data replication method provided by an embodiment of the present invention, and step S101 includes the following sub-steps:
the substep S1010 determines whether there is a normal request or an abnormal request according to the receiving order and the sending order of the received requests.
In this embodiment, the data synchronization request sent by the first slave node includes a plurality of requests sent in a sending order, where the sending order is the order in which the second slave node needs to apply the data updates. If the order in which the second slave node receives the requests is inconsistent with the sending order, an exception may have occurred during transmission; in that case the abnormal requests need to be identified and handled, for example by having them resent.
As a specific implementation manner, the process of determining whether there is a normal request or an abnormal request may be:
determining a request, of the received requests, whose receiving order is consistent with the corresponding sending order, as a normal request;
and judging the rest requests except the normal requests in the sent requests as abnormal requests.
In this embodiment, the receiving order of all of the plurality of requests may be consistent with the corresponding sending order, in which case all of them are normal requests; alternatively, only some of the requests may be received in an order consistent with the sending order, in which case those requests are normal requests and the remaining requests are abnormal requests. For example, if the first slave node sends requests a, b, c, and d in that order, and the second slave node receives requests a, b, and d, then a and b are normal requests and c and d are abnormal requests.
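The following Python sketch illustrates one possible way to separate normal from abnormal requests by comparing the receiving order with the sending order, assuming every request carries a monotonically increasing sequence number; it mirrors the a/b/c/d example above and is not the only possible implementation.

    # Illustrative check of received requests against the sending order.

    def split_normal_abnormal(sent_seqs, received_requests):
        """sent_seqs: sequence numbers in sending order, e.g. [1, 2, 3, 4].
        received_requests: requests actually received, in receiving order."""
        received_seqs = [r["seq"] for r in received_requests]
        normal = []
        for expected, got in zip(sent_seqs, received_seqs):
            if expected == got:
                normal.append(got)      # receiving order matches the sending order
            else:
                break                   # first mismatch: everything from here on is abnormal
        abnormal = [s for s in sent_seqs if s not in normal]
        return normal, abnormal

    # Example from the text: requests a, b, c, d sent; a, b, d received.
    # split_normal_abnormal([1, 2, 3, 4], [{"seq": 1}, {"seq": 2}, {"seq": 4}])
    # -> ([1, 2], [3, 4]); the second slave asks the first slave to resend 3 and 4.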
In sub-step S1011, if there is a normal request, a log of the normal request is recorded.
Sub-step S1012, if there is an abnormal request, sends a retransmission request indicating to retransmit the abnormal request to the first slave node so as to record a log of the normal request when the second slave node receives the normal request.
In this embodiment, if a and b are normal requests and c and d are abnormal requests, the logs of requests a and b are recorded and the first slave node is asked to resend requests c and d; when the second slave node receives requests c and d in an order consistent with their sending order, the logs of requests c and d are recorded.
In this embodiment, as a specific implementation, the log of a normal request may be recorded using Write-Ahead Logging (WAL). With WAL, the log and the in-memory state are written first, and the data is written to the hard disk later when the system is not busy; this converts random writes into sequential writes and reduces the overhead of writing to disk. A unique auto-incrementing identification ID is allocated to each request, and the log IDs recorded by the master node, the first slave node and the second slave node for the same request are identical; the logs on each node may all be recorded in WAL form.
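A minimal sketch of WAL-style logging with request IDs is shown below, assuming a simple JSON-lines append-only file; the real log file layout and storage engine are not specified here, so the details are illustrative only.

    # Sketch of write-ahead logging for replication-log entries.

    import json, os

    class WriteAheadLog:
        def __init__(self, path):
            self.path = path

        def append(self, log_id, request):
            # In the protocol above, the Master allocates log_id and every node
            # records the same ID; this sketch simply persists whatever it is given.
            entry = {"log_id": log_id, "request": request}
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(entry) + "\n")   # sequential append, not a random write
                f.flush()
                os.fsync(f.fileno())                # make the log durable before replying
            return log_id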
In this embodiment, in order to delete in time the logs recorded on the master node for requests whose synchronization has completed, a corresponding processing manner is further provided. Referring to fig. 8, fig. 8 is a third flowchart of the data replication method provided by an embodiment of the present invention, and the method includes the following steps:
step S103, obtain the current identifier of the data synchronization request processed last in this period.
In this embodiment, there may be one or more second slave nodes, and each second slave node periodically sends a Log Trim (log deletion) request to the master node, carrying the identifier corresponding to the data synchronization request that the second slave node processed last in the current period, i.e. the current identifier.
Step S104, sending the current identifier to the master node, so that the master node deletes the locally stored logs of the data synchronization requests between the current identifier and the history identifier.
In this embodiment, the master node stores the history identifier of the data synchronization request last processed by each second slave node in the previous period; the logs between the current identifier and the history identifier are logs of requests that have been synchronized and may be deleted.
In this embodiment, as a specific implementation, the master node may record the logs in files. The file size may be preset to 64 MB; when one file is full, a new file is automatically created and subsequent logs are recorded in it, and the logs of requests whose synchronization has completed may be deleted. When there are multiple second slave nodes, in order to avoid frequent file operations, a file is deleted only after the requests corresponding to all logs in that file have been processed by all second slave nodes. For example, if one file can hold 10000 logs and file 1 records logs 1-10000, file 1 may be deleted only after all second slave nodes have finished updating the data corresponding to those 10000 logs.
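The per-file deletion rule described above can be illustrated by the following Python sketch; the file list, the per-node confirmation map, and the helper name are assumptions made for illustration.

    # Sketch of the per-file deletion rule: a log file is removable only after
    # every second slave node has confirmed all log IDs it contains.

    def removable_files(files, confirmed_ids):
        """files: list of (file_name, max_log_id_in_file), oldest first.
        confirmed_ids: latest confirmed log ID per second slave node."""
        safe_id = min(confirmed_ids.values())   # the slowest second slave decides
        return [name for name, max_id in files if max_id <= safe_id]

    # Example from the text: file1 holds logs 1-10000.
    # removable_files([("file1", 10000), ("file2", 20000)],
    #                 {"s1": 15000, "s2": 12000, "s3": 9800})  -> []
    # removable_files([("file1", 10000), ("file2", 20000)],
    #                 {"s1": 15000, "s2": 12000, "s3": 10000}) -> ["file1"]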
It should be further noted that the master node carries the current identifier in the next synchronization request message sent to the first slave node, so that the first slave node deletes the logs up to and including the current identifier.
In the method provided by this embodiment, the second slave node periodically notifies the master node of its latest applied log, so that the master node can delete logs that are no longer needed and release the occupied space in time.
In order to more clearly illustrate the processing flow of the Update request and the Query request, this embodiment is described with a specific example.
Take five replica storage nodes as an example: among them there are 1 Master node, 1 First Slave node, and 3 Second Slave nodes. The back end of each node uses a RocksDB storage engine, and all key-value pairs (KV pairs) are stored on a mechanical hard disk. The replication log is stored on a solid-state disk in WAL form, at a granularity of 64 MB per log file.
Referring to fig. 9, fig. 9 is a diagram illustrating the interaction involved in processing an Update request according to an embodiment of the present invention. The interaction process is as follows (an illustrative sketch of the Master node side follows this list):
s11: the Client sends an Update request r to the Master node;
s12: after receiving the request r, the Master node allocates a unique log ID for it and records a replication log;
s13: the Master node synchronously replicates the request to the First Slave node and waits for a response;
s14: after receiving the request r, the First Slave node directly records a replication log;
s15: once finished, the First Slave node replies an ACK(r) response to the Master node;
s16: after receiving the ACK(r) response, the Master node responds to the Client request;
s17: the Master node replays the replication logs one by one as a background task and modifies the back-end RocksDB;
s18: the First Slave node synchronizes the replication logs to all Second Slave nodes by asynchronous replication, while also replaying the replication logs and modifying its back-end RocksDB;
s19: each Second Slave node receives the request r, directly records the replication log, and then modifies its back-end RocksDB.
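For illustration, the following Python sketch shows the Master node side of steps s11-s19 under assumed interfaces: first_slave.sync() stands in for the synchronous replication call and apply_to_rocksdb() stands in for the background replay against the back-end RocksDB. It is a sketch of the flow, not the actual implementation.

    # Sketch of the Master node side of the Update interaction (s11-s19).

    import queue, threading

    class RingMasterNode:
        def __init__(self, first_slave, apply_to_rocksdb):
            self.first_slave = first_slave            # assumed handle to the First Slave node
            self.apply_to_rocksdb = apply_to_rocksdb  # assumed callable that applies an entry
            self.next_log_id = 1
            self.replay_queue = queue.Queue()
            threading.Thread(target=self._replay_loop, daemon=True).start()

        def handle_update(self, request):                 # s11: Update request r arrives
            log_id = self.next_log_id                     # s12: allocate a unique log ID
            self.next_log_id += 1
            entry = {"log_id": log_id, "request": request}
            # (recording the replication log, e.g. via a WAL, is omitted here)
            ack = self.first_slave.sync(entry)            # s13: synchronous replication; s15 is the ACK
            if not ack:
                raise RuntimeError("First Slave did not acknowledge")
            self.replay_queue.put(entry)                  # s17: replay in the background
            return {"log_id": log_id, "status": "ok"}     # s16: respond to the Client

        def _replay_loop(self):
            while True:
                entry = self.replay_queue.get()
                self.apply_to_rocksdb(entry["request"])   # modify the back-end store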
Referring to fig. 10, fig. 10 is a diagram illustrating the interaction involved in processing a Query request according to an embodiment of the present invention. The interaction process is as follows (an illustrative routing sketch follows this list):
s21: the Client selects the Master node or the First Slave node according to its own Client ID; taking the Master node as an example, it sends a Query request r' to the Master node;
s22: after receiving the request r', the Master node looks the key up in RocksDB;
s23: the Master node responds to the Client request.
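The Query routing of steps s21-s23 can be sketched as follows; routing by hashing the Client ID, the optional relaxed mode that reads from Second Slave nodes with only monotonic read consistency, and the lookup() method are all assumptions made for illustration.

    # Sketch of Query routing between the Master node and the First Slave node.

    def route_query(client_id, key, master, first_slave, second_slaves=None,
                    relaxed=False):
        if relaxed and second_slaves:
            # Optional: spread reads over Second Slaves for throughput, accepting
            # only monotonic read consistency instead of strong consistency.
            node = second_slaves[hash(client_id) % len(second_slaves)]
        else:
            node = master if hash(client_id) % 2 == 0 else first_slave
        return node.lookup(key)   # the chosen node looks the key up in RocksDB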
In this embodiment, based on the principle of the ring replication protocol in fig. 3 and fig. 4, in order to ensure high reliability and avoid service interruption when individual nodes fail, this embodiment also provides methods for handling various failures. Referring to fig. 11, fig. 11 is a fourth flowchart of the data replication method according to an embodiment of the present invention, and the method includes the following steps:
step S105, receiving a switching request sent by the monitoring node, wherein the switching request is initiated when the monitoring node detects that a fault node exists in the distributed storage system.
In this embodiment, the monitoring node is used to implement service failure detection and recovery, and is responsible for heartbeat keep-alive with all nodes. The monitoring node may be an independent device separate from the master node, the first slave node, and the second slave nodes, or it may be software deployed on those nodes, such as ZooKeeper, that can obtain the running state of the nodes in time.
In this embodiment, the failed node may be at least one of the master node, the first slave node, and the second slave node.
And step S106, establishing a synchronous relation between the target second slave node and the non-fault node in the distributed storage system based on the switching request.
In this embodiment, the target second slave node is one of the plurality of second slave nodes in the distributed storage system; the target second slave node is also communicatively connected to the monitoring node, and the data synchronization request it most recently processed is the latest among the second slave nodes. Taking the case where each request corresponds to one log ID and the log IDs increase monotonically, the log ID of the data synchronization request most recently processed by the target second slave node is the largest.
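Under the assumption of monotonically increasing log IDs described above, selecting the target second slave node can be sketched as a simple maximum, as in the following illustrative Python snippet; the dict format is an assumption.

    # Sketch of choosing the target second slave node (most up to date).

    def pick_target_second_slave(last_processed_ids):
        """last_processed_ids: mapping of second slave node name -> last processed log ID."""
        return max(last_processed_ids, key=last_processed_ids.get)

    # pick_target_second_slave({"a": 3, "b": 2, "c": 1}) -> "a"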
In this embodiment, the non-failure node may also be at least one of the master node, the first slave node, and the second slave node. The synchronous relationship comprises a synchronous replication relationship between the master node and the first slave node and an asynchronous replication relationship between the first slave node and the second slave node and between the second slave node and the master node.
Service failures fall into at least the following four categories: (1) master node failure; (2) first slave node failure; (3) failure of both the master node and the first slave node; (4) second slave node failure. The handling of these four failure types is described below.
(1) The handling of a master node failure is as follows:
establishing a synchronous replication relationship with the first slave node based on the switching request;
the first slave node is taken over so that the first slave node takes over the master node.
In this embodiment, the monitoring node sends a switching request to the target second slave node; the target second slave node establishes a synchronous replication relationship with the first slave node based on the switching request and takes over the first slave node, becoming the new first slave node. The monitoring node also sends a switching request to the first slave node, carrying the information of the new first slave node, and the first slave node takes over the master node based on that switching request.
According to this processing flow, when the master node fails, Query requests are not affected, and Update request service is interrupted only briefly, until the monitoring node detects the master node failure, reselects the target second slave node, promotes it to the new first slave node, and notifies the original first slave node to be promoted to the new master node. Since the first slave node and the master node hold completely consistent data copies, the service interruption is very short and can even be reduced to zero.
(2) The handling of a first slave node failure is as follows:
establishing a synchronous replication relationship with the master node based on the switching request;
taking over the first slave node.
The monitoring node sends a switching request to the target second slave node; the target second slave node establishes a synchronous replication relationship with the master node based on the switching request and takes over the first slave node, becoming the new first slave node, and the monitoring node sends the information of the new first slave node to the master node.
According to this processing flow, when the first slave node fails, Query requests are not affected, and the latency of Update requests increases until the monitoring node detects the failure of the first slave node and reselects the target second slave node to be promoted to the new first slave node.
(3) The processing mode of the failure of both the master node and the first slave node is as follows:
taking over the master node based on the switching request;
and establishing a synchronous replication relationship with a proxy second slave node among the second slave nodes other than the target second slave node, so that the proxy second slave node takes over the first slave node, wherein the proxy second slave node is the second slave node whose most recently processed data synchronization request is the latest among the second slave nodes other than the target second slave node. In practice, the IDs of the data synchronization requests most recently processed by the target second slave node and the proxy second slave node may be the same or different. For example, suppose there are 3 second slave nodes, nodes a, b, and c. If the IDs of their most recently processed data synchronization requests are 3, 3, 3, any two of the three may be selected as the target second slave node and the proxy second slave node; if the IDs are 3, 2, 1, node a is taken as the target second slave node and node b as the proxy second slave node.
In this embodiment, the proxy second slave node is the most up-to-date of the remaining second slave nodes. The monitoring node sends a switching request to the target second slave node; the target second slave node takes over the master node based on the switching request, establishes a synchronous replication relationship with the proxy second slave node, and becomes the new master node. The monitoring node also sends a switching request to the proxy second slave node, and the proxy second slave node takes over the first slave node and becomes the new first slave node based on that switching request.
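The role assignment for case (3) can be sketched as follows, under the assumption that each second slave node reports the ID of its most recently processed data synchronization request; ties (the 3, 3, 3 example above) may be broken arbitrarily. The function and field names are illustrative only.

    # Sketch of case (3): both the master node and the first slave node fail.
    # The two most up-to-date second slave nodes become the new master (target)
    # and the new first slave (proxy).

    def assign_roles_after_double_failure(last_processed_ids):
        ranked = sorted(last_processed_ids, key=last_processed_ids.get, reverse=True)
        target = ranked[0]          # takes over the master node
        proxy = ranked[1]           # takes over the first slave node
        return {"new_master": target, "new_first_slave": proxy}

    # assign_roles_after_double_failure({"a": 3, "b": 2, "c": 1})
    # -> {"new_master": "a", "new_first_slave": "b"}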
It should be further noted that, in application scenarios where the replica storage nodes are distributed across data centers and regions, usually to support remote disaster recovery, when all replica storage nodes in the local data center fail at the same time it is in principle best to recover manually using the remote replica storage nodes. If active-active deployment across regions is required to reduce access latency, completely different distribution techniques are needed, such as automatic conflict detection.
(4) Failure of the second slave node
When the monitoring node detects that any second slave node has failed, it notifies the first slave node, and the first slave node subsequently stops sending data synchronization requests to that second slave node.
According to this processing flow, Update/Query requests are not affected when any second slave node fails. The reason is that the ring replication protocol separates the foreground service processing logic from the background data replication logic: the master node is responsible for the foreground service processing logic, and the first slave node is mainly responsible for the background data replication logic.
As can be seen from the above failure handling flows, the Ring Replication-based data replication method has clear advantages over the Primary/Backup protocol and Chain Replication in the following aspects:
1) Throughput capacity
For the Primary/Backup protocol, all Update/Query requests are processed by the Primary node; the latency depends on the slowest-responding Backup node, and the Primary node tends to become a bottleneck. For the Chain Replication protocol, Update and Query requests are split between the HEAD node and the TAIL node; Query latency is lower, but Update latency is higher and equals the sum of the latencies of all replica nodes. The Ring Replication protocol also splits Update and Query requests: Update requests are processed by the Master node, their latency depends only on the First Slave node, and Query requests are spread over the Master node and the First Slave node. Therefore, the overall throughput of the Ring Replication protocol is the highest (roughly 2x); its Update request latency is slightly better than that of the Primary/Backup protocol and far lower than that of the Chain Replication protocol, and its Query request latency is comparable to Chain Replication and better than the Primary/Backup protocol.
2) Replication topology
For the Primary/Backup protocol and the Chain Replication protocol, the foreground service processing logic and the background data replication logic are completely coupled, and the physical topology of the replica nodes is not taken into account; in application scenarios where replica data is distributed across data centers and regions, this can seriously affect the overall throughput and latency of foreground service. Ring Replication separates the foreground service processing logic from the background data replication logic and handles such scenarios well. In practical applications, the Master and the First Slave can be deployed in the same data center, while the Second Slaves can be deployed remotely to support cross-data-center and cross-region disaster recovery backup.
3) Fault tolerance capability
For the Primary/Backup protocol, a Primary node failure may result in data loss and service interruption, so its data safety and service availability levels are the lowest. For Chain Replication, N-1 nodes are allowed to go down without affecting data availability, so its data safety and service availability levels are the highest. Ring Replication lies in between: foreground service is affected only when the Master node and the First Slave node fail at the same time.
In this embodiment, in order to restore a recovered failed node to service in time, or to allow a new node to join as a replica storage node so that it can serve as a replica backup in time, this embodiment further provides a specific processing manner applied to the target second slave node:
firstly, receiving node information of a newly added node in a distributed storage system, which is sent by a monitoring node.
In this embodiment, the newly added node may be a node joining the distributed storage system for the first time, or a master node, first slave node, or second slave node that previously failed; in either case, the newly added node is handled as a node joining the distributed storage system for the first time.
And secondly, synchronizing the local data to the newly added node in a full synchronization mode according to the node information.
In this embodiment, the target second slave node synchronizes its data to the newly added node in a full synchronization manner according to the node information, so that the newly added node joins the distributed storage system as a new second slave node.
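A minimal sketch of the full synchronization step is given below; the batching, the receive_batch() and set_start_log_id() methods of the new node, and the node information handling are assumptions made for illustration.

    # Sketch of full synchronization from the target second slave node to a new node.

    def full_sync_to_new_node(local_data, last_log_id, new_node, batch_size=1000):
        """Push the entire local data set to the new node, then its current log position."""
        items = list(local_data.items())
        for start in range(0, len(items), batch_size):
            new_node.receive_batch(items[start:start + batch_size])
        # After the full copy, the new node joins as an ordinary second slave and
        # continues from this replication-log position via normal asynchronous sync.
        new_node.set_start_log_id(last_log_id)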
In order to perform the corresponding steps in the above embodiments and the various possible implementations, an implementation of the data replication apparatus 100 is given below. Referring to fig. 12, fig. 12 is a block diagram of the data copying apparatus according to an embodiment of the present invention. It should be noted that the basic principle and resulting technical effects of the data copying apparatus 100 provided in this embodiment are the same as those of the above embodiments; for brevity, where this embodiment does not mention a detail, reference may be made to the corresponding content in the above embodiments.
The data copying apparatus 100 includes a receiving module 110, a recording module 120 and an updating module 130.
The receiving module 110 is configured to receive a data synchronization request sent by a first slave node, where data to be synchronized by the data synchronization request is synchronized to the first slave node by the master node based on the update request sent by the client, the master node and the first slave node are in synchronous replication, and the first slave node and the second slave node are in asynchronous replication.
The recording module 120 is configured to record a log of the data synchronization request.
Optionally, the recording module 120 is specifically configured to: determine, according to the receiving order and the sending order of the received requests, whether a normal request or an abnormal request exists; if a normal request exists, record a log of the normal request; and if an abnormal request exists, send to the first slave node a retransmission request instructing retransmission of the abnormal request, so that a log of the normal request is recorded when the second slave node receives the normal request.
Optionally, when determining whether a normal request or an abnormal request exists according to the receiving order and the sending order of the received requests, the recording module 120 is specifically configured to: determine a request whose receiving order is consistent with its corresponding sending order as a normal request; and determine the remaining requests, other than the normal requests, among the sent requests as abnormal requests.
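The ordering check can be illustrated with a small sketch, assuming each request carries a monotonically increasing sequence number assigned by the first slave node in sending order; the sequence-number field, the batch-style handling, and the retransmission message format below are assumptions for illustration only.

```python
# Hypothetical sketch: classify incoming synchronization requests as normal or
# abnormal by comparing arrival order with the sender's sequence numbers.
def classify_requests(received, expected_seq):
    """received: requests in arrival order, each carrying a 'seq' field that
    reflects the order in which the first slave node sent them."""
    normal, abnormal = [], []
    for req in received:
        if req["seq"] == expected_seq:
            normal.append(req)        # receiving order matches sending order
            expected_seq += 1
        else:
            abnormal.append(req)      # mismatch: treat as abnormal
    return normal, abnormal


def handle_batch(received, expected_seq, log, send_to_first_slave):
    normal, abnormal = classify_requests(received, expected_seq)
    for req in normal:
        log.append(req)               # record a log of each normal request
    for req in abnormal:
        # Ask the first slave node to retransmit the abnormal request; its log
        # is recorded only once it arrives in the correct order.
        send_to_first_slave({"op": "retransmit", "seq": req["seq"]})
    return expected_seq + len(normal)  # next expected sequence number
```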
The update module 130 is configured to update the data to be synchronized by the data synchronization request to local storage.
Optionally, the master node stores a history identifier of the data synchronization request that was last processed by the second slave node in the previous cycle; the update module 130 is further configured to: acquire a current identifier of the data synchronization request last processed in the current cycle; and send the current identifier to the master node, so that the master node deletes its locally stored logs of the data synchronization requests between the current identifier and the history identifier.
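A minimal sketch of this log-trimming exchange is given below, simplified to a single second slave node and assuming identifiers are monotonically increasing integers; the class and method names (MasterLog, on_current_id, end_of_cycle) are hypothetical.

```python
# Hypothetical sketch: the second slave node reports the identifier of the last
# data synchronization request it applied; the master node then deletes the log
# entries between the previously reported (history) identifier and the newly
# reported (current) identifier.
class MasterLog:
    def __init__(self):
        self.entries = {}    # request identifier -> log entry
        self.history_id = 0  # identifier reported by the second slave last cycle

    def on_current_id(self, current_id: int) -> None:
        # Delete logs for requests the second slave has already applied.
        for request_id in range(self.history_id + 1, current_id + 1):
            self.entries.pop(request_id, None)
        self.history_id = current_id


class SecondSlaveTrimmer:
    def __init__(self, master: MasterLog):
        self.master = master
        self.last_applied_id = 0  # updated as synchronization requests are applied

    def end_of_cycle(self) -> None:
        # Report the current identifier so the master can trim its log.
        self.master.on_current_id(self.last_applied_id)
```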
Optionally, the second slave node is a target second slave node among a plurality of second slave nodes in the distributed storage system, the target second slave node is further communicatively connected to the monitoring node, and the data synchronization request most recently processed by the target second slave node is the latest among the second slave nodes. The update module 130 is further configured to: receive a switching request sent by the monitoring node, where the switching request is initiated when the monitoring node detects that a failed node exists in the distributed storage system; and establish, based on the switching request, a synchronization relationship between the target second slave node and the non-failed nodes in the distributed storage system.
Optionally, the failed node is the master node, and when establishing a synchronization relationship between the target second slave node and the non-failed nodes in the distributed storage system based on the switching request, the update module 130 is specifically configured to: establish a synchronous replication relationship with the first slave node based on the switching request; and take over the role of the first slave node, so that the first slave node takes over the role of the master node.
Optionally, the failed node is the first slave node, and when establishing a synchronization relationship between the target second slave node and the non-failed nodes in the distributed storage system based on the switching request, the update module 130 is specifically configured to: establish a synchronous replication relationship with the master node based on the switching request; and take over the role of the first slave node.
Optionally, the failed nodes are the master node and the first slave node, and when establishing a synchronization relationship between the target second slave node and the non-failed nodes in the distributed storage system based on the switching request, the update module 130 is specifically configured to: take over the role of the master node based on the switching request; and establish a synchronous replication relationship with a proxy second slave node among the second slave nodes other than the target second slave node, so that the proxy second slave node takes over the role of the first slave node, where the proxy second slave node is the second slave node, among the second slave nodes other than the target second slave node, whose most recently processed data synchronization request is the latest.
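The three failover cases can be summarized in a small dispatch sketch. The role names, the cluster object, and the methods promote_to_master, become_first_slave, and last_applied_id are illustrative assumptions, not part of the disclosed embodiment.

```python
# Hypothetical sketch: reaction of the target second slave node to a switching
# request, depending on which roles the monitoring node reports as failed.
def on_switch_request(target, failed_roles: set, cluster) -> None:
    """target: the target second slave node (the most up-to-date second slave)."""
    if failed_roles == {"master"}:
        # The first slave node takes over as master; the target second slave
        # node synchronously replicates from it and takes over as first slave.
        cluster.first_slave.promote_to_master()
        target.become_first_slave(sync_source=cluster.first_slave)
    elif failed_roles == {"first_slave"}:
        # The target second slave node takes over as first slave and
        # synchronously replicates from the master node.
        target.become_first_slave(sync_source=cluster.master)
    elif failed_roles == {"master", "first_slave"}:
        # The target second slave node takes over as master; the most
        # up-to-date remaining second slave (the proxy) becomes first slave.
        target.promote_to_master()
        proxy = max(cluster.other_second_slaves,
                    key=lambda node: node.last_applied_id)
        proxy.become_first_slave(sync_source=target)
```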
Optionally, the update module 130 is further configured to: receive node information of a newly added node in the distributed storage system sent by the monitoring node; and synchronize local data to the newly added node in a full-synchronization manner according to the node information.
The present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data replication method as described above.
In summary, embodiments of the present invention provide a data replication method, an apparatus, a storage node, and a readable storage medium, applied to a second slave node in a distributed storage system, where the second slave node is communicatively connected to both a first slave node and a master node, and the master node is communicatively connected to a client. The method includes: receiving a data synchronization request sent by the first slave node, where the data to be synchronized by the data synchronization request is synchronized to the first slave node by the master node based on an update request sent by the client, synchronous replication is performed between the master node and the first slave node, and asynchronous replication is performed between the first slave node and the second slave node; recording a log of the data synchronization request; and updating the data to be synchronized by the data synchronization request locally. Compared with the prior art, the data replication method provided by the embodiments has at least the following advantages: 1) higher throughput and lower latency while strong consistency is guaranteed; 2) no single point of failure and better fault tolerance; 3) higher copy redundancy can be supported without sacrificing foreground service performance; 4) better tolerance of network conditions, making it suitable for data distribution across data centers and across regions.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A data replication method is applied to a second slave node in a distributed storage system, the second slave node is in communication connection with both a first slave node and a master node, and the master node is in communication connection with a client, and the method comprises the following steps:
receiving a data synchronization request sent by the first slave node, wherein data required to be synchronized by the data synchronization request is synchronized to the first slave node by the master node based on an update request sent by the client, the master node and the first slave node are synchronously copied, and the first slave node and the second slave node are asynchronously copied;
recording a log of the data synchronization request;
and updating the data required to be synchronized by the data synchronization request to the local.
2. The data replication method of claim 1, wherein the data synchronization request transmitted by the first slave node includes a plurality of requests transmitted in a transmission order, and the step of recording a log of the data synchronization request includes:
judging whether a normal request or an abnormal request exists according to the receiving sequence and the sending sequence of the received requests;
if the normal request exists, recording a log of the normal request;
if the abnormal request exists, sending a retransmission request indicating that the abnormal request is retransmitted to the first slave node so as to record a log of normal requests when the second slave node receives the normal requests.
3. The data replication method of claim 2, wherein the step of determining whether there is a normal request or an abnormal request according to the receiving order and the transmitting order of the received plurality of requests comprises:
determining a request, of the received requests, whose receiving order is consistent with the corresponding sending order, as a normal request;
and determining the remaining requests, other than the normal request, among the plurality of transmitted requests as the abnormal requests.
4. The data replication method of claim 1, wherein the master node stores a history identification of the data synchronization request last processed by the second slave node in a previous cycle; the method further comprising:
acquiring a current identification of the data synchronization request last processed in a current cycle;
and sending the current identification to the master node, so that the master node deletes the locally stored logs of the data synchronization requests between the current identification and the history identification.
5. The data replication method of claim 1, wherein the second slave node is a target second slave node of a plurality of second slave nodes in the distributed storage system, the target second slave node is further communicatively connected to a monitoring node, and the data synchronization request most recently processed by the target second slave node is the latest among the plurality of second slave nodes, the method further comprising:
receiving a switching request sent by the monitoring node, wherein the switching request is initiated when the monitoring node detects that a fault node exists in the distributed storage system;
establishing a synchronization relationship between the target second slave node and a non-failed node in the distributed storage system based on the switchover request.
6. The data replication method of claim 5, wherein the failed node is the master node, and the step of establishing a synchronization relationship between the target second slave node and a non-failed node in the distributed storage system based on the switchover request comprises:
establishing a synchronous replication relationship with the first slave node based on the handover request;
taking over the first slave node to cause the first slave node to take over the master node.
7. The data replication method of claim 5, wherein the failed node is the first slave node, and the step of establishing a synchronization relationship between the target second slave node and a non-failed node in the distributed storage system based on the switchover request comprises:
establishing a synchronous replication relationship with the master node based on the switching request;
taking over the first slave node.
8. The data replication method of claim 5, wherein the failed node is the master node and the first slave node, and the step of establishing a synchronization relationship between the target second slave node and a non-failed node in the distributed storage system based on the switchover request comprises:
taking over the master node based on the handover request;
establishing a synchronous replication relationship with a proxy second slave node among the second slave nodes other than the target second slave node, so that the proxy second slave node takes over the first slave node, wherein the proxy second slave node is the second slave node, among the second slave nodes other than the target second slave node, whose most recently processed data synchronization request is the latest.
9. The data replication method of claim 5, further comprising:
receiving node information of newly added nodes in the distributed storage system, which is sent by the monitoring node;
and synchronizing the local data to the newly added node in a full synchronization mode according to the node information.
10. A data replication apparatus for use in a second slave node in a distributed storage system, the second slave node communicatively coupled to both a first slave node and a master node, the master node communicatively coupled to a client, the apparatus comprising:
a receiving module, configured to receive a data synchronization request sent by the first slave node, where data required to be synchronized by the data synchronization request is synchronized by the master node to the first slave node based on an update request sent by the client, a synchronous copy is performed between the master node and the first slave node, and an asynchronous copy is performed between the first slave node and the second slave node;
the recording module is used for recording the log of the data synchronization request;
and the updating module is used for updating the data required to be synchronized by the data synchronization request to the local.
11. A storage node comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, implements a data replication method as claimed in any one of claims 1 to 9.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data replication method according to any one of claims 1 to 9.
CN202211166317.3A 2022-09-23 2022-09-23 Data copying method and device, storage node and readable storage medium Pending CN115396454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211166317.3A CN115396454A (en) 2022-09-23 2022-09-23 Data copying method and device, storage node and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211166317.3A CN115396454A (en) 2022-09-23 2022-09-23 Data copying method and device, storage node and readable storage medium

Publications (1)

Publication Number Publication Date
CN115396454A true CN115396454A (en) 2022-11-25

Family

ID=84128955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211166317.3A Pending CN115396454A (en) 2022-09-23 2022-09-23 Data copying method and device, storage node and readable storage medium

Country Status (1)

Country Link
CN (1) CN115396454A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination