CN116668470A - Remote multi-main storage method and device based on conflict-free copy data type - Google Patents

Publication number
CN116668470A
Authority
CN
China
Prior art keywords
data center
main data
time
update
value
Legal status
Pending
Application number
CN202310648085.3A
Other languages
Chinese (zh)
Inventor
洪定乾
徐锐波
幸福
卢文伟
刘方
Current Assignee
Beijing Yunsizhixue Technology Co ltd
Original Assignee
Beijing Yunsizhixue Technology Co ltd

Classifications

    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a remote multi-master storage method based on conflict-free replicated data types (CRDT), comprising the following steps: the data in the synchronization service of main data center A and the data in the synchronization service of main data center B are synchronized through a load balancing service; the synchronized data comprises a hash set that is stored together with its state, and the storage structure of the hash set comprises: meta information, comprising a meta key and a meta value, wherein the meta key stores the service key and the meta value comprises a version set, an update_time set and a size, the version set comprising a value A set by main data center A and a value B set by main data center B, and the update_time set comprising the timestamps of the updates made by main data center A and main data center B; and field information, comprising a field key and a field value, wherein the field key stores the service key, the field and the self_version of the main data center, and the field value comprises the actual value and an update_time set that stores the timestamps of the updates made by main data center A and main data center B.

Description

Remote multi-master storage method and device based on conflict-free replicated data types
Technical Field
The invention relates to the technical field of data storage, and in particular to a remote multi-master storage method and device based on conflict-free replicated data types.
Background
Many enterprises build multi-active data centers in different locations, either for remote disaster recovery or to improve response speed by serving requests from the nearest site. Although there are several data centers, the master is usually concentrated in one location, and written data is synchronized to the other centers in real time over a dedicated line. The drawbacks are that write latency across machine rooms is high, and that when the data center hosting the master fails, the master must be switched to another center, so recovery from the fault takes a long time. Such a scheme is in fact one-way synchronization. If there are multiple masters, the problem of data conflicts during synchronization must be resolved.
The present application therefore aims to achieve conflict-free data synchronization between remote multi-master data centers, so that the data of these data centers reaches eventual consistency.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a remote multi-master storage method and device based on conflict-free replicated data types, and specifically adopts the following technical scheme:
the remote multi-master storage method based on conflict-free replicated data types comprises the following steps:
deploying a synchronization service on each machine of main data center A and on each machine of main data center B;
synchronizing the data in the synchronization service of main data center A with the data in the synchronization service of main data center B through a load balancing service;
wherein the synchronized data comprises a hash set, the hash set is stored together with its state, and the storage structure of the hash set comprises:
meta information, comprising a meta key and a meta value, wherein the meta key stores the service key and the meta value comprises a version set, an update_time set and a size; the version set comprises a value A set by main data center A and a value B set by main data center B, and the update_time set comprises the timestamps of the updates made by main data center A and main data center B;
field information, comprising a field key and a field value, wherein the field key stores the service key, the field and the self_version of the main data center, and the field value comprises the actual value and an update_time set that stores the timestamps of the updates made by main data center A and main data center B.
As an optional implementation of the invention, in the remote multi-master storage method based on conflict-free replicated data types, executing the Hset command on the hash set comprises:
main data center A executes the Hset command: it sets the version set and the update_time set in the meta information, updates the self_version of the field key in the field information, and updates the local update_time timestamp and the value in the field value; the meta value stored in the meta information at main data center A is then version {A, B} and update_time {A, B};
after receiving the synchronized data from main data center A, main data center B updates the meta information: if the A-side version in the meta information stored locally at main data center B is smaller than the A-side version in the meta information of the synchronized data, the A-side version and update_time are updated. Main data center B then updates the field information: it judges whether the update_time timestamp in the synchronized data is larger than both the A-side update_time and the B-side update_time stored locally at main data center B; if so, the A-side update_time is updated, and if the timestamp is smaller than either value, the update is not executed.
As an optional implementation of the invention, in the remote multi-master storage method based on conflict-free replicated data types, executing the Hdel command on the hash set comprises:
main data center A executes the Hdel command, deletes the corresponding field, and updates the A-side update_time in the meta information to the current timestamp;
after main data center B receives the synchronized data from main data center A, the meta information is left unchanged and the field information is updated: main data center B judges whether the update_time timestamp in the synchronized data is larger than both the A-side update_time and the B-side update_time stored locally at main data center B; if so, the field is deleted and the A-side update_time in the meta information is updated to the current timestamp.
As an optional implementation of the invention, in the remote multi-master storage method based on conflict-free replicated data types, executing the Del command on the hash set comprises:
time T1: main data center A executes the Del command and deletes the key; the A-side version in the meta information is incremented by 1, the update_time is set to T1, and the field information is deleted;
time T2: after receiving the synchronized data from main data center A, main data center B scans all field information and, for each field, deletes the field if its A-side update_time is smaller than the update_time in the meta information of the synchronized data;
wherein time T1 and time T2 are two adjacent moments.
As an optional implementation of the invention, in the remote multi-master storage method based on conflict-free replicated data types, executing the Del command on the hash set comprises:
time T1: main data center A executes the Del command and deletes the key; the A-side version in the meta information is incremented by 1, the update_time is set to T1, and the field information is deleted;
time T2: main data center B receives the synchronized data that main data center A produced at time T1, with a delay between time T1 and time T2; the synchronized data must then be judged by version and update_time together, and if the local-side (B-side) version in the meta information stored at main data center B is larger than the B-side version in the meta information of the synchronized data, the synchronized data is ignored.
As an optional implementation of the invention, the remote multi-master storage method based on conflict-free replicated data types stores a value together with its state, wherein the data types of the value include the string type;
main data center A executes a set command on a string-type value; when main data center B receives the synchronized data from main data center A, the string-type data stored at main data center B is resolved by the partial order of the update_time timestamps of the value, and the write with the latest timestamp is the one kept at main data center B.
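The last-write-wins rule for string values can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class name `LwwString`, the method `apply_sync`, the integer timestamps and the tie-breaking rule (an equal timestamp keeps the local value) are all assumptions that do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class LwwString:
    """A string value tagged with the timestamp of its last accepted write."""
    value: str = ""
    update_time: int = 0  # hypothetical integer timestamp, not a format from the source

    def apply_sync(self, remote_value: str, remote_update_time: int) -> bool:
        """Apply a synchronized set command; return True if the remote write wins."""
        if remote_update_time > self.update_time:
            self.value = remote_value
            self.update_time = remote_update_time
            return True
        # Assumption: an equal or older timestamp leaves the local value untouched.
        return False


center_b = LwwString()
center_b.apply_sync("written at A, t=5", 5)
center_b.apply_sync("written at A, t=3", 3)  # arrives late with an older timestamp
print(center_b.value)
```

Because the decision depends only on the timestamps, the two writes may arrive in either order and main data center B keeps the same value.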
As an optional implementation of the invention, in the remote multi-master storage method based on conflict-free replicated data types, when a del operation is executed on a value after counter operations have been performed on it, the local data center converts the del operation into a counter operation that applies to the data accumulated before the del.
The invention also provides a remote multi-master storage device based on conflict-free replicated data types, comprising:
a synchronization service module, deployed on each machine of main data center A and on each machine of main data center B;
a load balancing service module, through which the data in the synchronization service of main data center A is synchronized with the data in the synchronization service of main data center B;
wherein the synchronized data comprises a hash set, the hash set is stored together with its state, and the storage structure of the hash set comprises:
meta information, comprising a meta key and a meta value, wherein the meta key stores the service key and the meta value comprises a version set, an update_time set and a size; the version set comprises a value A set by main data center A and a value B set by main data center B, and the update_time set comprises the timestamps of the updates made by main data center A and main data center B;
field information, comprising a field key and a field value, wherein the field key stores the service key, the field and the self_version of main data center A, and the field value comprises the actual value and an update_time set that stores the timestamps of the updates made by main data center A and main data center B.
The invention also provides a computer-readable storage medium storing a computer-executable program which, when executed, implements the remote multi-master storage method based on conflict-free replicated data types.
The invention also provides an electronic device comprising a processor and a memory, the memory storing a computer-executable program which, when executed by the processor, performs the remote multi-master storage method based on conflict-free replicated data types.
Compared with the prior art, the invention has the following beneficial effects:
The remote multi-master storage method based on conflict-free replicated data types (CRDT) addresses the conflict problem that arises when remote multi-master data centers synchronize data. A CRDT must satisfy three laws: the commutative law, the associative law and the idempotent law. The idempotent law is guaranteed on the basis of the Raft logIndex, which is globally unique within one Raft cluster and is applied only once, as guaranteed by the Raft protocol. Each data synchronization (sync) carries the logIndex and the shard ID; the remote data center records the currently executed logIndex per shard ID for deduplication, compares against it before the next execution, and executes the request only if it is judged to be new. Each request is therefore executed only once, which is equivalent to satisfying the idempotent law. As for the records kept at the remote end, the number of shards in a cluster is limited and the recorded value is an integer, so the memory consumed is controllable.
The remote multi-master storage method based on conflict-free replicated data types uses OR-Set and LWW for stateful operations on collections, uses counter and LWW for stateful operations on values, and converts the delete operation on a counter into ordinary addition and subtraction. By modifying the storage protocol for each data type, data synchronization is kept conflict-free and eventually consistent even with multiple masters.
Description of the drawings:
FIG. 1 is a schematic diagram of data synchronization in the remote multi-master storage method based on conflict-free replicated data types according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the storage structure of the hash set in the remote multi-master storage method based on conflict-free replicated data types according to an embodiment of the present invention;
FIG. 3 shows an example of executing the Hset command on a hash set in the remote multi-master storage method based on conflict-free replicated data types according to an embodiment of the present invention;
FIG. 4 shows an example of executing the Del command on a hash set, without delay in the synchronized data, in the remote multi-master storage method based on conflict-free replicated data types according to an embodiment of the present invention;
FIG. 5 shows an example of executing the Del command on a hash set, with delay in the synchronized data, in the remote multi-master storage method based on conflict-free replicated data types according to an embodiment of the present invention;
FIG. 6 shows an example of the data deletion principle applied when executing the Del command on a hash set in the remote multi-master storage method based on conflict-free replicated data types according to an embodiment of the present invention;
FIG. 7 shows an example of executing a set command on a string-type data structure in the remote multi-master storage method based on conflict-free replicated data types according to an embodiment of the present invention;
FIG. 8 shows an example of executing a del command after counter-type operations on a value in the remote multi-master storage method based on conflict-free replicated data types according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention, as claimed, but is merely representative of some embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, under the condition of no conflict, the embodiments of the present invention and the features and technical solutions in the embodiments may be combined with each other.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present invention, it should be noted that terms such as "upper" and "lower" indicate an orientation or positional relationship based on the drawings, on the way the inventive product is conventionally placed in use, or on the conventional understanding of those skilled in the art. Such terms are used merely for convenience and to simplify the description; they do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first", "second" and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Referring to FIG. 1, the remote multi-master storage method based on conflict-free replicated data types according to this embodiment includes:
deploying a synchronization service on each machine of main data center A and on each machine of main data center B;
synchronizing the data in the synchronization service of main data center A with the data in the synchronization service of main data center B through the load balancing service.
To address the conflict problem that arises when remote multi-master data centers synchronize data, the invention designs a new remote multi-master storage method based on conflict-free replicated data types (CRDT) that implements a Redis-like storage protocol. A CRDT must satisfy three laws: the commutative law, the associative law and the idempotent law. The idempotent law is guaranteed on the basis of the Raft logIndex, which is globally unique within one Raft cluster and is applied only once, as guaranteed by the Raft protocol. Each data synchronization (sync) carries the logIndex and the shard ID; the remote data center records the currently executed logIndex per shard ID for deduplication, compares against it before the next execution, and executes the request only if it is judged to be new. Each request is therefore executed only once, which is equivalent to satisfying the idempotent law. As for the records kept at the remote end, the number of shards in a cluster is limited and the recorded value is an integer, so the memory consumed is controllable.
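The deduplication step can be sketched as below. The class `RemoteApplier` and its method names are hypothetical. The sketch also assumes that the entries of one shard reach the remote end in increasing logIndex order, so that remembering a single integer per shard is enough; the source does not spell out how out-of-order arrival is handled.

```python
class RemoteApplier:
    """Executes each synchronized request at most once, keyed by shard ID and logIndex."""

    def __init__(self):
        self.last_applied = {}  # shard ID -> highest logIndex already executed
        self.executed = []      # commands actually executed, kept for illustration

    def apply(self, shard_id, log_index, command):
        """Return True if the command was executed, False if it was a duplicate."""
        if log_index <= self.last_applied.get(shard_id, -1):
            return False  # already executed for this shard: skip
        self.last_applied[shard_id] = log_index
        self.executed.append(command)
        return True


applier = RemoteApplier()
applier.apply(shard_id=0, log_index=7, command="hset k f1 v1")
applier.apply(shard_id=0, log_index=7, command="hset k f1 v1")  # retransmission, ignored
applier.apply(shard_id=1, log_index=7, command="hset j f1 v9")  # other shard, executed
print(applier.executed)
```

Only one integer is stored per shard, which matches the observation that the memory consumed by these records is controllable.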
The synchronization service (sync) is a separate service deployed on each machine of main data center A and main data center B, and is used to synchronize written data to the remote site. As shown in FIG. 1, the synchronization service (sync) is divided into 8 buckets, each corresponding to a resident thread and a queue. For each write, a sequence number is calculated as the logIndex modulo 8, and the write is placed into the corresponding queue. The corresponding coroutine waits for data in its queue and synchronizes it to the proxy service (proxy) at the remote end through the load balancing service (LB). Synchronized commands carry an identifier, which prevents the remote end from synchronizing the data back after it has been persisted.
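A rough sketch of the bucketing and of the identifier that stops synchronized data from being sent back is given below. The names `bucket_for`, `on_write` and the `from_sync` flag are hypothetical, and the resident threads, the load balancer and the remote proxy are left out.

```python
from collections import deque

NUM_BUCKETS = 8  # the embodiment divides the sync service into 8 buckets

# One queue per bucket; in the embodiment each queue has its own resident worker.
queues = [deque() for _ in range(NUM_BUCKETS)]


def bucket_for(log_index):
    """Sequence number of the bucket: logIndex modulo 8."""
    return log_index % NUM_BUCKETS


def on_write(log_index, command, from_sync=False):
    """Queue a write for the remote site; return the bucket used, or None if skipped.

    `from_sync` stands in for the identifier added to synchronized commands: a write
    that arrived from the remote site is not queued again, so it is never sent back.
    """
    if from_sync:
        return None
    bucket = bucket_for(log_index)
    queues[bucket].append((log_index, command))
    return bucket


on_write(17, "hset k f1 v1")                   # locally originated, goes to bucket 1
on_write(18, "hset k f2 v2", from_sync=True)   # came from the remote site, not queued
```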
In the remote multi-master storage method based on conflict-free replicated data types of this embodiment, the synchronized data includes a hash set, and the hash set is stored together with its state. As shown in FIG. 2, the storage structure of the hash set in this embodiment includes:
meta information, comprising a meta key and a meta value, wherein the meta key stores the service key and the meta value comprises a version set, an update_time set and a size; the version set comprises a value A set by main data center A and a value B set by main data center B, and the update_time set comprises the timestamps of the updates made by main data center A and main data center B;
field information, comprising a field key and a field value, wherein the field key stores the service key, the field and the self_version of the main data center, and the field value comprises the actual value and an update_time set that stores the timestamps of the updates made by main data center A and main data center B.
Therefore, the remote multi-master storage method based on conflict-free replicated data types of this embodiment indirectly satisfies the idempotent law: based on the Raft logIndex, the sync service ensures that each request is executed only once at the remote end. For stateful operations on collections, the version and the timestamp are used as tags to implement an OR-Set (Observed-Remove Set) data structure, so that the data finally converges without conflict.
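The meta information and field information of the hash set can be pictured with the following sketch. The class names, the attribute types and the use of the strings "A" and "B" as data center identifiers are assumptions made for illustration; the source fixes only which items each key and value holds.

```python
from dataclasses import dataclass, field


def _per_center():
    """One slot per main data center; 0 means 'never set' (an assumption)."""
    return {"A": 0, "B": 0}


@dataclass
class MetaValue:
    version: dict = field(default_factory=_per_center)      # value A and value B
    update_time: dict = field(default_factory=_per_center)  # last update per center
    size: int = 0


@dataclass(frozen=True)
class FieldKey:
    service_key: str
    field_name: str
    self_version: int  # version of the data center that wrote the field


@dataclass
class FieldValue:
    update_time: dict = field(default_factory=_per_center)
    value: str = ""


# The meta key is the service key itself, so a plain dict keyed by it is enough here.
meta = {"k": MetaValue()}
fields = {FieldKey("k", "f1", self_version=1): FieldValue(value="v1")}
meta["k"].size = len(fields)
```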
As an optional implementation of this embodiment, in the remote multi-master storage method based on conflict-free replicated data types, executing the Hset command on the hash set includes:
main data center A executes the Hset command: it sets the version set and the update_time set in the meta information, updates the self_version of the field key in the field information, and updates the local update_time timestamp and the value in the field value; the meta value stored in the meta information at main data center A is then version {A, B} and update_time {A, B};
after receiving the synchronized data from main data center A, main data center B updates the meta information: if the A-side version in the meta information stored locally at main data center B is smaller than the A-side version in the meta information of the synchronized data, the A-side version and update_time are updated. Main data center B then updates the field information: it judges whether the update_time timestamp in the synchronized data is larger than both the A-side update_time and the B-side update_time stored locally at main data center B; if so, the A-side update_time is updated, and if the timestamp is smaller than either value, the update is not executed.
The Hset command of the present embodiment is used to assign a value to a field in the hash table, and if the hash table does not exist, a new hash table is created and Hset operation is performed; if the field is already present in the hash table, the old value will be overwritten.
Referring to FIG. 3, this embodiment is illustrated with data center A in Beijing and data center B in Guangzhou. Beijing executes the command locally: it sets the version and update_time in the meta information, updates the self_version of the key in the field information, and updates the local update_time and the value in the field value. At this point the value stored in the meta information is version {Beijing, Guangzhou} and update_time {Beijing, Guangzhou}, and the Guangzhou-side values are all 0.
After receiving the synchronized data from the Beijing side, the Guangzhou side updates the meta information: if the Beijing-side version in the meta information stored locally at Guangzhou is 0, the Beijing-side version and update_time are updated. The Guangzhou side then updates the field information using the LWW (Last Write Wins) strategy, which is based on the partial order of timestamps: it judges whether the synchronized timestamp is larger than both the Beijing-side update_time and the Guangzhou-side update_time stored locally; if so, the Beijing-side update_time is updated, and if the timestamp is smaller than either value, the update is not executed.
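The Hset flow can be sketched as below, under several stated assumptions: plain dictionaries stand in for the storage engine, the sites are called "A" and "B", the first local write sets the version to 1, the receiving side also overwrites the value when the timestamp check passes, and self_version is omitted from the field key for brevity. None of the function names come from the source.

```python
def new_sides():
    return {"A": 0, "B": 0}


def new_store():
    return {"meta": {}, "fields": {}}


def hset_local(store, site, key, field, value, now):
    """Execute Hset at the originating site and return the message to synchronize."""
    meta = store["meta"].setdefault(key, {"version": new_sides(), "update_time": new_sides()})
    if meta["version"][site] == 0:
        meta["version"][site] = 1  # assumption: the first local write sets the version to 1
    meta["update_time"][site] = now
    entry = store["fields"].setdefault((key, field), {"update_time": new_sides(), "value": None})
    entry["update_time"][site] = now
    entry["value"] = value
    return {"site": site, "key": key, "field": field, "value": value,
            "update_time": now, "version": meta["version"][site]}


def hset_remote(store, msg):
    """Apply a synchronized Hset; return True if the field was updated."""
    site = msg["site"]
    meta = store["meta"].setdefault(msg["key"], {"version": new_sides(), "update_time": new_sides()})
    if meta["version"][site] < msg["version"]:
        meta["version"][site] = msg["version"]
        meta["update_time"][site] = msg["update_time"]
    entry = store["fields"].setdefault((msg["key"], msg["field"]),
                                       {"update_time": new_sides(), "value": None})
    # LWW: the synchronized timestamp must exceed both locally stored timestamps.
    if all(msg["update_time"] > t for t in entry["update_time"].values()):
        entry["update_time"][site] = msg["update_time"]
        entry["value"] = msg["value"]
        return True
    return False  # smaller than (or equal to) either value: not executed


center_a, center_b = new_store(), new_store()
hset_remote(center_b, hset_local(center_a, "A", "k", "f1", "v1", now=1))
```

If B later writes the same field with a newer timestamp and an older write from A arrives afterwards, `hset_remote` rejects the older write at B and accepts B's write at A, so both sites end with the same value.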
As an optional implementation of this embodiment, in the remote multi-master storage method based on conflict-free replicated data types, executing the Hdel command on the hash set includes:
main data center A executes the Hdel command, deletes the corresponding field, and updates the A-side update_time in the meta information to the current timestamp;
after main data center B receives the synchronized data from main data center A, the meta information is left unchanged and the field information is updated: main data center B judges whether the update_time timestamp in the synchronized data is larger than both the A-side update_time and the B-side update_time stored locally at main data center B; if so, the field is deleted and the A-side update_time in the meta information is updated to the current timestamp.
The Hdel command of this embodiment is used to delete one or more specified fields of the hash table key; fields that do not exist are ignored. As with the Hset command, this embodiment is illustrated with data center A in Beijing and data center B in Guangzhou. Beijing executes the command locally: the corresponding field is deleted and the Beijing-side update_time in the meta information is updated to the current timestamp. After Guangzhou receives the synchronized data from the Beijing side, the meta information is left unchanged and the field information is updated: the LWW strategy is used to judge whether the synchronized timestamp is larger than both the Beijing-side update_time and the Guangzhou-side update_time stored locally; if so, the field is deleted and the Beijing-side update_time in the meta information is updated to the current timestamp.
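A minimal sketch of the Hdel flow for a single key follows. The function names, the dictionary layout and the explicit `now` parameter are hypothetical; the timestamp check mirrors the LWW judgment described above.

```python
def hdel_local(meta, fields, site, field, now):
    """Delete a field at the originating site and return the message to synchronize."""
    fields.pop(field, None)  # a field that does not exist is ignored
    meta["update_time"][site] = now
    return {"site": site, "field": field, "update_time": now}


def hdel_remote(meta, fields, msg, now):
    """Apply a synchronized Hdel; return True if the field was deleted."""
    entry = fields.get(msg["field"])
    if entry is None:
        return False
    # LWW: delete only if the synchronized timestamp exceeds both stored timestamps.
    if all(msg["update_time"] > t for t in entry["update_time"].values()):
        del fields[msg["field"]]
        meta["update_time"][msg["site"]] = now
        return True
    return False


meta_a = {"update_time": {"A": 1, "B": 0}}
fields_a = {"f1": {"update_time": {"A": 1, "B": 0}, "value": "v1"}}
msg = hdel_local(meta_a, fields_a, "A", "f1", now=4)

meta_b = {"update_time": {"A": 1, "B": 6}}
fields_b = {
    "f1": {"update_time": {"A": 1, "B": 0}, "value": "v1"},
    "f2": {"update_time": {"A": 0, "B": 6}, "value": "v2"},  # written at B after the delete
}
hdel_remote(meta_b, fields_b, msg, now=5)
```

A delete that carries timestamp 4 would not remove `f2`, because B wrote that field at time 6; only fields older than the delete are affected.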
As an optional implementation of this embodiment, in the remote multi-master storage method based on conflict-free replicated data types, a Del command is executed on the hash set. The Del command is used to delete an existing key; keys that do not exist are ignored. The Del command follows the principle of deleting only the data that the deleting side has itself seen, so in addition to update_time it must also compare the version in the meta information. The synchronized data carries the meta information.
Further, executing the Del command on the hash set in this embodiment includes:
time T1: main data center A executes the Del command and deletes the key; the A-side version in the meta information is incremented by 1, the update_time is set to T1, and the field information is deleted;
time T2: after receiving the synchronized data from main data center A, main data center B scans all field information and, for each field, deletes the field if its A-side update_time is smaller than the update_time in the meta information of the synchronized data;
wherein time T1 and time T2 are two adjacent moments.
This embodiment is illustrated with data center A in Beijing and data center B in Guangzhou. As shown in FIG. 4, time T1 is 3 ns and time T2 is 4 ns. When there is no delay in synchronizing the Beijing-side data to the Guangzhou side, the synchronization is executed according to the LWW partial order. The details are as follows:
Time 1 ns: Beijing sets k locally, with field f1 and value v1.
Time 2 ns: Guangzhou sets k locally, with field f2 and value v2. After synchronization the data at the two sites is consistent: the fields are f1 and f2, and the meta information is also synchronized.
Time 3 ns: Beijing deletes k; the Beijing-side version in the meta information is incremented by 1, the update_time is set to 3, and the field information is deleted.
Time 4 ns: after Guangzhou receives the synchronized data, it scans all field information. The Beijing-side update_time of f1 (2 ns) is smaller than the update_time of the meta information in the synchronized data (3 ns), so f1 is deleted; the Guangzhou-side update_time of f2 (2 ns) is smaller than the update_time in the meta information (3 ns), so f2 is deleted.
The data is eventually consistent, and k is deleted.
In the remote multi-master storage method based on the conflict-free replicated data type of this embodiment, performing the Del command operation on the hash set includes:
time T1: main data center A performs the Del command operation and deletes the key; the main data center A side version in the meta information is incremented by 1, update_time is set to T1, and the field information is deleted;
time T2: main data center B receives the time-T1 synchronization data of main data center A, with a delay between time T1 and time T2; the synchronization data must then be judged by version and update_time together, and if the local-side version in the meta information stored in main data center B is greater than the main data center B side version in the meta information of the synchronization data, the synchronization data is ignored.
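The version comparison at time T2 can be illustrated with a short sketch. It assumes the meta information is held as a dictionary with a per-site `version` map; the function name `should_ignore_del` and the A-side values in the example are placeholders, not taken from the embodiment.

```python
def should_ignore_del(local_meta, sync_meta, local_site):
    """Return True if a delayed Del from the other site must be ignored.

    The receiver compares its own side's version as stored locally with its
    own side's version as carried in the synchronization data. A larger local
    value means the sender had not yet seen the receiver's latest state.
    """
    return local_meta["version"][local_site] > sync_meta["version"][local_site]


# Receiver is site B: its local B-side version is 2, while the delayed Del
# from site A still carries B-side version 1 (A-side values are placeholders).
local_meta_b = {"version": {"A": 1, "B": 2}}
delayed_del = {"version": {"A": 2, "B": 1}}
print(should_ignore_del(local_meta_b, delayed_del, "B"))  # prints True
```

When the two versions are equal the function returns False, and the decision falls through to the update_time comparison described in the text.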
Referring to fig. 5, in this embodiment data center A is Beijing, data center B is Guangzhou, time T1 is 3 ns and time T2 is 5 ns. When there is a delay between data synchronization on the Beijing side and on the Guangzhou side, the decision is made according to version and update_time together. The details are as follows:
Times 1 ns and 2 ns: Beijing and Guangzhou set fields f1 and f2 of k respectively, and the data is synchronized normally.
Time 3 ns: both sites delete k. Locally in Beijing, meta.version is incremented by 1 to 2 and meta.update_time is 3. Locally in Guangzhou, meta.version is 2 and meta.update_time is 3. However, the synchronization data for Beijing's deletion of k is delayed and is received by Guangzhou only at time 5 ns.
Time 4 ns: Beijing receives the synchronization data for Guangzhou's deletion of k and scans all field information, but the field set is empty, so only the meta information is updated. At this time the Guangzhou side sets k again, with field f2 and value v2_1, and updates its local meta.
Time 5 ns: the Beijing side receives the synchronization data for the f2 set by Guangzhou. The locally stored Guangzhou-side meta.version (2) equals the Guangzhou-side meta.version (2) in the synchronization data, and the meta.update_time (4 ns) in the synchronization data is larger, so the f2 data is added and the Guangzhou-side meta.update_time is set to 4. The Guangzhou side receives the synchronization data for Beijing's deletion of k at 3 ns; the local Guangzhou-side meta.version (2) is greater than the Guangzhou-side meta.version (1) in the synchronization data, so the synchronization data is ignored.
Finally, the Beijing and Guangzhou data are consistent: the field of k is f2 and the value is v2_1.
The principle that the Del of this embodiment deletes only the data it has itself seen is shown in fig. 6, again taking data center A as Beijing and data center B as Guangzhou as an example:
Time 1 ns: Beijing locally sets k, with field f1 and value v1.
Time 2 ns: Guangzhou locally sets field f2 with value v2 and, at the same time, receives the synchronization data and adds the f1 data, so the Guangzhou side now holds f1 and f2. meta.version is 1 and meta.update_time is 2. However, the f2 data is synchronized to the Beijing side with a delay.
Time 3 ns: the Beijing side deletes k; meta.version is incremented by 1 and meta.update_time is 3.
Time 4 ns: the Beijing side receives the data from Guangzhou's operation at 2 ns and adds the f2 data. The Guangzhou side receives the data for Beijing's deletion of k at 3 ns and scans all fields: f1's Beijing-side field.update_time (1 ns) is smaller than the meta.update_time (3 ns) in the synchronization data, so f1 is deleted; f2's Guangzhou-side field.update_time (2 ns) is larger than the meta.update_time (0) in the synchronization data, so f2 is not deleted. That is, Beijing's deletion at 3 ns does not delete the f2 data after it is synchronized to Guangzhou.
Finally, the f2 data is retained in both Beijing and Guangzhou.
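The fig. 6 walk-through can be reproduced with a small sketch. It rests on an assumption that the text does not state outright: each field timestamp is compared with the synchronization data's meta update_time for the same site, and a site the deleter has not yet heard from counts as 0. The function name `apply_remote_del` and the dictionary layout are hypothetical.

```python
def apply_remote_del(fields, sync_meta_update_time):
    """Scan all fields and drop those the deleting site had already seen.

    fields: {field_name: {site: update_time}}, listing only the sites that
            wrote the field.
    sync_meta_update_time: {site: update_time} carried in the Del's meta
            information; a missing site is treated as 0.
    Returns the fields that survive the deletion.
    """
    survivors = {}
    for name, written in fields.items():
        seen_by_deleter = all(
            ts < sync_meta_update_time.get(site, 0)
            for site, ts in written.items()
        )
        if not seen_by_deleter:
            survivors[name] = written
    return survivors


# State on site B at 4 ns: f1 written by A at 1 ns, f2 written by B at 2 ns.
fields_on_b = {"f1": {"A": 1}, "f2": {"B": 2}}
# Del issued by A at 3 ns, before A had received B's write of f2.
del_meta = {"A": 3, "B": 0}
print(apply_remote_del(fields_on_b, del_meta))  # prints {'f2': {'B': 2}}
```

Under this reading the deletion removes f1, which the deleter had observed, and keeps f2, which it had not, matching the outcome described for fig. 6.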
As an optional implementation of this embodiment, the remote multi-master storage method based on the conflict-free replicated data type includes: performing stateful storage for the value, where the data type of the value includes the string type;
main data center A executes a set command operation on the string type; when main data center B receives the synchronization data from main data center A, the string-type data stored in main data center B is resolved in partial order of time according to the update_time timestamp of the value, so that the latest write is the one kept.
Referring to fig. 7, this embodiment takes data center A as Beijing and data center B as Guangzhou as an example: the string-type set is executed according to the LWW partial order; v2 is set later, so the final value is v2.
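A last-write-wins register for the string type can be sketched as below. The class name `LwwString` and the function `merge` are assumptions made for illustration, and the handling of equal timestamps is a choice of this sketch because the text does not cover it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LwwString:
    value: str
    update_time: int


def merge(local: LwwString, incoming: LwwString) -> LwwString:
    # Keep whichever write carries the later update_time.
    # Equal timestamps are not covered by the text; this sketch keeps the
    # local copy, and a real system would need a deterministic tie-break.
    return incoming if incoming.update_time > local.update_time else local


written_in_a = LwwString("v1", 1)
written_in_b = LwwString("v2", 2)
# Both sites converge on v2 regardless of which write they held first.
print(merge(written_in_a, written_in_b).value)  # prints v2
print(merge(written_in_b, written_in_a).value)  # prints v2
```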
As an optional implementation of this embodiment, the counter-type operations incrby, zadd and hincrby natively satisfy the commutative and associative laws. The del operation, however, does not satisfy the commutative law. Therefore, in the remote multi-master storage method based on the conflict-free replicated data type of this embodiment, when a del operation follows counter operations on a value, the local data center converts the del, at execution time, into a counter operation that cancels the data preceding it. For example, if the preceding incrby operations were +1 and +9, the del is converted into decrby 10, so that the commutative law is supported and eventual consistency is achieved.
Referring to fig. 8, this embodiment takes data center A as Beijing and data center B as Guangzhou as an example: the Beijing side performs a deletion at 3 ns, which is converted into a decrement of 7, so the local result in Beijing is 0. In the end, the data at both sites is consistent regardless of whether the synchronization data is delayed.
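The effect of converting del into a decrement can be checked with a small sketch. The helper names `local_del_as_delta` and `replay`, and the concurrent remote increment of 5, are assumptions added for illustration; the +1, +9 and decrby 10 values come from the example in the text.

```python
from itertools import permutations


def local_del_as_delta(value_seen_locally):
    """Convert a local del into the delta that cancels the value seen locally."""
    return -value_seen_locally


def replay(deltas):
    """Apply counter deltas in the given order and return the final value."""
    total = 0
    for delta in deltas:
        total += delta
    return total


# Local incrby +1 and +9, then del, which becomes decrby 10.
del_delta = local_del_as_delta(1 + 9)
# A concurrent incrby +5 from the other site that the deleter had not seen.
operations = [1, 9, del_delta, 5]
# Every delivery order yields the same final value.
print({replay(order) for order in permutations(operations)})  # prints {5}
```

Because every operation is now a plain addition, the order of delivery no longer matters, which is the property the paragraph relies on for eventual consistency.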
In summary, the remote multi-master storage method based on the conflict-free replicated data type of this embodiment designs a new Redis-like storage protocol with CRDT as its theoretical basis: data is synchronized based on the raft logIndex; OR-Set and LWW are used for operations on data with set state; counter and LWW are used for operations on data with value state; and the deletion of a counter is converted into ordinary addition and subtraction. By modifying the storage protocol for each data type, data synchronization is guaranteed to be conflict-free and eventually consistent when there are multiple masters.
This embodiment also provides a remote multi-master storage device based on the conflict-free replicated data type, comprising:
a synchronization service module, deployed on each machine of main data center A and each machine of main data center B;
a load balancing service module, through which the data in the synchronization service of main data center A is synchronized with the data in the synchronization service of main data center B;
the synchronized data comprises a hash set, the hash set is stored statefully, and the storage structure of the hash set comprises:
meta information, comprising a meta key and a meta value, where the meta key stores the service key and the meta value comprises a version storage set, an update_time storage set and a size; the version storage set contains value A set by main data center A and value B set by main data center B, and the update_time storage set contains the timestamps updated by main data center A and main data center B;
field information, comprising a field key and a field value, where the field key stores the service key, the field and the self_version of main data center A, and the field value comprises an update_time storage set, which stores the timestamps updated by main data center A and main data center B, and the actual value.
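The storage structure listed above can be pictured with the following sketch. The class and attribute names are hypothetical, chosen only to mirror the terms in the text (meta key, meta value, field key, field value); the concrete types and the in-memory layout are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MetaValue:
    # Per-site version and update_time, keyed by site label.
    version: Dict[str, int] = field(default_factory=lambda: {"A": 0, "B": 0})
    update_time: Dict[str, int] = field(default_factory=lambda: {"A": 0, "B": 0})
    size: int = 0


@dataclass(frozen=True)
class FieldKey:
    service_key: str
    field_name: str
    self_version: int


@dataclass
class FieldValue:
    update_time: Dict[str, int]  # timestamps updated by site A and site B
    value: str                   # the actual value


@dataclass
class HashSetRecord:
    meta_key: str  # stores the service key
    meta_value: MetaValue = field(default_factory=MetaValue)
    fields: Dict[FieldKey, FieldValue] = field(default_factory=dict)


record = HashSetRecord("k")
record.fields[FieldKey("k", "f1", 1)] = FieldValue({"A": 1, "B": 0}, "v1")
record.meta_value.size = len(record.fields)
```

Keeping a per-site map for both version and update_time is what lets a receiver compare its own side's values with those carried in the synchronization data.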
Addressing the problem of conflicts in synchronized data across remote multi-master data centers, the invention designs a new remote multi-master storage device based on the conflict-free replicated data type (CRDT) that implements a Redis-like storage protocol. A CRDT must guarantee three laws: the commutative law, the associative law and the idempotent law. The idempotent law is guaranteed by the raft logIndex, which is globally unique within a raft cluster and is applied only once, as guaranteed by the raft protocol. Each data synchronization (sync) carries the logIndex and the slice ID; the remote data center records the currently executed logIndex per slice ID for deduplication, and a swap check is performed before the next execution, so that a request is executed only if it is judged to be new. Each request is therefore executed only once, which is equivalent to satisfying the idempotent law. As for the records stored at the remote end, the number of slices in a cluster is limited and the recorded values are integers, so the memory consumed is controllable.
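The deduplication step can be sketched as follows. This assumes that "new" means a logIndex greater than the last one recorded for the same slice ID, which the text does not spell out; the class name `RemoteApplier` and its attributes are hypothetical.

```python
class RemoteApplier:
    """Executes each synchronized request at most once per slice."""

    def __init__(self):
        self.last_applied = {}  # slice_id -> last logIndex executed
        self.executed = []      # commands actually executed, in order

    def apply(self, slice_id, log_index, command):
        # Assumed rule: a request is new only if its logIndex is greater
        # than the one recorded for this slice.
        if log_index <= self.last_applied.get(slice_id, -1):
            return False  # duplicate delivery, skip
        self.last_applied[slice_id] = log_index
        self.executed.append(command)
        return True


applier = RemoteApplier()
applier.apply(0, 7, "hset k f1 v1")  # executed
applier.apply(0, 7, "hset k f1 v1")  # redelivered, skipped
print(len(applier.executed))         # prints 1
```

The bookkeeping is one integer per slice, which is consistent with the statement that memory consumption stays bounded by the number of slices.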
The synchronization service module is deployed on every machine of main data center A and main data center B and synchronizes written data to the other site. As shown in FIG. 1, the synchronization service (sync) is divided into 8 buckets, each corresponding to a resident thread and a queue. On a write, a sequence number is computed as logIndex modulo 8, and the write is placed into the corresponding queue. The coroutine waiting on that queue synchronizes the queued data to the remote proxy service (proxy) through the load balancing service (LB). An identifier is added to each synchronized command so that the data is not synchronized back after it lands at the remote end.
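The bucket selection and the loop-prevention identifier can be sketched as below. The resident threads and the network transfer through the LB are left out; the class name `SyncService`, the `from_sync` flag and the use of `deque` are assumptions of this sketch.

```python
from collections import deque

NUM_BUCKETS = 8  # the text divides the sync service into 8 buckets


class SyncService:
    def __init__(self):
        # One queue per bucket; in the described design each queue is
        # drained by its own resident worker, omitted here.
        self.queues = [deque() for _ in range(NUM_BUCKETS)]

    def on_write(self, log_index, command, from_sync=False):
        """Queue a local write for synchronization and return its bucket."""
        if from_sync:
            # The command arrived from the other site, so it must not be
            # synchronized back.
            return None
        bucket = log_index % NUM_BUCKETS
        self.queues[bucket].append((log_index, command))
        return bucket


service = SyncService()
print(service.on_write(10, "hset k f1 v1"))               # prints 2
print(service.on_write(11, "hdel k f1", from_sync=True))  # prints None
```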
The remote multi-master storage device based on the conflict-free replicated data type of this embodiment satisfies the idempotent law indirectly: the sync service, based on the raft logIndex, ensures that each request is executed only once at the remote end. For operations on data with set state, the version and the timestamp are used as tags to implement a tagged data structure (OR-Set: Observed-Remove Set), so that the data finally becomes consistent without conflict.
This embodiment also provides a computer-readable storage medium storing a computer-executable program that, when executed, implements the remote multi-master storage method based on the conflict-free replicated data type.
The computer readable storage medium of this embodiment may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable storage medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
This embodiment also provides an electronic device comprising a processor and a memory, where the memory is configured to store a computer-executable program; when the program is executed by the processor, the processor performs the remote multi-master storage method based on the conflict-free replicated data type.
The electronic device takes the form of a general-purpose computing device. There may be one processor, or a plurality of processors working cooperatively. The invention does not exclude distributed processing, i.e. the processors may be distributed among different physical devices. The electronic device of the present invention is not limited to a single entity and may be the sum of a plurality of physical devices.
The memory stores a computer executable program, typically machine readable code. The computer readable program may be executable by the processor to enable an electronic device to perform the method, or at least some of the steps of the method, of the present invention.
The memory includes volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also include non-volatile memory, such as Read Only Memory (ROM).
It should be understood that elements or components not shown in the above examples may also be included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a man-machine interaction element such as a button, a keyboard, and the like. The electronic device may be considered as covered by the invention as long as the electronic device is capable of executing a computer readable program in a memory for carrying out the method or at least part of the steps of the method.
From the above description of embodiments, those skilled in the art will readily appreciate that the present invention may be implemented by hardware capable of executing a specific computer program, such as the system of the present invention, as well as electronic processing units, servers, clients, handsets, control units, processors, etc. included in the system. The invention may also be implemented by computer software executing the method of the invention, e.g. by control software executed by a microprocessor, an electronic control unit, a client, a server, etc. It should be noted, however, that the computer software for performing the method of the present invention is not limited to being executed by one or a specific hardware entity, but may also be implemented in a distributed manner by unspecified specific hardware. For computer software, the software product may be stored on a computer readable storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.), or may be stored distributed over a network, as long as it enables the electronic device to perform the method according to the invention.
The above embodiments only illustrate the present invention and do not limit the technical solutions it describes. Although the present invention has been described in detail in this specification with reference to the above embodiments, it is not limited to those specific embodiments. Any modification or equivalent substitution made to the present invention, and every technical solution and modification thereof that does not depart from the spirit and scope of the invention, is intended to be included in the scope of the appended claims.

Claims (10)

1. A remote multi-master storage method based on a conflict-free replicated data type, characterized by comprising the following steps:
deploying a synchronization service on each machine of main data center A and each machine of main data center B;
synchronizing the data in the synchronization service of main data center A with the data in the synchronization service of main data center B through a load balancing service;
the synchronized data comprises a hash set, the hash set is stored statefully, and the storage structure of the hash set comprises:
meta information, comprising a meta key and a meta value, where the meta key stores the service key and the meta value comprises a version storage set, an update_time storage set and a size; the version storage set contains value A set by main data center A and value B set by main data center B, and the update_time storage set contains the timestamps updated by main data center A and main data center B;
field information, comprising a field key and a field value, where the field key stores the service key, the field and the self_version of the main data center, and the field value comprises an update_time storage set, which stores the timestamps updated by main data center A and main data center B, and the actual value.
2. The remote multi-master storage method based on a conflict-free replicated data type of claim 1, wherein performing an Hset command operation on the hash set comprises:
main data center A executes the Hset command operation, sets the version storage set and the update_time storage set in the meta information, updates the self_version of the field key in the field information, and updates the local update_time timestamp and the value in the field value, wherein the meta value stored in the meta information in main data center A has version {A, B} and update_time {A, B};
after receiving the synchronization data from main data center A, main data center B updates the meta information: if the main data center A side version in the meta information stored locally by main data center B is smaller than the main data center A side version in the meta information of the synchronization data, the version and the update_time of the main data center A side are updated; the field information is then updated by judging whether the update_time timestamp in the synchronization data is greater than both the main data center A side update_time and the main data center B side update_time stored locally by main data center B, and the update is not executed if the timestamp is not greater than both values.
3. The remote multi-master storage method based on a conflict-free replicated data type of claim 1, wherein performing an Hdel command operation on the hash set comprises:
main data center A executes the Hdel command operation, deletes the corresponding field, and updates the update_time of main data center A in the meta information to the current timestamp;
after main data center B receives the synchronization data from main data center A, the meta information is not changed and the field information is updated: it is judged whether the update_time timestamp in the synchronization data is greater than both the main data center A side update_time and the main data center B side update_time stored locally in main data center B; if so, the field is deleted and, at the same time, the main data center A side update_time in the meta information is updated to the current timestamp.
4. The remote multi-master storage method based on a conflict-free replicated data type of claim 1, wherein performing a Del command operation on the hash set comprises:
time T1: main data center A performs the Del command operation and deletes the key; the main data center A side version in the meta information is incremented by 1, update_time is set to T1, and the field information is deleted;
time T2: after receiving the synchronization data from main data center A, main data center B scans all field information; for a given field, if its main data center A side update_time is smaller than the update_time in the meta information of the synchronization data, the field is deleted;
wherein time T1 and time T2 are two adjacent times.
5. The remote multi-master storage method based on a conflict-free replicated data type of claim 4, wherein performing the Del command operation on the hash set comprises:
time T1: main data center A performs the Del command operation and deletes the key; the main data center A side version in the meta information is incremented by 1, update_time is set to T1, and the field information is deleted;
time T2: main data center B receives the time-T1 synchronization data of main data center A, with a delay between time T1 and time T2; the synchronization data must then be judged by version and update_time together, and if the local-side version in the meta information stored in main data center B is greater than the main data center B side version in the meta information of the synchronization data, the synchronization data is ignored.
6. The remote multi-master storage method based on a conflict-free replicated data type of claim 1, wherein stateful storage is performed for the value, and the data type of the value comprises the string type;
main data center A executes a set command operation on the string type; when main data center B receives the synchronization data from main data center A, the string-type data stored in main data center B is resolved in partial order of time according to the update_time timestamp of the value, so that the latest write is the one kept.
7. The remote multi-master storage method based on a conflict-free replicated data type of claim 6, wherein a del operation is performed after counter operations are performed on the value, and the del operation, when executed by the local data center, is converted into the inverse of the counter operations on the data preceding the del operation.
8. A remote multi-master storage device based on a conflict-free replicated data type, comprising:
a synchronization service module, deployed on each machine of main data center A and each machine of main data center B;
a load balancing service module, through which the data in the synchronization service of main data center A is synchronized with the data in the synchronization service of main data center B;
the synchronized data comprises a hash set, the hash set is stored statefully, and the storage structure of the hash set comprises:
meta information, comprising a meta key and a meta value, where the meta key stores the service key and the meta value comprises a version storage set, an update_time storage set and a size; the version storage set contains value A set by main data center A and value B set by main data center B, and the update_time storage set contains the timestamps updated by main data center A and main data center B;
field information, comprising a field key and a field value, where the field key stores the service key, the field and the self_version of main data center A, and the field value comprises an update_time storage set, which stores the timestamps updated by main data center A and main data center B, and the actual value.
9. A computer-readable storage medium storing a computer-executable program which, when executed, implements the remote multi-master storage method based on a conflict-free replicated data type of any one of claims 1-7.
10. An electronic device comprising a processor and a memory, the memory storing a computer-executable program which, when executed by the processor, causes the processor to perform the remote multi-master storage method based on a conflict-free replicated data type of any one of claims 1-7.
CN202310648085.3A 2023-06-02 2023-06-02 Remote multi-main storage method and device based on conflict-free copy data type Pending CN116668470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310648085.3A CN116668470A (en) 2023-06-02 2023-06-02 Remote multi-main storage method and device based on conflict-free copy data type


Publications (1)

Publication Number Publication Date
CN116668470A true CN116668470A (en) 2023-08-29

Family

ID=87725680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310648085.3A Pending CN116668470A (en) 2023-06-02 2023-06-02 Remote multi-main storage method and device based on conflict-free copy data type

Country Status (1)

Country Link
CN (1) CN116668470A (en)


Legal Events

Date Code Title Description
PB01 Publication