CN109164985B - Method for copying data, master device and slave device - Google Patents

Method for copying data, master device and slave device

Info

Publication number
CN109164985B
Authority
CN
China
Prior art keywords
data
slave device
request message
master device
version
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810982126.1A
Other languages
Chinese (zh)
Other versions
CN109164985A (en
Inventor
孙嘉岑
钱海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201810982126.1A priority Critical patent/CN109164985B/en
Publication of CN109164985A publication Critical patent/CN109164985A/en
Priority to PCT/CN2019/098307 priority patent/WO2020042852A1/en
Application granted granted Critical
Publication of CN109164985B publication Critical patent/CN109164985B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for copying data, a master device, and a slave device. The method includes: a master device sends a first request message to a slave device, where the first request message instructs the slave device to obtain second data and to overwrite first data stored in the slave device with the second data; the master device and the slave device are located in different data centers; the first data and the second data are two different versions of the same-name data; and the version of the second data is later than that of the first data. The master device then receives a first response message sent by the slave device, where the first response message indicates that the slave device has successfully copied the second data. In the embodiments of this application, the slave device learns the overwrite relationship between the first data and the second data from the first request message, so the operation history of the data does not need to be added to the metadata of the data. This reduces the size of the metadata, ensures the eventual consistency of asynchronous replication, and improves data replication performance.

Description

Method for copying data, master device and slave device
Technical Field
The present application relates to the field of storage, and more particularly, to a method, master device, and slave device for copying data.
Background
With the rapid development of data replication technology, asynchronous replication has become the mainstream data replication technology. In the prior art, same-name data stored in different data centers undergoes repeated multi-threaded overwrites in the primary cluster and cross-region asynchronous replication between the primary cluster and the standby cluster. To ensure that the latest version of the same-name data becomes consistent across the data centers after a certain time, a meta clock is added to the metadata of the data, and the overwrite relationship between different versions of the same-name data is determined based on the meta clock in the metadata. This increases the amount of metadata, which in turn increases the cost of data replication and degrades its performance. Therefore, how to reduce the size of metadata and improve the performance of data replication is a problem that urgently needs to be solved.
Disclosure of Invention
The application provides a method and a device for copying data, which can reduce the size of metadata and improve the performance of copying data.
In a first aspect, a method for copying data is provided, including: a master device sends a first request message to a slave device, where the first request message instructs the slave device to obtain second data and to overwrite first data with the second data, the master device and the slave device are located in different data centers, the first data and the second data are two different versions of the same-name data, and the version of the second data is later than that of the first data;
when the latest version of the same-name data stored in the slave device is the first data, the first request message can instruct the slave device to obtain the second data and overwrite the first data, and the master device receives a first response message sent by the slave device, where the first response message indicates that the slave device has successfully copied the second data; or,
when the latest version of the same-name data stored in the slave device is not the first data but third data, the master device receives a second request message sent by the slave device, where the second request message queries the overwrite relationship between the second data and the third data in the master device, and the third data is one version of the same-name data;
the master device sends second indication information to the slave device, where the second indication information indicates the overwrite relationship between the second data and the third data in the master device, and the master device receives a first response message sent by the slave device, where the first response message indicates that the slave device has successfully copied the second data.
The above-mentioned copied data can be understood as copied objects, copied files, copied blocks, or the like.
According to the method for copying data provided in the embodiments of this application, the slave device asynchronously copies the second data in response to the first request message sent by the master device, and overwrites the first data stored in the slave device according to the first request message. The first data is the old version of the same-name data, and the second data is the new version. That is, each replication between the master device and the slave device copies a new version of the data, so eventual consistency can be achieved. In addition, the metadata of the second data does not need a meta clock that records the operation history, which reduces the size of the metadata and improves data replication performance.
When the first request message indicates that the second data overwrites the first data, but the latest version of the same-name data held in the slave device is not the first data but the third data, the slave device cannot determine the overwrite relationship between the second data and the third data.
In this case, a back-to-source verification mechanism is added: the master device receives a second request message, sent by the slave device, that queries the overwrite relationship between the third data and the second data, and the master device sends second indication information to the slave device according to the overwrite relationship between the third data and the second data in the master device, thereby indicating that relationship anew. This ensures eventual consistency between the master device and the slave device when data is copied.
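The master side of this back-to-source check can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `MasterVersionIndex`, its methods, and the dictionary keys are hypothetical names, and the sketch assumes the master tracks a simple overwritten/not-overwritten flag per version.

```python
class MasterVersionIndex:
    """Hypothetical master-side index for one piece of same-name data."""

    def __init__(self):
        # Maps a version identifier to True if that version has been
        # overwritten by a later upload, False otherwise.
        self.overwritten = {}

    def upload(self, version):
        # Assumption: a new upload overwrites every version stored so far.
        for known in self.overwritten:
            self.overwritten[known] = True
        self.overwritten[version] = False

    def answer_second_request(self, second_version, third_version):
        """Build the second indication information as state information."""
        return {
            "second_overwritten": self.overwritten[second_version],
            "third_overwritten": self.overwritten[third_version],
        }
```

If versions `v1`, `v2`, and `v3` are uploaded in that order, a query about `v2` and `v3` reports `v2` as overwritten and `v3` as not overwritten, which tells the slave device to keep `v3`.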
The second indication information may specifically be state information of the third data and the second data; or,
when the machine clock is trusted, the second indication information may specifically be the times at which the third data and the second data were written to disk. A trusted clock means that the system in which the master device and the slave device are located has an atomic clock, and the slave device receives clock information from the master device. This provides flexibility in the possible forms of the second indication information.
Where the second data is overwritten by the third data in the master device, the state of the second data is a deleted (overwritten) state, and the third data, being the data that overwrote the second data, is in a non-deleted (not overwritten) state; because the second data is overwritten by the third data in the master device, the second data was written to disk on the master device earlier than the third data.
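The two forms of the second indication information could be interpreted by the slave device roughly as below. The function name and its parameters are assumptions made for illustration; the source only states that the indication may be state information or, when clocks are trusted, disk-write times.

```python
from typing import Optional


def third_overwrites_second(second_deleted: Optional[bool] = None,
                            third_deleted: Optional[bool] = None,
                            second_disk_time: Optional[float] = None,
                            third_disk_time: Optional[float] = None) -> bool:
    """Return True if the third data overwrites the second data on the master."""
    if second_deleted is not None and third_deleted is not None:
        # State form: the overwritten version is in the deleted state and
        # the overwriting version is in the non-deleted state.
        return second_deleted and not third_deleted
    if second_disk_time is not None and third_disk_time is not None:
        # Time form (assumes trusted clocks): the version written to disk
        # earlier is the one that was overwritten.
        return second_disk_time < third_disk_time
    raise ValueError("indication carries neither states nor disk-write times")
```

Either form leads to the same decision; the time form is only meaningful under the trusted-clock assumption described above.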
With reference to the first aspect, in certain implementations of the first aspect, the first request message includes first indication information, where the first indication information is used to indicate that the version of the second data is later than the version of the first data.
According to the method for copying data provided in the embodiments of this application, the first request message carries indication information that the version of the second data is later than that of the first data, so that on receiving the first request message the slave device can determine the overwrite relationship between the second data and the first data from the first indication information carried in it. This improves the accuracy of data replication.
Specifically, the first request message further includes second data, or includes a first identifier of the second data.
When the first request message includes the second data, the slave device directly receives the second data and saves it. This is one implementation of the slave device obtaining the second data;
when the first request message includes the first identifier of the second data, the slave device obtains the second data from the master device according to the first identifier and copies the second data. This is another implementation of the slave device obtaining the second data.
Specifically, the first identifier of the second data may be a version number of the second data, or other identification information capable of indicating the second data.
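The two ways of obtaining the second data can be sketched as follows. The `FirstRequest` fields and the `fetch_from_master` callback are hypothetical; the source does not define a message layout.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FirstRequest:
    name: str                          # name of the same-name data
    prev_version: str                  # first indication information (hypothetical field)
    payload: Optional[bytes] = None    # the second data, if carried inline
    version_id: Optional[str] = None   # first identifier of the second data


def obtain_second_data(req: FirstRequest,
                       fetch_from_master: Callable[[str, str], bytes]) -> bytes:
    """Return the second data, either inline or fetched by identifier."""
    if req.payload is not None:
        # The request carries the second data itself: save it directly.
        return req.payload
    if req.version_id is not None:
        # The request carries only an identifier: pull the data from the master.
        return fetch_from_master(req.name, req.version_id)
    raise ValueError("request carries neither the second data nor its identifier")
```

Carrying only the identifier keeps the request small at the cost of an extra round trip to the master device.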
Optionally, in some embodiments, the first request message may not include the above-mentioned first indication information, including the following two possible cases:
Case one: the master device and the slave device are located in different data centers, and a global scheduler exists between the different data centers. The global scheduler can allocate a globally unique second identifier to each uploaded version of the same-name data, and the second identifiers form an increasing sequence; that is, the second identifier of a new version is always larger than that of an old version. In this case, the first request message may include only the second identifier of the second data, or may include both the second identifier of the second data and the second data.
Specifically, when the first request message includes only the second identifier of the second data, the slave device obtains the second data from the master device according to the second identifier and copies the second data. This is one implementation of the slave device obtaining the second data.
Alternatively, when the first request message includes both the second identifier of the second data and the second data, the slave device directly receives the second data and saves it. This is another implementation of the slave device obtaining the second data.
Case two: the master device and the slave device are located in different data centers, and a global scheduler exists between the different data centers. The global scheduler can allocate a globally unique clock (an atomic clock exists in the system) to each uploaded version of the same-name data, and the clock accurately represents the time at which each version of the same-name data was written to disk in the master device; that is, the clock of a new version is always later than that of an old version. In this case, the first request message may include only the clock information of the second data, or may include both the clock information of the second data and the second data.
Specifically, when the first request message includes only the clock information of the second data, the slave device obtains the second data from the master device according to the clock information and copies the second data. This is one implementation of the slave device obtaining the second data. Alternatively, when the first request message includes both the clock information of the second data and the second data, the slave device directly receives the second data and saves it. This is another implementation of the slave device obtaining the second data.
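In both cases the slave device can order versions by comparing a single globally ordered value, so no separate first indication information is needed. The sketch below is a single-process illustration; `GlobalScheduler`, `allocate`, and `incoming_is_newer` are hypothetical names, and a real scheduler would have to be shared across data centers.

```python
import itertools


class GlobalScheduler:
    """Hypothetical scheduler handing out strictly increasing identifiers."""

    def __init__(self):
        self._ids = itertools.count(1)

    def allocate(self) -> int:
        # Each uploaded version receives a larger identifier than the last.
        return next(self._ids)


def incoming_is_newer(stored_key, incoming_key) -> bool:
    """Compare second identifiers (case one) or clock values (case two)."""
    return incoming_key > stored_key
```

The same comparison works for case two if the keys are the trusted disk-write timestamps instead of scheduler-issued identifiers.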
With reference to the first aspect and the foregoing implementation manner of the first aspect, in another implementation manner of the first aspect, the first indication information is carried in a header field of the first request message.
According to the method for copying data provided in the embodiments of this application, the first indication information may be carried in a header field of the first request message. On receiving the first request message, the slave device learns that the version of the second data is later than that of the first data, and thus learns the overwrite relationship between the first data and the second data.
Optionally, the first indication information may be carried in other positions of the first request message.
With reference to the first aspect and the foregoing implementation manner of the first aspect, in another implementation manner of the first aspect, the first indication information is a version number of the first data.
According to the method for copying data provided in the embodiments of this application, the first indication information may be a version number of the first data. Among different versions of the same-name data, a version number uniquely identifies the corresponding data, so using the version number of the first data as the first indication information improves the accuracy with which the slave device identifies the first data.
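As an illustration of carrying the version number of the first data in a header field, consider the sketch below. The header name `x-overwrites-version` and the dictionary layout are invented for this example; the source does not name a header or a wire format.

```python
OVERWRITES_HEADER = "x-overwrites-version"  # hypothetical header name


def build_first_request(name, second_version, first_version, body=None):
    """Build a first request whose header names the version being overwritten."""
    return {
        "name": name,
        "version": second_version,
        "headers": {OVERWRITES_HEADER: first_version},
        "body": body,
    }


def overwrite_relationship_known(request, latest_local_version):
    """True if the slave's latest version is exactly the one being overwritten."""
    return request["headers"].get(OVERWRITES_HEADER) == latest_local_version
```

When the check returns False, the slave device holds some other version (the third data) and would fall back to querying the master device.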
With reference to the first aspect and the foregoing implementation manner of the first aspect, in another implementation manner of the first aspect, before the master device sends the first request message to the slave device, the method further includes: the master device determines a state of the second data, the state of the second data being an uncovered state.
According to the method for copying data provided in the embodiments of this application, before the master device sends the first request message instructing the slave device to obtain the second data, the master device determines that the second data is in a not-overwritten state; only then does the first request message need to be sent for the second data. This avoids redundant replication messages and saves signaling overhead.
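A minimal sketch of this pre-send check, assuming the master keeps a list of versions awaiting replication and can look up whether each has been overwritten (both the function name and the `is_overwritten` callback are hypothetical):

```python
def versions_to_replicate(replication_log, is_overwritten):
    """Keep only versions still in the not-overwritten state.

    replication_log: version identifiers awaiting replication, oldest first.
    is_overwritten:  callable returning True if a version was overwritten.
    """
    return [v for v in replication_log if not is_overwritten(v)]
```

Versions that were already overwritten on the master device are skipped, so no first request message is sent for them.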
In a second aspect, a method for copying data is provided, including: a slave device receives a first request message sent by a master device, where the first request message instructs the slave device to obtain second data and to overwrite first data with the second data, the master device and the slave device are located in different data centers, the first data and the second data are two different versions of the same-name data, and the version of the second data is later than that of the first data;
when the latest version of the same-name data stored in the slave device is the first data, the slave device obtains the second data according to the first request message and overwrites the locally stored first data with the second data; the slave device sends a first response message to the master device, where the first response message indicates that the slave device has successfully copied the second data; or,
when the latest version of the same-name data stored in the slave device is not the first data but third data, the slave device sends a second request message to the master device, where the second request message queries the overwrite relationship between the second data and the third data, and the third data is one version of the same-name data; and the slave device receives second indication information sent by the master device, where the second indication information indicates the overwrite relationship between the second data and the third data.
According to the method for copying data provided in the embodiments of this application, the slave device receives the first request message sent by the master device, asynchronously copies the second data, and overwrites the first data according to the first request message. The first data is the old version of the same-name data, and the second data is the new version. That is, each replication between the master device and the slave device copies a new version of the data, so eventual consistency can be achieved. In addition, the metadata of the second data does not need a meta clock that records the operation history, which reduces the size of the metadata and improves data replication performance.
When the first indication information indicates that the second data overwrites the first data, but the slave device stores the third data rather than the first data, the slave device cannot determine the overwrite relationship between the second data and the third data.
In this case, a back-to-source verification mechanism is added: the slave device sends the master device a second request message that queries the overwrite relationship between the third data and the second data, and the master device sends second indication information to the slave device according to the overwrite relationship between the third data and the second data in the master device, thereby indicating that relationship anew. This ensures eventual consistency between the master device and the slave device when data is copied.
The second indication information may specifically be state information of the third data and the second data, or the times at which the third data and the second data were written to disk. This provides flexibility in the possible forms of the second indication information.
Specifically, where the second data is overwritten by the third data in the master device, the state of the second data is a deleted (overwritten) state, and the third data, being the data that overwrote the second data, is in a non-deleted (not overwritten) state; because the second data is overwritten by the third data in the master device, the second data was written to disk on the master device earlier than the third data.
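The slave-side flow of the second aspect can be sketched as below. This is an illustration under stated assumptions: the `Slave` class, the return strings, and the `second_overwrites_third` callback (standing in for the second request message and the second indication information) are hypothetical, and the behavior of keeping the third data when it is newer is an assumption of the sketch.

```python
class Slave:
    """Hypothetical slave holding only the latest version per name."""

    def __init__(self, second_overwrites_third):
        # name -> (version, data)
        self.latest = {}
        # Callback standing in for the query to the master:
        # (name, second_version, third_version) -> bool
        self.second_overwrites_third = second_overwrites_third

    def handle_first_request(self, name, first_version, second_version, data):
        stored_version = self.latest.get(name, (None, None))[0]
        if stored_version is not None and stored_version != first_version:
            # The latest local version is third data: ask the master which wins.
            if not self.second_overwrites_third(name, second_version, stored_version):
                return "kept_third"
        # Either the first data is the latest local version, or the master
        # confirmed that the second data overwrites the third data.
        self.latest[name] = (second_version, data)
        return "copied"
```

With this flow, replication requests that arrive out of order still converge on the version that is latest on the master device, without any operation history stored in the metadata.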
With reference to the second aspect, in some implementations of the second aspect, the first request message includes first indication information, where the first indication information is used to indicate that the version of the second data is later than the version of the first data.
According to the method for copying data provided by the embodiment of the application, the first request message carries the indication information that the version of the second data is later than that of the first data, so that when the slave device receives the first request message, the slave device can judge the coverage relation between the second data and the first data according to the first indication information carried in the first request message. The accuracy of the copied data is improved.
With reference to the second aspect and the foregoing implementation manner of the second aspect, in another implementation manner of the second aspect, the first indication information is carried in a header field of the first request message.
According to the method for copying data provided by the embodiment of the application, the first indication information may be carried in a header field of the first request message. When the slave device receives the first request message, the coverage relation of the first data and the second data can be quickly obtained.
Optionally, the first indication information may be carried in other positions of the first request message.
With reference to the second aspect and the foregoing implementation manner of the second aspect, in another implementation manner of the second aspect, the first indication information is a version number of the first data.
According to the method for copying data provided in the embodiments of this application, the first indication information may be a version number of the first data. Among different versions of the same-name data, a version number uniquely identifies the corresponding data, so using the version number of the first data as the first indication information improves the accuracy with which the slave device identifies the first data.
In a third aspect, a master device is provided, where the master device includes respective means for performing the methods in the first aspect and any possible implementation manner of the first aspect.
In a fourth aspect, a slave device is provided, which comprises means for performing the method of the second aspect and any possible implementation manner of the second aspect.
In a fifth aspect, a master device is provided that includes at least one processor and at least one memory. The at least one memory is configured to store a computer program, and the at least one processor is configured to call and run the computer program from the at least one memory, so that the master device executes the method in the first aspect and any possible implementation manner of the first aspect. The master device further includes a hard disk, and the hard disk is configured to store the same-name data.
In a sixth aspect, a slave device is provided that includes at least one processor and at least one memory. The at least one memory is configured to store a computer program, and the at least one processor is configured to invoke and run the computer program from the memory, so that the slave device performs the method of the second aspect and any possible implementation manner of the second aspect, and the slave device further includes a hard disk, where the hard disk is configured to store the data with the same name.
In a seventh aspect, a system is provided, which includes the master device of the fifth aspect and the slave device of the sixth aspect.
In an eighth aspect, there is provided a computer program product comprising: computer program code for causing a computer to perform the method of the first and second aspects described above when said computer program code is run on a computer.
It should be noted that, all or part of the computer program code may be stored in the first storage medium, where the first storage medium may be packaged together with the processor or may be packaged separately from the processor, and this is not specifically limited in this embodiment of the present application.
In a ninth aspect, a computer readable medium is provided, the computer readable medium having stored program code which, when run on a computer, causes the computer to perform the method of the first and second aspects above.
According to the method for copying data, the master device, and the slave device provided in this application, the metadata of the data to be copied does not need to record the operation history of the data, which reduces the size of the metadata and improves data replication performance.
Drawings
Fig. 1 is a schematic view of a scenario in which a method for copying data provided by an embodiment of the present application is applied.
FIG. 2 is a schematic diagram of a meta clock based asynchronous replication method.
FIG. 3 is a schematic diagram of another asynchronous meta clock-based copy method.
Fig. 4 is a schematic diagram of a method for copying data according to the present application.
FIG. 5 is a diagram illustrating an embodiment of a method for replicating data.
Fig. 6 is a schematic diagram of a master device of an embodiment of the present application.
Fig. 7 is a schematic block diagram of a master device of another embodiment of the present application.
Fig. 8 is a schematic diagram of a slave device according to an embodiment of the present application.
Fig. 9 is a schematic block diagram of a slave device of another embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
The technical scheme of the embodiment of the application can be applied to different data centers.
The following briefly introduces a scenario to which the embodiment of the present application is applicable with reference to fig. 1.
Fig. 1 is a schematic view of a scenario in which a method for copying data provided by an embodiment of the present application is applied. The scenario includes two parts, 110 and 120.
110 denotes the data centers, specifically the data center 111 and the data center 112 shown in fig. 1. In this embodiment, the data center 111 and the data center 112 are two different data centers.
Alternatively, the data center 111 and the data center 112 may be located in two different areas.
Alternatively, the data center 111 and the data center 112 may be different data centers in the same area.
The device participating in the data replication process in the data center 111 may be referred to as a master device, and the device participating in the data replication process in the data center 112 may be referred to as a slave device; or,
the device participating in the data replication process in the data center 111 may also be referred to as a slave device, and the device participating in the data replication process in the data center 112 may also be referred to as a master device. This is not limited by the present application.
It should be understood that, in the embodiment of the present application, the number of the slave devices is also not limited, and data replication may be performed between a plurality of slave devices and one master device.
For example, fig. 1 also includes a plurality of data centers, each of which includes a slave device, and data replication is performed between the master device and the slave device.
It should also be understood that the above-mentioned designations of master and slave devices are merely examples and should not limit the scope of the present application.
For example, a master device may also be referred to as a master node, a master cluster, a master region (region), a master peer, or the like; a slave device may also be referred to as a slave node, a slave cluster, a slave region or slave, etc.
120 is a replication channel that supports the transmission of the data to be copied between the master device and the slave device.
For example, when a user needs to migrate data from one data center to another for business reasons, the device in the source data center that migrates the data out may be referred to as the master device, and the device in the destination data center that receives the migrated data may be referred to as the slave device.
It should be understood that the master and slave devices shown in fig. 1 are the largest serviceable unit of centralized metadata management, and may be, for example, network devices or terminal devices that include a processor and memory.
To facilitate understanding of the embodiments of the present application, the following briefly introduces basic concepts that will be referred to in the embodiments of the present application.
Asynchronous replication: data is copied asynchronously, without affecting the latency of writing the local data to disk. Here, writing data to disk is one way of achieving data persistence, for example storing data from volatile memory onto a non-volatile hard disk.
Specifically, data replication is performed asynchronously: on the master device, copying the data and writing the data to disk are two threads that do not affect each other.
For example, for same-name data A, the master device first receives the first version of A uploaded by a user and sends a data copying request to the slave device, requesting the slave device to copy the first version of A; the master device writes the first version of A to its local disk independently of the copying of that version.
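The separation between local persistence and replication can be sketched with a queue and a worker thread. Everything here is illustrative: `run_master`, the in-memory dictionaries standing in for the master's disk and the slave's store, and the `None` sentinel are assumptions of the sketch.

```python
import queue
import threading


def run_master(uploads):
    """Persist uploads locally while a separate thread replicates them."""
    replication_queue = queue.Queue()
    local_disk = {}    # stands in for the master's hard disk
    slave_store = {}   # stands in for the slave device

    def replicate():
        while True:
            item = replication_queue.get()
            if item is None:  # sentinel: no more uploads
                break
            name, version, data = item
            slave_store[name] = (version, data)

    worker = threading.Thread(target=replicate)
    worker.start()
    for name, version, data in uploads:
        # The local write does not wait for the replication thread.
        local_disk[name] = (version, data)
        replication_queue.put((name, version, data))
    replication_queue.put(None)
    worker.join()
    return local_disk, slave_store
```

Because the queue is first-in first-out and there is a single worker, this toy version never reorders versions; the overwrite-relationship checks in the application address the case where real replication requests do arrive out of order.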
It should be understood that when data is copied between the master device and the slave device, the same-name data stored in the master device and the slave device is expected to reach eventual consistency.
Eventual consistency means that, for two copies of the same-name data stored in the master device and the slave device respectively, after different versions of the same-name data have overwritten one another multiple times and have been asynchronously replicated between the master device and the slave device, the latest versions of the same-name data in the master device and the slave device become consistent after a certain time.
The same name data refers to data with the same name in the database.
For example, when new data is added to the database, if there is data in the database that is the same name as the data to be added, the new data overwrites the old data.
It should be understood that the overwriting described above occurs between multiple versions of same-name data in a non-multi-version scenario, in which the database can hold only one version of the same-name data. When the master device and/or the slave device enables multi-versioning, the versions of the same-name data do not overwrite one another, because all of them are retained.
From the foregoing, it should be understood that the data replication referred to in the present application is asynchronous replication of same-name data between different devices across data centers in a scenario where multi-versioning is not enabled.
Further, the master and slave devices may store an overlay history of different versions of data of the same name with a meta clock (meta clock).
Specifically, each meta clock records the creation time of the historical data (including the logical clock and the machine clock) and the current operating context. Comparing the meta clock lists of two versions of same-name data yields the chronological order of the two versions.
It should be understood that data can be replicated between the master device and the slave device because a protocol implementation or control mechanism capable of reliably transmitting data is provided between the master device and the slave device across data centers, which shields the replication from factors in data transmission that could interfere with consistency.
Concurrent transaction (CV): an optimistic lock implementation for concurrent transactions on data. An optimistic lock is a method of concurrency control. It assumes that multiple concurrent transactions by multiple users do not affect one another during processing, so each transaction can process the portion of data it affects without acquiring a lock. Before committing a data update, each transaction first checks whether another transaction has modified the data since the transaction read it. If another transaction has made an update, the committing transaction rolls back.
Specifically, optimistic locks are mostly implemented based on a data version (version) recording mechanism. The data version refers to adding a version identifier to the data.
For example, in a version solution based on a database table, this is typically accomplished by adding a "version" field to the table. When data is read, the version number is read with it, and when the data is later updated, the version number is incremented by one. The version number of the submitted data is then compared with the current version number of the corresponding record in the database table. If the version number of the submitted data is greater than the current version number in the table, the update is applied; otherwise, the submitted data is regarded as stale.
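The version check described above can be sketched as follows. This is a minimal illustration under stated assumptions: `VersionedTable`, `StaleVersionError`, and the in-memory dictionary that stands in for the database table are hypothetical and not part of the present application.

```python
class StaleVersionError(Exception):
    """Raised when the submitted data is older than the stored record."""


class VersionedTable:
    def __init__(self):
        self.rows = {}  # key -> (version, value); stands in for a table with a "version" field

    def read(self, key):
        # The version number is read together with the data.
        return self.rows.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        current_version, _ = self.read(key)
        submitted_version = read_version + 1  # the version is incremented on update
        if submitted_version <= current_version:
            # Another transaction committed first; this submission is stale.
            raise StaleVersionError(key)
        self.rows[key] = (submitted_version, new_value)
        return submitted_version
```

Two transactions that both read version 0 illustrate the rule: the first commit succeeds and stores version 1, and the second is rejected because its submitted version 1 is not greater than the stored version 1.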
Write-ahead logging (WAL): a family of techniques for providing atomicity and durability in relational database systems.
In systems using WAL, all modifications are written to a log file before they are committed. The log file typically includes redo and undo information.
For example, suppose the machine loses power while a program is performing certain operations. When the machine restarts, the program may need to know whether the operation in progress succeeded, partially succeeded, or failed. If WAL is used, the program can check the log file and compare the operations that were planned at the time of the sudden power loss with the operations that were actually performed. Based on this comparison, the program can decide whether to undo the completed work, to finish the operation, or to leave things as they are.
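The recovery decision described above can be sketched as follows. This is a deliberately simplified illustration, not a production WAL: the record layout, `apply_with_wal`, and `recover` are hypothetical, the log and the state live in memory, and `None` is assumed to mean "key absent".

```python
def apply_with_wal(log, state, txid, updates):
    """Log undo/redo information first, then commit, then change the state."""
    for key, new in updates.items():
        # Each update record keeps the old value (undo) and the new value (redo).
        log.append(("update", txid, key, state.get(key), new))
    log.append(("commit", txid))
    for key, new in updates.items():
        state[key] = new


def recover(log, state):
    """Redo committed transactions and undo uncommitted ones after a crash."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:  # redo in log order
        if rec[0] == "update" and rec[1] in committed:
            state[rec[2]] = rec[4]
    for rec in reversed(log):  # undo in reverse log order
        if rec[0] == "update" and rec[1] not in committed:
            if rec[3] is None:
                state.pop(rec[2], None)
            else:
                state[rec[2]] = rec[3]
    return state
```

A transaction whose update records exist without a matching commit record is treated as interrupted, and its changes are rolled back to the logged old values.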
In the prior art, in order to achieve consistency of data stored for the same-name data between a master device and a slave device in different data centers, a method based on the above meta clock asynchronous copy is proposed.
Specifically, the meta clock includes information such as the data creation time (createTime) and the request information (requestInfo) of the different versions of same-name data.
Furthermore, the metadata of each version of same-name data stores N operation histories of the data in a meta clock field, and the meta clocks in the metadata of the different versions are compared to determine the order in which the different versions overwrite one another.
Next, the meta clock-based asynchronous replication method is described, taking as an example a slave device that receives replication requests for two versions of same-name data and determines the order in which the two versions overwrite each other.
Specifically, the slave device compares meta clocks in metadata of the two versions of data in two steps:
First, the slave device merges the meta clocks in the metadata of the two versions of the same-name data on the basis of their common substrings, and sorts the non-common parts of the two meta clocks by time.
A common substring of several parent strings is a string whose characters appear in each of the parent strings in the same order as in that string.
For example, bo, bg, and lg each appear in both parent strings cnblogs and belong, in the same order as in the parent strings, and are therefore called common substrings of the two.
Second, the slave device determines the chronological order of the meta clocks in the metadata of the two versions of the same-name data from the merged meta clock.
The above-mentioned meta clock based asynchronous replication method is briefly described with reference to fig. 2.
FIG. 2 is a schematic diagram of a meta clock based asynchronous replication method. The system comprises a master device and a slave device.
Specifically, the master device uploads several different versions of same-name data one after another, for example two versions (data V9 and data V8, as shown in FIG. 2), where data V9 overwrites data V8. That is, the latest version of the same-name data in the master device is data V9.
V9 is the version number of data V9, and the values in parentheses after V9 in FIG. 2 are the timestamps (1, 2, 3, 7) of the meta clock included in the metadata of data V9.
That is, the same-name data written to disk at time "1" is overwritten by the same-name data written to disk at time "2". Similarly, the data written at time "2" is overwritten by the data written at time "3", and the data written at time "3" is overwritten by the data written at time "7".
V8 is the version number of data V8, and the values in parentheses after V8 in FIG. 2 are the timestamps (2, 3, 4, 5, 6) of the meta clock included in the metadata of data V8.
That is, the same-name data written to disk at time "2" is overwritten by the same-name data written to disk at time "3". Similarly, the data written at time "3" is overwritten by the data written at time "4", the data written at time "4" is overwritten by the data written at time "5", and the data written at time "5" is overwritten by the data written at time "6".
The process of implementing asynchronous replication between the master device and the slave device mainly comprises:
First, the master device sends a first replication data request message to the slave device for data V9, requesting the slave device to copy data V9, and sends a second replication data request message to the slave device for data V8, requesting the slave device to copy data V8.
As described above, because data V9 is the newer version of the same-name data, data V9 overwrites data V8 in the master device.
In one possibility, the master device sends the first replication data request message before the second replication data request message, and the first message also arrives at the slave device before the second.
In another possibility, the master device sends the first replication data request message before the second replication data request message, but the first message arrives at the slave device later than the second. That is, the replication data request messages sent by the master device arrive at the slave device out of order. In this case, the slave device determines the order in which data V9 and data V8 overwrite each other from the meta clock in the metadata of data V9 carried in the first replication data request message and the meta clock in the metadata of data V8 carried in the second replication data request message. If the slave device does not compare the meta clocks in the metadata of data V8 and data V9, and instead determines the overwrite relationship between data V9 and data V8 directly from the order in which it receives the first and second replication data request messages, the latest version of the same-name data in the slave device will be data V8.
In the following, how the slave device decides the overlay relationship of the data V8 and the data V9 by comparing meta clocks in the metadata of the data V8 and the data V9 is described in detail.
First, the slave merges meta clocks in metadata of the data V8 and the data V9 in a merging manner based on a common substring.
As shown in FIG. 2, the timestamps of the meta clock in the metadata of data V9 are 1, 2, 3, 7;
the timestamps of the meta clock in the metadata of data V8 are 2, 3, 4, 5, 6.
The common substring of the two is therefore 2, 3, and the result of the first merge is 1, 2, 3.
Second, the slave device sorts the non-common parts of the meta clocks in the metadata of data V8 and data V9 by time.
The final result is 1, 2, 3, 4, 5, 6, 7. Timestamp "7" belongs to the meta clock in the metadata of data V9 and is also the latest timestamp in the merged meta clock of data V8 and data V9.
Thus, the slave device determines that the data V9 is the latest version of the data of the same name, and overwrites the data V8.
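The two-step comparison can be sketched as follows, using the timestamps given in parentheses for FIG. 2, (1, 2, 3, 7) for data V9 and (2, 3, 4, 5, 6) for data V8. This is a hedged illustration rather than the prior-art implementation: `merge_meta_clocks` and `newer_version` are hypothetical names, the common part is taken to be the longest common contiguous run (found with the standard library's `difflib.SequenceMatcher`), and pooling the non-common entries before and after that run is an assumption. Both lists are assumed to be non-empty.

```python
from difflib import SequenceMatcher


def merge_meta_clocks(a, b):
    """Merge two meta clock timestamp lists around their longest common run."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    common = a[m.a:m.a + m.size]
    # Assumption: non-common entries on each side of the common run are
    # pooled from both lists and sorted by time.
    before = sorted(a[:m.a] + b[:m.b])
    after = sorted(a[m.a + m.size:] + b[m.b + m.size:])
    return before + common + after


def newer_version(name_a, a, name_b, b):
    """Return the name of the version that owns the latest merged timestamp."""
    latest = merge_meta_clocks(a, b)[-1]
    return name_a if latest in a else name_b
```

When the two lists share no common run, the sketch degenerates to sorting all timestamps together, so the outcome depends on the raw timestamps alone.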
The meta clock-based asynchronous replication method illustrated in FIG. 2 has the following disadvantages:
First, the method needs to add a meta clock to the metadata of every version of same-name data, which increases cost.
Second, the master device needs to pre-read the metadata of every version of the same-name data during asynchronous replication, which affects replication performance.
Third, when the number of meta clocks stored by the master device and/or the slave device exceeds the upper limit that the device can store, the final consistency of asynchronous replication cannot be guaranteed.
Specifically, the third point of the above-described disadvantages is explained in conjunction with fig. 3.
The following briefly describes, with reference to FIG. 3, how the slave device determines the latest version of same-name data when the number of meta clocks stored in the master device exceeds the upper limit that the master device can store and a time jump occurs while the master device is receiving different versions of the same-name data.
FIG. 3 is a schematic diagram of another asynchronous meta clock-based copy method. The system comprises a master device and a slave device.
Assume that a time jump occurs on the master device, so that, according to the meta clock principle, data V11 may actually be the oldest historical version. The master device then uploads 11 pieces of data in turn, with versions V0-V11, and assume that V11 is the last to arrive at the slave device. The size of the meta clock list in the master device is limited to 11. The historical versions of the same-name data are V11-V9, and when data V10 arrives at the master device, data V11 is removed from the meta clock list.
When the slave device compares the meta clocks in the metadata of the data, it finds no common substring among them and therefore compares the timestamps of the data. Data V11 is considered to have the largest timestamp and thus to overwrite the other data, so the slave device determines from the timestamp comparison that data V11 is the latest version of the same-name data, which ultimately causes data inconsistency.
To solve the problems of the final-consistency methods for asynchronous data replication shown in FIG. 2 and FIG. 3, the present application provides a method for copying data that reduces the size of the metadata of the data while achieving consistency of the data between the master device and the slave device during asynchronous replication.
The method for copying data in the present application is described in detail below with reference to fig. 4 and 5. The method for copying data in the present application can be applied to at least two data centers, and the following is a brief description of the method for copying data in the present application, taking an example that a master device and a slave device are respectively located in two different data centers.
FIG. 4 is a schematic diagram of a method for copying data according to the present application. The method involves a master device and a slave device and comprises five steps, S410 to S450, which are described in detail below.
Specifically, in this embodiment of the present application, the copying data includes: object replication, file replication, block replication, and the like.
Optionally, in the method for copying data provided in this embodiment of the present application, data is replicated between the master device and the slave device only when it needs to be copied. That is, before the master device sends a replication data request message to the slave device, it may determine whether the data needs to be copied.
Specifically, fig. 4 includes S410, the master device determines the state of the second data.
To reduce the number of messages exchanged between the slave device and the master device when multiple versions of same-name data overwrite one another frequently, the master device determines the state of the second data before sending the slave device a first request message that instructs the slave device to acquire the second data. The master device sends the first request message only when the second data is in a not-overwritten state. The not-overwritten state may also be referred to as the non-deleted state, and the first request message may also be referred to as a replication data request message.
In the following, without loss of generality, the case in which the master device decides, according to the state of a certain version of same-name data, whether to send a replication data request message for that version is used to illustrate that such a decision is feasible.
"Replication data request message" is a general term that covers the first request message. It should be appreciated that the first request message described above is the replication data request message sent for the second data.
Specifically, when the master device decides, according to the state of a piece of data, whether to send the slave device a replication data request message for that data, there are the following two cases:
Case one: the data is in a deleted (overwritten) state in the master device, that is, it has been overwritten in the master device by a newer version. If the master device determines that the data is in the deleted state, it does not send a replication data request message for the data.
For example, the master device receives one version of same-name data A uploaded by a user (referred to as first data) and then receives another version of same-name data A uploaded by the user (referred to as second data). The second data is the newer version, that is, the second data overwrites the first data.
When the master device is about to send the slave device a replication data request message for the first data, it determines that the first data has been overwritten and is in the deleted state. The master device therefore does not send a replication data request message for the first data to the slave device.
Case two: the data is in a non-deleted (not overwritten) state in the master device, that is, it has not yet been overwritten in the master device. If the master device determines that the data is in the non-deleted state, it sends a replication data request message for the data.
For example, the master device receives one version of same-name data A uploaded by a user (referred to as first data) and then receives another version of same-name data A uploaded by the user (referred to as second data). The second data is the newer version, that is, the second data overwrites the first data.
When the master device is about to send the slave device a replication data request message for the first data, it determines that the first data is stored in the master device in the non-deleted state: because replication is asynchronous, the first data has not yet been overwritten by the second data. The master device therefore sends a replication data request message for the first data to the slave device.
It should be understood that in the case shown in case two, the time when the master device prepares to send the duplicate data request message for the first data to the slave device may be the time when the master device receives another version of data (referred to as second data) uploaded by the user for the same-name data a. That is, the sending of the data replication request message by the master device and the receiving of the data uploaded by the user are two threads which do not interfere with each other and can be performed simultaneously.
Specifically, the master device sends a copy data request message for the data to the slave device when determining that the data needs to be copied.
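The state check in S410 can be sketched as follows. This is a minimal illustration: `State`, `VersionRecord`, `overwrite`, and `should_send_replication_request` are hypothetical names chosen for the example, and the way the state is stored is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    NOT_DELETED = "not deleted"  # not yet overwritten in the master device
    DELETED = "deleted"          # overwritten by a newer version


@dataclass
class VersionRecord:
    name: str
    version: str
    state: State = State.NOT_DELETED


def overwrite(old: VersionRecord, new: VersionRecord) -> None:
    """Record that `new` has overwritten `old` in the master device."""
    old.state = State.DELETED


def should_send_replication_request(record: VersionRecord) -> bool:
    """Case one (deleted): do not send. Case two (non-deleted): send."""
    return record.state is State.NOT_DELETED
```

Skipping the request for an already overwritten version is what saves messages when versions overwrite one another frequently.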
It should be understood that, in the embodiment of the present application, the number of the slave devices is not limited, and the master device may simultaneously send a duplicate data request message for the data to the multiple slave devices.
In the following, taking an example that the master device requests the slave device to copy the second data when the second data is in an uncovered state, how the master device requests the slave device to copy the second data in the embodiment of the present application is described in detail.
Specifically, fig. 4 includes S420, the master device transmitting a first request message to the slave device.
When the master device sends the first request message to instruct the slave device to acquire the second data and overwrite the first data with the second data, the following cases arise:
Case one: the second data is the first data of this name that the master device requests to be copied, and no overwriting between different versions of the same-name data has occurred in the master device.
That is, the second data is the first version of the same-name data, and no other version in the master device has been overwritten by the second data. The first request message therefore does not need to instruct the slave device to overwrite any data with the second data.
After receiving the first request message, the slave device copies the second data directly according to the first request message. Because the second data is the first data requested to be copied and does not overwrite any other version, the slave device holds no other version of the same-name data to which the second data belongs, so no conflict arises between different versions of the same-name data being written to disk in the slave device.
For example, the second data is the first version of same-name data A uploaded to the master device by a user, and its version number is V2. The master device sends a first request message to the slave device. The slave device receives the first request message, stores V2 in the metadata corresponding to same-name data A, and obtains the second data.
Specifically, when the first request message carries the second data, one possibility is that the slave device obtains the second data by receiving it directly and storing it; or,
when the first request message does not carry the second data but carries a first identifier of the second data, one possibility is that the slave device obtains the second data from the master device according to the first identifier and copies it.
It should be understood that in case one, because no overwriting occurs between different versions, the versions of the same-name data stored in the slave device and the master device are consistent. The present application is mainly concerned with how to ensure that the latest versions of same-name data stored in the master device and the slave device remain consistent when different versions of the same-name data overwrite one another multiple times and are asynchronously replicated between the master device and the slave device, so case one is not discussed further.
Case two: the master device sends replication data request messages multiple times for different versions of the same-name data, so overwriting occurs between the different versions. That is, the first request message sent for the second data is not the first replication request message sent from the master device to the slave device.
In this case, the order in which the slave device receives the replication data request messages for the different versions of the same-name data is consistent with the order in which those versions were written to disk in the master device.
Because another version of the same-name data already exists in the slave device, a conflict may occur when the slave device stores metadata according to the replication data request message.
The slave device then re-reads the latest version of the same-name data on its disk. If the version number of that latest version matches the version number of the data that the received replication data request message indicates is to be overwritten, the slave device performs the overwrite and copies the data carried in the replication data request message.
For example, at a first time, the master device writes first data to disk, the first data being one version of the same-name data; at a second time, the master device writes second data to disk, the second data being another version of the same-name data. The first time is earlier than the second time, so the second data is the latest version of the same-name data and overwrites the first data in the master device.
The slave device receives a second request message at a third time, requesting it to obtain the first data, and receives the first request message at a fourth time, requesting it to obtain the second data. The third time is earlier than the fourth time.
When the slave device receives the first request message at the fourth time and writes the metadata of the second data to disk, a conflict may occur because the slave device already stores the first data. The slave device reads the latest version of the same-name data on its disk and determines that it is the first data, and the first request message indicates that the second data overwrites the first data already stored in the slave device. The slave device therefore determines that the second data is the latest version of the same-name data and performs the overwrite.
It should be understood that case two is the common case for the method for copying data in the present application. That is, when the first request message instructs the slave device to acquire the second data and overwrite the first data with the second data, the latest version of the same-name data stored in the slave device is the first data, and the slave device can acquire the second data and overwrite the first data according to the first request message.
Case three: the master device sends replication data request messages multiple times for different versions of the same-name data, so overwriting occurs between the different versions. That is, the first request message sent for the second data is not the first replication request message sent from the master device to the slave device.
In this case, the order in which the slave device receives the replication data request messages for the different versions of the same-name data is not consistent with the order in which those versions were written to disk in the master device.
When the slave device is writing metadata to disk, the data that the received replication data request message indicates is to be overwritten does not exist in the slave device. The slave device then cannot determine the overwrite relationship between the data to be copied, whose metadata is carried in the received replication data request message, and the data on its disk.
For example, at a first time, the master device writes first data to disk, the first data being one version of the same-name data; at a second time, the master device writes second data to disk, the second data being another version of the same-name data. The first time is earlier than the second time, so the second data is the latest version of the same-name data and overwrites the first data in the master device.
The slave device receives the second request message at a fourth time, instructing it to acquire the first data, and receives the first request message at a third time, instructing it to acquire the second data. The third time is earlier than the fourth time.
The slave device receives the first request message at the third time. The first request message indicates that the data overwritten by the second data is the first data. The slave device reads the latest version of the same-name data on its disk and does not find the first data, because the third time is earlier than the fourth time, that is, at the third time the first data has not yet arrived at the slave device. The slave device therefore cannot overwrite the first data with the second data. In this case, assume that the version of the same-name data stored on the slave device's disk is third data, which is a version of the same-name data different from both the second data and the first data.
In case three, the slave device cannot acquire the second data and overwrite the first data according to the first request message, and it cannot copy the second data because it cannot determine the overwrite relationship between the third data and the second data. In this case, S430 is performed: the slave device sends a second request message to the master device.
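The three cases handled by the slave device on receiving a replication data request can be sketched as follows. This is a minimal illustration under stated assumptions: `handle_replication_request` is a hypothetical name, a dictionary from data name to version number stands in for the metadata on the slave device's disk, and the request is assumed to carry the version to be overwritten (`None` when there is none).

```python
def handle_replication_request(disk, name, new_version, overwritten_version):
    """Apply a replication request on the slave, or report that the master must be asked.

    disk maps a data name to the version number currently stored on the slave.
    """
    latest = disk.get(name)  # None when no version of this name is stored yet
    if latest == overwritten_version:
        # Case one (nothing stored, nothing to overwrite) and case two
        # (the stored version is exactly the one the request overwrites).
        disk[name] = new_version
        return "copied"
    # Case three: the stored version is not the one the request overwrites,
    # so the overwrite relationship must be queried from the master (S430).
    return "query_master"
```

Only the stored version number and the version named in the request are compared, so no per-version history has to be carried in the metadata for cases one and two.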
Specifically, when the slave device receives the first request message, it cannot determine the overwrite relationship between the third data and the second data. The slave device therefore sends a second request message to the master device to query the overwrite relationship between the second data and the third data in the master device.
Further, S440 is performed: the master device sends second indication information to the slave device.
Specifically, the second indication information indicates the overwrite relationship between the second data and the third data in the master device.
Optionally, in some embodiments, the second indication information includes:
information about the states of the second data and the third data. That is, the master device sends the slave device the states of the third data and the second data in the master device.
For example, the second data is overwritten by the third data in the master device. The second indication information then indicates that the second data is in the deleted state and the third data is in the non-deleted state.
Optionally, in some embodiments, the second indication information includes:
information about the times at which the second data and the third data were respectively written to disk in the master device. That is, the master device sends the slave device the times at which the third data and the second data were written to disk in the master device.
For example, the second data is written to disk in the master device at a first time, and the third data is written to disk in the master device at a second time, the first time being earlier than the second time. From this time information, it can be determined that the third data overwrites the second data in the master device. The slave device then determines the overwrite relationship between the third data and the second data according to the second indication information.
Specifically, after the slave device receives the second indication information sent by the master device, the coverage relationship between the third data and the second data can be determined according to the second indication information.
For example, the second indication information indicates the states of the third data and the second data in the master device: the second data is in a deleted state, and the third data is in a non-deleted state. The slave device then updates the metadata on its disk according to the second indication information, treats the third data as the latest version of the same-name data, and does not copy the second data;
or, the second indication information indicates the states of the third data and the second data in the master device: the third data is in a deleted state, and the second data is in a non-deleted state. The slave device then updates the metadata on its disk according to the second indication information, treats the second data as the latest version of the same-name data, and copies the second data.
As another example, the second indication information indicates the respective download times of the third data and the second data in the master device:
the second data is downloaded at a first time, the third data is downloaded at a second time, and the first time is earlier than the second time. The slave device then determines, according to the second indication information, that the third data is the latest version of the same-name data, and does not copy the second data; or,
the third data is downloaded at a first time, the second data is downloaded at a second time, and the first time is earlier than the second time. The slave device then updates the metadata on its disk according to the second indication information, treats the second data as the latest version of the same-name data, and copies the second data.
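The two forms of second indication information described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the class `Indication`, its fields `deleted` and `download_time`, and the function `should_copy_second` are hypothetical names, and versions are represented by plain string labels.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Indication:
    """Hypothetical shape of the second indication information.

    One of `deleted` (state form) or `download_time` (time form) is
    expected to be filled in, keyed by a version label.
    """
    deleted: Optional[Dict[str, bool]] = None
    download_time: Optional[Dict[str, float]] = None


def should_copy_second(ind: Indication, second: str, third: str) -> bool:
    """Return True if the slave should copy `second` over `third`."""
    if ind.deleted is not None:
        # State form: the version that is not deleted in the master is the latest.
        return ind.deleted[third] and not ind.deleted[second]
    if ind.download_time is not None:
        # Time form: the version downloaded later covers the earlier one.
        return ind.download_time[second] > ind.download_time[third]
    raise ValueError("indication carries neither state nor time information")
```

With the state form, an indication that marks the second data as deleted and the third data as not deleted yields `False`, so the slave keeps the third data and does not copy the second data.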
The first request message is used to instruct the slave device to obtain second data and to overwrite the first data with the second data, wherein the master device and the slave device are located in different data centers, the first data and the second data are two different versions of the same-name data, and the version of the second data is later than that of the first data.
Specifically, the information included in the first request message may be the following cases:
in the first case: the first request message includes first indication information indicating that the version of the second data is later than the version of the first data. In this case, the slave device determines the overlay relationship of the first data and the second data according to the first indication information.
Specifically, the first request message may further include the second data:
when the first request message includes the second data, the slave device receives the first request message and first determines, according to the first indication information, whether the first data covered by the second data is stored on its local disk.
Optionally, when the latest version of the same-name data saved on the disk of the slave device is the first data, the slave device directly receives and saves the second data, and then overwrites the first data on its disk according to the first indication information, completing the acquisition of the second data. The latest version of the same-name data stored in the master device and the latest version stored in the slave device are then both the second data, so eventual consistency is achieved.
Optionally, when the latest version of the same-name data saved on the disk of the slave device is not the first data but the third data, refer to the third case in S420; details are not repeated here.
Alternatively, the first request message includes a first identifier of the second data and does not include the second data itself:
when the first request message includes the first identifier of the second data, the slave device receives the first request message and first determines, according to the first indication information, whether the first data covered by the second data is stored on its local disk.
Optionally, when the latest version of the same-name data saved on the disk of the slave device is the first data, the slave device first acquires the second data from the master device according to the first identifier of the second data in the first request message and copies it, and then overwrites the first data on its disk according to the first indication information, completing the acquisition of the second data. The latest version of the same-name data stored in the master device and the latest version stored in the slave device are then both the second data, so eventual consistency is achieved.
Optionally, when the latest version of the same-name data saved on the disk of the slave device is not the first data but the third data, refer to the third case in S420; details are not repeated here.
Specifically, the first identifier of the second data may be a version number of the second data, or other identification information capable of identifying the second data.
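The handling of the first case on the slave side might look like the sketch below. It assumes a simple in-memory `store` in place of the slave disk and a hypothetical `fetch` callback that pulls the second data from the master by its first identifier; `handle_first_request` and the returned strings are illustrative names only. Storing the data when the slave holds no version at all is also an assumption of this sketch.

```python
from typing import Callable, Dict, Optional, Tuple


def handle_first_request(
    store: Dict[str, Tuple[str, bytes]],   # name -> (latest version, payload) on the slave
    name: str,
    new_version: str,                      # first identifier of the second data
    covered_version: str,                  # first indication information
    payload: Optional[bytes],              # second data, if carried in the message
    fetch: Callable[[str, str], bytes],    # hypothetical: fetch payload from the master
) -> str:
    current = store.get(name)
    if current is not None and current[0] != covered_version:
        # The latest local version is some third data, so the coverage
        # relationship is unknown and the master has to be queried.
        return "query_master"
    if payload is None:
        # The message carried only the identifier: fetch the second data.
        payload = fetch(name, new_version)
    # Overwrite the first data (or store the first copy) with the second data.
    store[name] = (new_version, payload)
    return "copied"
```

A return value of `"query_master"` corresponds to the situation in which the slave sends the second request message.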
In the second case: the first request message does not include the first indication information. Only the second identification of the second data is included in the first request message. In this case, the slave device can determine the overlay relationship of the first data and the second data according to the second identifier of the second data.
Specifically, the slave device determines the coverage relationship between the first data and the second data according to the second identifier of the second data as follows:
the set of data centers is regarded as a system in which a global scheduler (for example, a scheduling server) exists. The global scheduler assigns identifiers to the different versions of the same-name data, and these identifiers may form a globally unique, monotonically increasing sequence.
When the slave device acquires the second identifier of the second data, it compares the identifier of the first data with the second identifier of the second data. When the second identifier of the second data is larger than the identifier of the first data, the second data is newer than the first data, and the slave device determines that the second data covers the first data.
It should be understood that the second identifier of the second data in the second case differs from the first identifier of the second data in the first case: the second identifier is one value in the globally increasing sequence and cannot be an identifier arbitrarily configured for the second data, whereas the first identifier in the first case only needs to uniquely identify the data.
Specifically, the first request message may further include the second data. When the latest version of the same-name data stored on the disk of the slave device is the first data, the slave device directly receives and stores the second data, determines that the second data is newer than the first data because the second identifier of the second data is larger than the identifier of the first data, and then overwrites the first data on the disk with the second data. The latest version of the same-name data stored in the master device and the latest version stored in the slave device are then both the second data, so eventual consistency is achieved.
Optionally, when the latest version of the same-name data stored on the disk of the slave device is not the first data but the third data, the slave device compares the second identifier of the second data with the identifier of the third data, and if the second identifier of the second data is larger, the slave device directly receives and stores the second data.
Specifically, the first request message may not include the second data: optionally, when the latest version of the same-name data stored on the disk of the slave device is the first data, the slave device first obtains the second data from the master device according to the second identifier of the second data in the first request message and copies it. The slave device then determines that the second data is newer than the first data because the second identifier of the second data is larger than the identifier of the first data, and overwrites the first data on the disk with the second data. The latest version of the same-name data stored in the master device and the latest version stored in the slave device are then both the second data, so eventual consistency is achieved.
Optionally, when the latest version of the same-name data stored on the disk of the slave device is not the first data but the third data, the slave device compares the second identifier of the second data with the identifier of the third data. If the second identifier of the second data is larger, the slave device first acquires the second data from the master device according to the second identifier of the second data in the first request message and copies it. The slave device then determines that the second data is newer than the third data because the second identifier of the second data is larger than the identifier of the third data, and overwrites the third data on the disk with the second data.
Specifically, the second identifier of the second data may be a version number of the second data, or other identification information capable of identifying the second data.
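The second case relies on identifiers that are globally unique and monotonically increasing. A minimal sketch of that idea follows, assuming a single in-process counter stands in for the global scheduler; `GlobalScheduler`, `next_id`, and `covers` are hypothetical names, and a real scheduling server would need its own persistence and fault tolerance.

```python
import itertools
import threading


class GlobalScheduler:
    """Hypothetical stand-in for the global scheduler: issues unique, increasing ids."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # The lock keeps ids unique when several uploads are handled concurrently.
        with self._lock:
            return next(self._counter)


def covers(incoming_id: int, stored_id: int) -> bool:
    """The incoming version covers the stored one only if its identifier is larger."""
    return incoming_id > stored_id
```

Because the comparison needs no information beyond the two identifiers, the slave can decide locally whether to overwrite, even when the stored version is the third data rather than the first data.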
In the third case: the first request message does not include the first indication information. Only clock information of the second data is included in the first request message. In this case, the slave device can determine the overlay relationship of the first data and the second data according to the clock information of the second data.
Specifically, the slave device determines the coverage relationship between the first data and the second data according to the clock information of the second data as follows:
the set of data centers is regarded as a system in which a global scheduler (such as a scheduling server) exists. The global scheduler assigns clock information to the different versions of the same-name data according to their download times; the clock information of each version is globally unique and corresponds one-to-one, in order, to the download times. That is, the clock information of data downloaded earlier indicates an earlier time, and the clock information of data downloaded later indicates a later time. This requires a global clock on which the clock information of both the slave device and the master device is based.
When the slave device acquires the clock information of the second data, it compares the clock information of the first data with the clock information of the second data. If the clock information of the second data is later than that of the first data, the second data is newer than the first data, and the slave device determines that the second data covers the first data.
Specifically, the first request message may further include the second data: when the latest version of the same-name data stored on the disk of the slave device is the first data, the slave device directly receives and stores the second data, determines that the second data is newer than the first data because the clock information of the second data is later than that of the first data, and then overwrites the first data on the disk with the second data. The latest version of the same-name data stored in the master device and the latest version stored in the slave device are then both the second data, so eventual consistency is achieved.
Alternatively, when the latest version of the same-name data stored on the disk of the slave device is not the first data but the third data, the slave device compares the clock information of the second data with that of the third data to determine which is later, and if the clock information of the second data is later, the slave device directly receives and stores the second data.
Specifically, the first request message may not include the second data: when the latest version of the same-name data stored on the disk of the slave device is the first data, the slave device first acquires the second data from the master device according to the clock information of the second data in the first request message and copies it. The slave device then determines that the second data is newer than the first data because the clock information of the second data is later than that of the first data, and overwrites the first data on the disk with the second data. The latest version of the same-name data stored in the master device and the latest version stored in the slave device are then both the second data, so eventual consistency is achieved.
Alternatively, when the latest version of the same-name data stored on the disk of the slave device is not the first data but the third data, the slave device compares the clock information of the second data with that of the third data. If the clock information of the second data is later, the slave device first acquires the second data from the master device according to the clock information of the second data in the first request message and copies it. The slave device then determines that the second data is newer than the third data because the clock information of the second data is later than that of the third data, and overwrites the third data on the disk with the second data.
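When the first request message carries only clock information, the slave can compare clocks first and fetch the second data only if it turns out to be later. The sketch below illustrates this ordering of steps; `apply_by_clock` and the `fetch` callback are hypothetical, the clock information is modeled as a `datetime` taken from an assumed global clock, and accepting the data when nothing is stored locally is an assumption of the sketch.

```python
from datetime import datetime
from typing import Callable, Optional, Tuple

Stored = Tuple[datetime, bytes]  # (clock information, payload) of the latest local version


def apply_by_clock(
    stored: Optional[Stored],
    incoming_clock: datetime,
    payload: Optional[bytes],
    fetch: Callable[[], bytes],   # hypothetical: pulls the payload from the master
) -> Stored:
    """Return the version the slave keeps after seeing one first request message."""
    if stored is not None and incoming_clock <= stored[0]:
        # Incoming data is not later than what is on disk: keep the local version.
        return stored
    if payload is None:
        # The message carried only clock information, so fetch the data now.
        payload = fetch()
    return (incoming_clock, payload)
```

Comparing before fetching avoids transferring a version across data centers that would be discarded anyway.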
The following mainly describes the signaling interaction between the master device and the slave device when the first request message takes the form described in the first case.
It should be understood that the same-name data may also include other versions, and that the first data and the second data are merely any two of the multiple versions of the same-name data that have a direct coverage relationship.
Specifically, the version number of the first data is a first version number, and the version number of the second data is a second version number.
Optionally, in some embodiments, the first indication information is carried in a header field of the first request message.
Optionally, in other embodiments, the first indication information is a version number of the first data.
For example, the first indication information is the version number of the first data (the first version number) carried in a header field of the first request message. It indicates to the slave device that the data covered by the second data in the master device is the first data having the first version number.
Specifically, the version number of the first data is V1, and the version number of the second data is V2. Then V1 is carried in the header field of the first request message.
Optionally, in some embodiments, the second data does not overwrite any data in the master device. Then, the header field of the first request message is empty.
Specifically, the header field of the first request message is declared as an override version number (covered _ date).
It should be understood that each version of data of the same name is identified using a unique version number.
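On the master side, carrying the covered version number in a header field could be sketched as below. The dictionary layout, the function `build_first_request`, and the header field name `covered-version` are all hypothetical placeholders chosen for illustration; the embodiment does not prescribe a wire format.

```python
from typing import Dict, Optional

COVERED_VERSION_FIELD = "covered-version"  # hypothetical header field name


def build_first_request(
    name: str,
    version: str,
    covered_version: Optional[str],
    payload: Optional[bytes] = None,
) -> Dict[str, object]:
    """Build a first request message for one version of the same-name data."""
    return {
        # An empty header field means this version covers no data in the master.
        "headers": {COVERED_VERSION_FIELD: covered_version or ""},
        "name": name,
        "version": version,
        # The payload is optional: the slave may fetch it later by identifier.
        "body": payload,
    }
```

For example, `build_first_request("obj", "V2", "V1", b"data")` produces a message whose header field carries `V1`, while a first upload that covers nothing produces an empty header field.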
S450, the slave device sends a first response message to the master device.
Specifically, the first reply message is used to indicate that the slave device successfully copies the second data.
The above-described method for copying data is described below with reference to a specific embodiment.
Specifically, in this embodiment, the coverage relationship between the versions of the same-name data is determined by the order of downloading the versions of the data, where downloading the metadata is controlled by modifying CV in the metadata in an exclusive manner.
FIG. 5 is a schematic diagram of an embodiment of the method for replicating data. The method comprises twelve steps, S510 to S590, which involve a user, a master device, and a slave device.
S510, a user uploads the first version of the same-name data (referred to as the first data, with version number A).
S511, the user uploads the second version of the same-name data (referred to as the second data, with version number B).
S512, the user uploads the third version of the same-name data (referred to as the third data, with version number C).
It should be understood that the users performing the above S510, S511 and S512 may be the same user or different users.
When the user is the same, the user can be understood as continuously uploading data of three versions of the same-name data;
when the users are different, it can be understood that they each upload a version of the same-name data, possibly at the same time, and the master device determines the order of the versions uploaded by the different users according to the times at which it receives the three versions.
S520, the master device sends a third replication data request message to the slave device.
Specifically, in this embodiment, it is assumed that the master device sends a replication data request message to the slave device for each of the three versions of the same-name data.
Due to uncontrollable factors such as thread scheduling or network conditions, the third replication data request message, which the master device sends for the third version of the same-name data (the third data), reaches the slave device first.
Optionally, the third replication data request message carries the third data and indication information indicating that the third data covers the second data in the master device.
Optionally, the third replication data request message does not carry the third data but carries an identifier of the third data, and the slave device can obtain the third data from the master device according to that identifier.
Because the third replication data request message reaches the slave device first, the second data does not exist in the slave device, and the slave device holds no version of the same-name data. The slave device therefore successfully downloads the third data as the latest version of the same-name data.
Further, S521 is executed, in which the slave device sends a second response message to the master device, indicating that the copying of the third data is successful.
S530, the master device sends a second replication data request message to the slave device.
Due to uncontrollable factors such as thread scheduling or network conditions, the second replication data request message, which the master device sends for the second version of the same-name data (the second data), reaches the slave device after the third replication data request message.
Optionally, the second replication data request message carries the second data and indication information indicating that the second data covers the first data in the master device.
Optionally, the second replication data request message does not carry the second data but carries an identifier of the second data, and the slave device can obtain the second data from the master device according to that identifier.
Because the first data does not exist in the slave device and the latest version of the same-name data in the slave device is the third data, the slave device cannot determine which of the second data and the third data is the latest version of the same-name data. S540 is therefore executed, in which the slave device sends a query message to the master device.
The query message is used for querying the coverage relation of the second data and the third data in the main device.
S550, the master device sends third indication information to the slave device.
The third indication information indicates that the second data has been deleted in the master device, and the third data is the latest version of the same-name data.
S560, the slave device determines not to save the second data according to the third indication information.
Further, S561 is executed, in which the slave device sends a third response message to the master device, indicating that the copying of the second data is successful.
S570, the master device sends a first replication data request message to the slave device.
Due to uncontrollable factors such as thread scheduling or network conditions, the first replication data request message, which the master device sends for the first version of the same-name data (the first data), reaches the slave device after the second replication data request message.
It should be understood that the first replication data request message carries the first data and that its header field is empty; that is, the first replication data request message carries no coverage information. When the slave device receives the first replication data request message, it determines from the message that the first data is the oldest version of the same-name data, and the third data is already stored in the slave device. The slave device therefore deletes the first data.
Further, S571 is executed, in which the slave device sends a fourth response message to the master device, indicating that the copying of the first data is successful.
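The out-of-order scenario of S510 to S571 can be replayed with a small simulation. This is a sketch under simplifying assumptions: versions are string labels, the query and third indication information (S540, S550) are reduced to a direct method call, and the classes `Master` and `Slave` with their methods are hypothetical names rather than part of the embodiment.

```python
from typing import List, Optional


class Master:
    def __init__(self) -> None:
        self.versions: List[str] = []   # upload order, oldest first

    def upload(self, version: str) -> Optional[str]:
        """Store a new version; return the version it covers (None if it covers nothing)."""
        covered = self.versions[-1] if self.versions else None
        self.versions.append(version)
        return covered

    def is_latest(self, version: str) -> bool:
        # Answers the slave's query: only the last upload is in a non-deleted state.
        return self.versions[-1] == version


class Slave:
    def __init__(self, master: Master) -> None:
        self.master = master
        self.latest: Optional[str] = None
        self.log: List[str] = []

    def on_request(self, version: str, covered: Optional[str]) -> None:
        if self.latest is None or self.latest == covered:
            # Nothing stored yet, or the stored version is the one being covered.
            self.latest = version
            self.log.append(f"stored {version}")
        elif covered is None:
            # Empty header: this is the oldest version and newer data is already stored.
            self.log.append(f"discarded {version}")
        elif self.master.is_latest(version):
            self.latest = version
            self.log.append(f"stored {version} after query")
        else:
            self.log.append(f"skipped {version} after query")


master = Master()
requests = [(v, master.upload(v)) for v in ("A", "B", "C")]
slave = Slave(master)
for version, covered in reversed(requests):   # arrival order C, B, A
    slave.on_request(version, covered)
```

After the three requests arrive in the order C, B, A, `slave.latest` is `"C"`, matching the latest version in the master, and only the request for B required a query to the master.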
In particular, the master and slave devices in fig. 4 and 5 may be buckets in different data centers. The master device is a source bucket, and the slave device is a target bucket.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The master device and the slave device provided by the embodiment of the present application are described below with reference to fig. 6 to 9.
Fig. 6 is a schematic diagram of a master device according to an embodiment of the present application. The apparatus 600 shown in fig. 6 includes a receiving unit 610, a processing unit 620, and a transmitting unit 630. The apparatus 600 may be configured to perform the steps performed by the master device in the methods shown in fig. 4 and fig. 5.
A sending unit 630, configured to send a first request message to a slave device, where the first request message is used to instruct the slave device to obtain second data and to overwrite, with the second data, first data stored in the slave device, and the master device and the slave device are located in different data centers,
the first data and the second data are two different versions of the same-name data, and the version of the second data is later than that of the first data.
A receiving unit 610, configured to receive a first reply message sent by the slave device, where the first reply message is used to indicate that the slave device successfully copies the second data.
Specifically, the first request message includes first indication information, where the first indication information is used to indicate that the version of the second data is later than the version of the first data.
Optionally, the first indication information is carried in a header field of the first request message.
Optionally, the first indication information is a version number of the first data.
Specifically, when the latest version of the data of the same-name data stored in the slave device is not the first data but the third data, the receiving unit 610 is further configured to receive a second request message sent by the slave device, where the second request message is used to query the coverage relationship between the second data and the third data in the master device, and the third data is a version of the data of the same-name data;
the sending unit 630 is further configured to send second indication information to the slave device, where the second indication information is used to indicate a coverage relationship between the second data and the third data in the master device.
Specifically, the second indication information includes:
information on the states of the second data and the third data; or,
information on the times at which the second data and the third data were respectively downloaded in the master device.
Specifically, before the sending unit 630 sends the first request message to the slave device, the apparatus further includes: a processing unit 620, configured to determine a state of the second data, where the state of the second data is an uncovered state.
In an alternative embodiment, the apparatus 600 may also be the master device 700. Specifically, the processing unit 620 may be the processor 720, and the receiving unit 610 and the sending unit 630 may be the input/output interface 730. The master device 700 may further include a memory 710 and a hard disk 740, as shown in fig. 7.
Fig. 7 is a schematic block diagram of a master device of another embodiment of the present application. The master device 700 shown in fig. 7 may include: memory 710, processor 720, input/output interface 730, and hard disk 740. The memory 710, the processor 720, the input/output interface 730 and the hard disk 740 are connected through a communication connection. The memory 710 is used for storing program instructions, and the processor 720 is used for executing the program instructions stored in the memory 710 to control the input/output interface 730 to receive input data and information and to output data such as operation results. The data and information received by the input/output interface 730 can be stored in the hard disk 740; for example, the hard disk 740 is used for storing the same-name data.
It should be understood that, in the embodiment of the present application, the processor 720 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present application.
The memory 710 may include both read-only memory and random-access memory, and provides instructions and data to the processor 720. A portion of the memory 710 may also include non-volatile random access memory. For example, the memory 710 may also store information on the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 720 or by instructions in the form of software. The method disclosed in the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or a register. The storage medium is located in the memory 710; the processor 720 reads the information in the memory 710 and performs the steps of the above method in combination with its hardware. To avoid repetition, details are not described here.
Fig. 8 is a schematic diagram of a slave device according to an embodiment of the present application. The apparatus 800 shown in fig. 8 comprises a receiving unit 810, a processing unit 820 and a transmitting unit 830. The apparatus 800 may be configured to perform the steps performed by the slave device in the methods shown in fig. 4 and fig. 5.
A receiving unit 810, configured to receive a first request message sent by a master device, where the first request message is used to instruct the slave device to obtain second data and to overwrite, with the second data, first data stored in the slave device, and the master device and the slave device are located in different data centers,
the first data and the second data are two different versions of the same-name data, and the version of the second data is later than that of the first data;
a processing unit 820, configured to obtain the second data according to the first request message, and overwrite the first data that is locally stored with the second data;
a sending unit 830, configured to send a first reply message to the master device, where the first reply message is used to indicate that the slave device successfully copies the second data.
Specifically, the first request message includes first indication information indicating that the version of the second data is later than the version of the first data.
Optionally, the first indication information is carried in a header field of the first request message.
Optionally, the first indication information is a version number of the first data.
Optionally, when the latest version of the data of the same-name data stored in the slave device is not the first data but a third data, the sending unit 830 is further configured to send a second request message to the master device, where the second request message is used to query an overlay relationship between the second data and the third data, and the third data is a version of the data of the same-name data;
the receiving unit 810 is further configured to receive second indication information sent by the master device, where the second indication information is used to indicate a coverage relationship between the second data and the third data.
Specifically, the second indication information includes:
information on the states of the second data and the third data; or,
information on the times at which the second data and the third data were respectively downloaded in the master device.
In an alternative embodiment, the apparatus 800 may also be a slave device 900. Specifically, the processing unit 820 may be a processor 920, and the receiving unit 810 and the sending unit 830 may be an input/output interface 930. The slave device 900 may further include a memory 910 and a hard disk 940, as shown in fig. 9.
Fig. 9 is a schematic block diagram of a slave device of another embodiment of the present application. The slave device 900 shown in fig. 9 may include: memory 910, processor 920, input/output interface 930, and hard disk 940. The memory 910, the processor 920, the input/output interface 930 and the hard disk 940 are connected through a communication connection. The memory 910 is configured to store program instructions, and the processor 920 is configured to execute the program instructions stored in the memory 910 to control the input/output interface 930 to receive input data and information and to output data such as operation results. The data and information received by the input/output interface 930 may be stored in the hard disk 940; for example, the hard disk 940 is configured to store the copied same-name data.
It should be understood that, in the embodiment of the present application, the processor 920 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present application.
The memory 910 may include both read-only memory and random-access memory, and provides instructions and data to the processor 920. A portion of the memory 910 may also include non-volatile random access memory. For example, the memory 910 may also store information about device types.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 920 or by instructions in the form of software. The method disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 910, and the processor 920 reads the information in the memory 910 and performs the steps of the above method in combination with its hardware. To avoid repetition, details are not described here again.
It should be understood that in the embodiments of the present application, the processor may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should be further understood that, in the embodiments of the present application, the hard disk serving as one of the storage media of the master device and the slave device may be a solid state disk (SSD), a mechanical hard disk (HDD), a hybrid hard disk (SSHD), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division into units is only one logical division, and other divisions are possible in practice. For example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A method for replicating data, comprising:
a master device sends a first request message to a slave device, wherein the first request message is used for instructing the slave device to obtain second data and to overwrite first data stored in the slave device with the second data, the master device and the slave device are respectively located in different data centers, the first data and the second data are two different versions of same-name data, and the version of the second data is later than that of the first data;
the master device receives a first reply message sent by the slave device, wherein the first reply message is used for indicating that the slave device successfully copies the second data.
2. The method according to claim 1, wherein the first request message includes first indication information indicating that the version of the second data is later than the version of the first data.
3. The method of claim 2, wherein the first indication information is a version number of the first data.
4. The method according to any of claims 1-3, wherein prior to the master device sending a first request message to a slave device, the method further comprises:
the master device determines a state of the second data, the state of the second data being an uncovered state.
5. A method for replicating data, comprising:
a slave device receives a first request message sent by a master device, wherein the first request message is used for instructing the slave device to obtain second data and to overwrite first data stored in the slave device with the second data, the master device and the slave device are respectively located in different data centers,
the first data and the second data are two different versions of same-name data, and the version of the second data is later than that of the first data;
the slave device obtains the second data according to the first request message, and overwrites the locally stored first data with the second data;
the slave device sends a first reply message to the master device, wherein the first reply message is used for indicating that the slave device successfully copies the second data.
6. The method according to claim 5, wherein the first request message includes first indication information indicating that the version of the second data is later than the version of the first data.
7. The method of claim 6, wherein the first indication information is a version number of the first data.
8. A master device, comprising:
a sending unit, configured to send a first request message to a slave device, where the first request message is used to instruct the slave device to obtain second data and to overwrite first data stored in the slave device with the second data, and the master device and the slave device are located in different data centers respectively,
the first data and the second data are two different versions of same-name data, and the version of the second data is later than that of the first data;
a receiving unit, configured to receive a first reply message sent by the slave device, where the first reply message is used to indicate that the slave device successfully copies the second data.
9. The master device according to claim 8, wherein the first request message includes first indication information indicating that the version of the second data is later than the version of the first data.
10. The master device of claim 9, wherein the first indication information is a version number of the first data.
11. The master device according to any one of claims 8 to 10, wherein before the sending unit sends the first request message to the slave device, the master device further comprises:
a processing unit, configured to determine a state of the second data, where the state of the second data is an uncovered state.
12. A slave device, comprising:
a receiving unit, configured to receive a first request message sent by a master device, where the first request message is used to instruct the slave device to obtain second data and to overwrite first data stored in the slave device with the second data, and the master device and the slave device are located in different data centers respectively,
the first data and the second data are two different versions of same-name data, and the version of the second data is later than that of the first data;
a processing unit, configured to obtain the second data according to the first request message and to overwrite the locally stored first data with the second data;
a sending unit, configured to send a first reply message to the master device, where the first reply message is used to indicate that the slave device successfully copies the second data.
13. The slave device of claim 12, wherein the first request message includes first indication information indicating that the version of the second data is later than the version of the first data.
14. The slave device of claim 13, wherein the first indication information is a version number of the first data.
15. A master device, comprising at least one processor and at least one memory, the at least one memory being configured to store a computer program, the at least one processor being configured to invoke and execute the computer program from the at least one memory, such that the master device performs the method of any one of claims 1-4;
the master device further comprises a hard disk, and the hard disk is used for storing the same-name data.
16. A slave device, comprising at least one processor and at least one memory, the at least one memory being configured to store a computer program, the at least one processor being configured to invoke and execute the computer program from the at least one memory, such that the slave device performs the method of any one of claims 5-7;
the slave device further comprises a hard disk, and the hard disk is used for storing the data with the same name.
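For orientation only, the message exchange recited in claims 1 to 7 can be sketched as a small in-memory simulation. This is a sketch under stated assumptions and not the claimed implementation: the class names, the fields of `FirstRequest`, the use of integer version numbers, and the choice to have the slave pull the second data directly from the master object are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Store = Dict[str, Tuple[int, bytes]]  # name -> (version number, payload)


@dataclass
class FirstRequest:
    name: str          # name shared by all versions of the same-name data
    new_version: int   # version of the second data (assumed field)
    base_version: int  # first indication information: version number of the first data


@dataclass
class Master:
    store: Store

    def build_request(self, name: str, base_version: int) -> FirstRequest:
        version, _ = self.store[name]
        if version <= base_version:
            raise ValueError("second data must be later than first data")
        return FirstRequest(name, version, base_version)

    def read(self, name: str, version: int) -> bytes:
        stored_version, payload = self.store[name]
        if stored_version != version:
            raise KeyError("requested version is not available")
        return payload


@dataclass
class Slave:
    store: Store

    def handle(self, req: FirstRequest, master: Master) -> bool:
        """Return True as the first reply message when the copy succeeds."""
        local_version, _ = self.store[req.name]
        if local_version != req.base_version:
            # The local latest version is some third data; a real slave
            # would query the master before deciding whether to overwrite.
            return False
        payload = master.read(req.name, req.new_version)  # obtain second data
        self.store[req.name] = (req.new_version, payload)  # overwrite first data
        return True


master = Master({"obj": (2, b"v2")})
slave = Slave({"obj": (1, b"v1")})
ok = slave.handle(master.build_request("obj", base_version=1), master)
```

In this sketch the slave refuses to overwrite when its local version differs from the version named in the request, which corresponds to the case in which the slave holds third data rather than the first data.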
CN201810982126.1A 2018-08-27 2018-08-27 Method for copying data, master device and slave device Active CN109164985B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810982126.1A CN109164985B (en) 2018-08-27 2018-08-27 Method for copying data, master device and slave device
PCT/CN2019/098307 WO2020042852A1 (en) 2018-08-27 2019-07-30 Method for copying data, and master device, and slave device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810982126.1A CN109164985B (en) 2018-08-27 2018-08-27 Method for copying data, master device and slave device

Publications (2)

Publication Number Publication Date
CN109164985A CN109164985A (en) 2019-01-08
CN109164985B true CN109164985B (en) 2020-07-07

Family

ID=64896793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810982126.1A Active CN109164985B (en) 2018-08-27 2018-08-27 Method for copying data, master device and slave device

Country Status (2)

Country Link
CN (1) CN109164985B (en)
WO (1) WO2020042852A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109164985B (en) * 2018-08-27 2020-07-07 华为技术有限公司 Method for copying data, master device and slave device
CN110493338B (en) * 2019-08-20 2022-09-13 深圳柚石物联技术有限公司 Equipment mutual control method, system and computer readable storage medium
CN114077448A (en) * 2020-08-11 2022-02-22 深圳云天励飞技术股份有限公司 Data management method and related equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101154234A (en) * 2006-09-26 2008-04-02 国际商业机器公司 System, method and computer program product for managing data versions
CN104461774A (en) * 2014-11-24 2015-03-25 华为技术有限公司 Asynchronous replication method, device and system
CN106528338A (en) * 2016-10-28 2017-03-22 华为技术有限公司 Remote data replication method, storage equipment and storage system
US9720620B1 (en) * 2014-03-11 2017-08-01 Amazon Technologies, Inc. Efficient data volume replication for block-based storage

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015081473A1 (en) * 2013-12-02 2015-06-11 华为技术有限公司 Asynchronous replication method, apparatus and system
US9830333B1 (en) * 2014-06-27 2017-11-28 Amazon Technologies, Inc. Deterministic data replication with conflict resolution
CN105843702B (en) * 2015-01-14 2019-04-12 阿里巴巴集团控股有限公司 A kind of method and device for data backup
CN109164985B (en) * 2018-08-27 2020-07-07 华为技术有限公司 Method for copying data, master device and slave device

Also Published As

Publication number Publication date
WO2020042852A1 (en) 2020-03-05
CN109164985A (en) 2019-01-08

Similar Documents

Publication Publication Date Title
US20210042266A1 (en) Geographically-distributed file system using coordinated namespace replication over a wide area network
US11016944B2 (en) Transferring objects between different storage devices based on timestamps
CN109783214B (en) Task scheduling control system
CN109164985B (en) Method for copying data, master device and slave device
CN107644030B (en) Distributed database data synchronization method, related device and system
EP1958087B1 (en) Resource freshness and replication
US10417103B2 (en) Fault-tolerant methods, systems and architectures for data storage, retrieval and distribution
US9495381B2 (en) Geographically-distributed file system using coordinated namespace replication over a wide area network
AU2019347897B2 (en) Methods, devices and systems for real-time checking of data consistency in a distributed heterogenous storage system
EP1326184A2 (en) Conflict resolution for collaborative work system
EP4276651A1 (en) Log execution method and apparatus, and computer device and storage medium
CN114968966A (en) Distributed metadata remote asynchronous replication method, device and equipment
Cox et al. File synchronization with vector time pairs
US8705537B1 (en) Eventually-consistent data stream consolidation
CN111104404B (en) Data storage method and device based on distributed objects
CN109376193B (en) Data exchange system based on self-adaptive rule
US20240111747A1 (en) Optimizing the operation of a microservice cluster
CN110851417B (en) Method and device for copying distributed file system files
CN112069067B (en) Data testing method and device based on block chain and computer readable storage medium
KR101929948B1 (en) Method and system for data type based multi-device synchronization
Horttanainen New Production System for Finnish Meteorological Institute
CN118069039A (en) K8s cluster event acquisition method and system
KR20180134814A (en) Method and system for data type based multi-device synchronization
JP2009134510A (en) Computer system, data management method, data management program, and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant