CN113297321A - Data synchronization method and device, electronic equipment and computer readable storage medium

Publication number
CN113297321A
Authority
CN
China
Prior art keywords
data
hash value
synchronized
target
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010729788.5A
Other languages
Chinese (zh)
Other versions
CN113297321B (en
Inventor
林延峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010729788.5A
Publication of CN113297321A
Application granted
Publication of CN113297321B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275: Synchronous replication

Abstract

Embodiments of the present disclosure provide a data synchronization method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring data to be synchronized, wherein the data to be synchronized comprises at least one data group to be synchronized, and each data group to be synchronized comprises sub-data to be synchronized corresponding to different identifications; calculating the hash values of the sub-data to be synchronized in a data group to be synchronized, and performing a write operation on the data group to be synchronized according to a comparison between each sub-data hash value and the hash values in the hash value set of the corresponding identification; and traversing the data groups to be synchronized in the data to be synchronized to complete data synchronization. This technical solution can both preserve the order of interdependent data and allow grouped data to be processed concurrently, thereby meeting users' requirements for the quality and speed of data reading, writing, and synchronization.

Description

Data synchronization method and device, electronic equipment and computer readable storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of data processing, in particular to a data synchronization method, a data synchronization device, electronic equipment and a computer-readable storage medium.
Background
With the development of network technology and data technology, it is often necessary to perform operations such as data reading and writing and incremental data synchronization on a database or another data storage component with a large storage capacity. The incremental data synchronization methods adopted in the prior art either guarantee ordering only for individual data columns, or perform poorly and cannot meet users' requirements for the quality and speed of data reading and writing, or abandon the ordering among columns entirely and are therefore prone to data conflicts and inconsistency. Particularly as data volumes grow, a data synchronization method is needed that preserves the order of interdependent data while providing grouped concurrency.
Disclosure of Invention
The embodiment of the disclosure provides a data synchronization method and device, electronic equipment and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a data synchronization method.
Specifically, the data synchronization method includes:
acquiring data to be synchronized, wherein the data to be synchronized comprises at least one group of data groups to be synchronized, and the data groups to be synchronized comprise sub data to be synchronized corresponding to different identifications;
calculating the hash value of the subdata to be synchronized in the data group to be synchronized, and executing write-in operation on the data group to be synchronized according to the comparison between the hash value of the subdata to be synchronized and the hash value in the corresponding identification hash value set;
and traversing the data group to be synchronized in the data to be synchronized to complete data synchronization.
With reference to the first aspect, in a first implementation manner of the first aspect, the calculating a hash value of sub data to be synchronized in the data group to be synchronized, and performing a write operation on the data group to be synchronized according to a comparison between the hash value of the sub data to be synchronized and a hash value in a corresponding identification hash value set, further includes:
and storing the hash value of the subdata to be synchronized into a corresponding identification hash value set, and enabling the hash value to point to a data space written into the data group to be synchronized.
With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the calculating a hash value of sub data to be synchronized in the data group to be synchronized, and performing a write operation on the data group to be synchronized according to a comparison between the hash value of the sub data to be synchronized and a hash value in a corresponding identification hash value set, includes:
calculating the hash value of the subdata to be synchronized in the data group to be synchronized;
comparing the hash value of the subdata to be synchronized with the hash value in the corresponding identification hash value set;
if a target hash value corresponding to the hash value of the subdata to be synchronized exists in the corresponding identification hash value set, writing the data group to be synchronized into a data space pointed by the target hash value;
and if the corresponding identification hash value set does not have a target hash value corresponding to the hash value of the subdata to be synchronized, creating a new data space, and writing the data group to be synchronized into the new data space.
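The two branches above can be illustrated with a minimal Python sketch. It is not the patented implementation: SHA-256, the dictionary layout, and the names `h`, `hash_sets`, `data_spaces`, and `write_group` are all assumptions made for illustration, and the sketch deliberately ignores the case where matches point to different data spaces.

```python
import hashlib

def h(value: str) -> str:
    # SHA-256 is an illustrative choice; the source does not name a hash function.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

hash_sets = {}    # identification -> {hash value: index of the data space it points to}
data_spaces = []  # each data space is an ordered list of written data groups

def write_group(group: dict) -> int:
    """Write one data group and return the index of the data space used."""
    target = None
    for field, value in group.items():
        target = hash_sets.setdefault(field, {}).get(h(value))
        if target is not None:
            break  # a target hash value exists; use the space it points to
    if target is None:
        data_spaces.append([])  # no target hash value: create a new data space
        target = len(data_spaces) - 1
    data_spaces[target].append(group)
    for field, value in group.items():
        # Record each hash so that later groups can be compared against it.
        hash_sets.setdefault(field, {}).setdefault(h(value), target)
    return target

first = write_group({"A": "a1", "B": "b1"})   # nothing stored yet: new space
second = write_group({"A": "a2", "B": "b2"})  # no match: another new space
third = write_group({"A": "a2", "B": "b3"})   # hash(a2) matches: reuse that space
```

Here the third group lands in the same data space as the second because both share the sub-data a2 under identification A.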
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, if a target hash value corresponding to a hash value of the to-be-synchronized sub data exists in the corresponding identification hash value set, writing the to-be-synchronized data group into a data space to which the target hash value points, includes:
if target hash values corresponding to the hash values of all the sub-data to be synchronized exist in the hash value sets of the corresponding identifications, acquiring the one or more data spaces to which the target hash values point, merging those data spaces, writing the data group to be synchronized into the merged data space, and making the target hash values, together with the other hash values that pointed to the one or more data spaces, point to the merged data space;
if target hash values corresponding to the hash values of two or more, but not all, of the sub-data to be synchronized exist in the hash value sets of the corresponding identifications, acquiring the one or more data spaces to which the target hash values point, merging those data spaces, writing the data group to be synchronized into the merged data space, making the target hash values, together with the other hash values that pointed to the one or more data spaces, point to the merged data space, writing the hash values of the sub-data to be synchronized that have no corresponding target hash value into the hash value sets of the corresponding identifications, and making those newly written hash values point to the merged data space;
if a target hash value exists in the hash value set of the corresponding identification for the hash value of only one of the sub-data to be synchronized, acquiring the target data space to which the target hash value points, writing the data group to be synchronized into the target data space, keeping the target hash value pointing to the target data space, writing the hash values of the sub-data to be synchronized that have no corresponding target hash value into the hash value sets of the corresponding identifications, and making those newly written hash values point to the target data space.
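The three cases above differ mainly in how many existing data spaces the matches point to. The following Python sketch is one possible reading, not the patented implementation: the class `Synchronizer`, its attributes, and the use of SHA-256 are assumptions introduced for illustration. It merges every matched data space into the first one found and repoints the affected hash values.

```python
import hashlib

def h(value: str) -> str:
    # SHA-256 is an illustrative choice; the source does not name a hash function.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

class Synchronizer:
    def __init__(self):
        self.hash_sets = {}  # identification -> {hash value: data space id}
        self.spaces = {}     # data space id -> ordered list of data groups
        self._next_id = 0

    def write(self, group: dict) -> int:
        hashes = {field: h(value) for field, value in group.items()}
        hit = []  # distinct data spaces pointed to by target hash values
        for field, hv in hashes.items():
            sid = self.hash_sets.setdefault(field, {}).get(hv)
            if sid is not None and sid not in hit:
                hit.append(sid)
        if not hit:
            target = self._next_id  # no target hash value: new data space
            self._next_id += 1
            self.spaces[target] = []
        else:
            target = hit[0]
            for sid in hit[1:]:
                # Merge: groups in different spaces were independent so far,
                # so their relative order does not matter.
                self.spaces[target].extend(self.spaces.pop(sid))
                for table in self.hash_sets.values():
                    for key, pointed in table.items():
                        if pointed == sid:
                            table[key] = target  # repoint to the merged space
        self.spaces[target].append(group)
        for field, hv in hashes.items():
            self.hash_sets[field][hv] = target  # unmatched hashes are added here
        return target

sync = Synchronizer()
rows = [("a1", "b1"), ("a2", "b2"), ("a2", "b3"), ("a1", "b2"), ("a5", "b5")]
targets = [sync.write({"A": a, "B": b}) for a, b in rows]
```

With these example rows, the fourth group (a1, b2) matches hash values that point to two different data spaces, so those spaces are merged; the fifth group matches nothing and receives its own data space.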
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, and the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, if a target hash value corresponding to a hash value of the sub-data to be synchronized does not exist in the corresponding identification hash value set, creating a new data space, and writing the data group to be synchronized into the new data space, the method includes:
if the target hash value corresponding to the hash value of the subdata to be synchronized does not exist in the corresponding identification hash value set, a new data space is created, the data group to be synchronized is written into the new data space, the hash value of the subdata to be synchronized is written into the corresponding identification hash value set, and the hash value of the subdata to be synchronized points to the new data space.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the calculating a hash value of sub data to be synchronized in the data group to be synchronized, and before performing a write operation on the data group to be synchronized, according to a comparison between the hash value of the sub data to be synchronized and a hash value in a corresponding identification hash value set, further includes:
and if the hash value set corresponding to the identification of the subdata to be synchronized does not exist, creating the hash value set corresponding to the identification of the subdata to be synchronized.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, and the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the disclosure further includes:
and responding to the received data concurrent read-write operation request, and executing data concurrent read-write operation based on one or more data spaces written in the data group to be synchronized.
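Because groups in different data spaces share no sub-data, each data space can be handled by its own worker while the order inside a space is preserved. The sketch below is an assumption-laden illustration: the thread pool, the function `apply_space`, and the sample contents of `data_spaces` are hypothetical, and the list append stands in for a real database write.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical result of a synchronization pass: space id -> ordered groups.
data_spaces = {
    0: [("a1", "b1"), ("a2", "b2"), ("a2", "b3"), ("a1", "b2")],
    2: [("a5", "b5")],
}

def apply_space(space_id, groups):
    """Apply one data space's groups strictly in their stored order."""
    applied = []
    for group in groups:
        applied.append(group)  # stand-in for a real read or write operation
    return space_id, applied

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(apply_space, sid, groups)
               for sid, groups in data_spaces.items()]
    results = dict(future.result() for future in futures)
```

Different data spaces run concurrently, while each worker walks its own space sequentially, so interdependent groups are never reordered.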
In a second aspect, an embodiment of the present disclosure provides a data synchronization apparatus.
Specifically, the data synchronization apparatus includes:
an acquisition module configured to acquire data to be synchronized, wherein the data to be synchronized comprises at least one data group to be synchronized, and each data group to be synchronized comprises sub-data to be synchronized corresponding to different identifications;
the writing module is configured to calculate a hash value of the sub data to be synchronized in the data group to be synchronized, and perform writing operation on the data group to be synchronized according to comparison between the hash value of the sub data to be synchronized and a hash value in a corresponding identification hash value set;
and the traversing module is configured to traverse the data group to be synchronized in the data to be synchronized to complete data synchronization.
With reference to the second aspect, in a first implementation manner of the second aspect, the writing module is further configured to:
and storing the hash value of the subdata to be synchronized into a corresponding identification hash value set, and enabling the hash value to point to a data space written into the data group to be synchronized.
With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the writing module is configured to:
calculating the hash value of the subdata to be synchronized in the data group to be synchronized;
comparing the hash value of the subdata to be synchronized with the hash value in the corresponding identification hash value set;
if a target hash value corresponding to the hash value of the subdata to be synchronized exists in the corresponding identification hash value set, writing the data group to be synchronized into a data space pointed by the target hash value;
and if the corresponding identification hash value set does not have a target hash value corresponding to the hash value of the subdata to be synchronized, creating a new data space, and writing the data group to be synchronized into the new data space.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the portion of the writing module that, if a target hash value corresponding to the hash value of the sub-data to be synchronized exists in the hash value set of the corresponding identification, writes the data group to be synchronized into the data space to which the target hash value points is configured to:
if target hash values corresponding to the hash values of all the sub-data to be synchronized exist in the hash value sets of the corresponding identifications, acquire the one or more data spaces to which the target hash values point, merge those data spaces, write the data group to be synchronized into the merged data space, and make the target hash values, together with the other hash values that pointed to the one or more data spaces, point to the merged data space;
if target hash values corresponding to the hash values of two or more, but not all, of the sub-data to be synchronized exist in the hash value sets of the corresponding identifications, acquire the one or more data spaces to which the target hash values point, merge those data spaces, write the data group to be synchronized into the merged data space, make the target hash values, together with the other hash values that pointed to the one or more data spaces, point to the merged data space, write the hash values of the sub-data to be synchronized that have no corresponding target hash value into the hash value sets of the corresponding identifications, and make those newly written hash values point to the merged data space;
if a target hash value exists in the hash value set of the corresponding identification for the hash value of only one of the sub-data to be synchronized, acquire the target data space to which the target hash value points, write the data group to be synchronized into the target data space, keep the target hash value pointing to the target data space, write the hash values of the sub-data to be synchronized that have no corresponding target hash value into the hash value sets of the corresponding identifications, and make those newly written hash values point to the target data space.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, and the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the portion of the writing module that, if no target hash value corresponding to the hash value of the sub-data to be synchronized exists in the hash value set of the corresponding identification, creates a new data space and writes the data group to be synchronized into the new data space is configured to:
if the target hash value corresponding to the hash value of the subdata to be synchronized does not exist in the corresponding identification hash value set, a new data space is created, the data group to be synchronized is written into the new data space, the hash value of the subdata to be synchronized is written into the corresponding identification hash value set, and the hash value of the subdata to be synchronized points to the new data space.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, and the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the apparatus further includes, before the writing module:
and the creating module is configured to create a hash value set corresponding to the identifier of the sub data to be synchronized if the hash value set corresponding to the identifier of the sub data to be synchronized does not exist.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, and the fifth implementation manner of the second aspect, in a sixth implementation manner of the second aspect, the disclosure further includes:
and the read-write module is configured to respond to the received data concurrent read-write operation request and execute data concurrent read-write operation based on one or more data spaces written in the data group to be synchronized.
In a third aspect, embodiments of the present disclosure provide an electronic device including a memory and a processor, wherein the memory stores one or more computer instructions that support the data synchronization apparatus in performing the above data synchronization method, and the processor is configured to execute the computer instructions stored in the memory. The data synchronization apparatus may further include a communication interface through which it communicates with other devices or a communication network.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium that stores computer instructions used by the data synchronization apparatus, including the computer instructions involved in performing the above data synchronization method.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In this technical solution, the hash values of the sub-data to be synchronized in each data group to be synchronized are calculated and compared with the hash values in the preset hash value set of the corresponding identification; together with the design of the data spaces and of the correspondence between data spaces and the hash values in the hash value sets, this achieves fast, high-quality synchronization of incremental data. The solution can both preserve the order of interdependent data and allow grouped data to be processed concurrently, thereby meeting users' requirements for the quality and speed of data reading, writing, and synchronization.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the disclosure.
Drawings
Other features, objects, and advantages of embodiments of the disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data synchronization method according to an embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of data synchronization for a data group to be synchronized (a1, b1) according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of data synchronization for a data group to be synchronized (a2, b2) according to an embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of data synchronization for a data group to be synchronized (a2, b3), according to an embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of data synchronization for a data group to be synchronized (a1, b2), according to an embodiment of the present disclosure;
FIG. 6 shows a block diagram of a data synchronization apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing a data synchronization method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the disclosed embodiments will be described in detail with reference to the accompanying drawings so that they can be easily implemented by those skilled in the art. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the disclosed embodiments, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the technical solution provided by the embodiments of the present disclosure, the hash values of the sub-data to be synchronized in each data group to be synchronized are calculated and compared with the hash values in the preset hash value set of the corresponding identification; together with the design of the data spaces and of the correspondence between data spaces and the hash values in the hash value sets, this achieves fast, high-quality synchronization of incremental data. The solution can both preserve the order of interdependent data and allow grouped data to be processed concurrently, thereby meeting users' requirements for the quality and speed of data reading, writing, and synchronization.
FIG. 1 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure. As shown in FIG. 1, the data synchronization method includes the following steps S101 to S103:
in step S101, data to be synchronized is obtained, where the data to be synchronized includes at least one group of data groups to be synchronized, and the data groups to be synchronized include sub data to be synchronized corresponding to different identifiers;
in step S102, calculating hash values of the sub data to be synchronized in the data group to be synchronized, and performing a write operation on the data group to be synchronized according to a comparison between the hash values of the sub data to be synchronized and hash values in corresponding identification hash value sets;
in step S103, traversing the data group to be synchronized in the data to be synchronized, and completing data synchronization.
As mentioned above, with the development of network technology and data technology, it is often necessary to perform operations such as data reading and writing and incremental data synchronization on a database or another data storage component with a large storage capacity. The incremental data synchronization methods adopted in the prior art either guarantee ordering only for individual data columns, or perform poorly and cannot meet users' requirements for the quality and speed of data reading and writing, or abandon the ordering among columns entirely and are therefore prone to data conflicts and inconsistency. Particularly as data volumes grow, a data synchronization method is needed that preserves the order of interdependent data while providing grouped concurrency.
In view of the above problems, this embodiment proposes a data synchronization method in which the hash values of the sub-data to be synchronized in each data group to be synchronized are calculated and compared with the hash values in the preset hash value set of the corresponding identification; together with the design of the data spaces and of the correspondence between data spaces and the hash values in the hash value sets, this achieves fast, high-quality synchronization of incremental data. The solution can both preserve the order of interdependent data and allow grouped data to be processed concurrently, thereby meeting users' requirements for the quality and speed of data reading, writing, and synchronization.
In an embodiment of the present disclosure, the data synchronization method may be applied to a computer, a computing device, an electronic device, a server, a service cluster, and the like, which perform a synchronization operation on data.
In an embodiment of the present disclosure, the data to be synchronized refers to data that needs to be synchronized, read, and written. For example, the data to be synchronized may be one or more rows of data in a database having one or more fields. In this example, different columns of the database correspond to different fields; in other words, different columns carry different identification information. In other examples, the identification information may have other meanings. Each row of data in the database can be regarded as a data group composed of data corresponding to different fields or different identification information. That is, in this example, the data to be synchronized includes at least one data group to be synchronized, and each data group to be synchronized includes sub-data to be synchronized corresponding to different identifications. Suppose a database has only two columns of data, each corresponding to a different field or piece of identification information. The data to be synchronized then includes multiple data groups to be synchronized, each of which includes sub-data to be synchronized corresponding to two different identifications. For example, the data to be synchronized may include the following data groups to be synchronized: (a1, b1), (a2, b2), (a2, b3), (a1, b2), (a5, b5), and so on. Each data group to be synchronized includes data for the two fields: the data group (a1, b1) includes data a1 of field A and data b1 of field B, the data group (a2, b2) includes data a2 of field A and data b2 of field B, and so on.
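As a small illustration of this data model, the example rows can be represented in Python as follows. The type name `DataGroup` and the lowercase attribute names are assumptions made for the sketch; the source only specifies two fields with distinct identifications.

```python
from typing import NamedTuple

class DataGroup(NamedTuple):
    """One row to be synchronized: one piece of sub-data per identification."""
    a: str  # sub-data for field A
    b: str  # sub-data for field B

# The example data to be synchronized, in arrival order.
data_to_sync = [
    DataGroup("a1", "b1"),
    DataGroup("a2", "b2"),
    DataGroup("a2", "b3"),
    DataGroup("a1", "b2"),
    DataGroup("a5", "b5"),
]

# A data group viewed as identification -> sub-data.
fields_of_first = dict(data_to_sync[0]._asdict())
```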
For convenience of description, the technical solution of the present disclosure is explained below using the data groups to be synchronized from this example.
In an embodiment of the present disclosure, the hash value set is a set that stores the hash values of already stored data; the hash values of data to be synchronized are compared against it to determine whether duplicate data exists. This makes it convenient to detect duplication or correlation between newly read or written data to be synchronized and the stored data. Correlation is defined as follows: given three data tuples a, b, and c, if a, b, and c can be linked into one set through equal keywords, then a, b, and c are correlated. Conversely, if data tuples a and b are unrelated, that is, the data in a and b are completely different, so that no conflict or mutual influence occurs regardless of the order in which they are executed, then a and b are not correlated. In this embodiment, the number of hash value sets equals the number of pieces of identification information. In the above example, the number of fields is 2, so the number of hash value sets is also 2. Note that hash values are compared only between data of the same field. In the above example, the fields to which the data in a data group to be synchronized belong are A and B, and the hash value sets comprise hash value set A corresponding to field A and hash value set B corresponding to field B. For a data group to be synchronized (a1, b1), the hash values hash(a1) and hash(b1) of the sub-data to be synchronized are calculated; hash(a1) is compared with the hash values in hash value set A, and hash(b1) is compared with the hash values in hash value set B.
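The per-field comparison can be sketched in Python as below. SHA-256 and the names `h`, `hash_sets`, and `matches` are assumptions for illustration; the source does not specify a hash function or a storage layout.

```python
import hashlib

def h(value: str) -> str:
    # SHA-256 is an illustrative choice; the source does not name a hash function.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# One hash value set per field, as they might look after (a1, b1) was written.
hash_sets = {"A": {h("a1")}, "B": {h("b1")}}

def matches(group: dict) -> dict:
    """Report, per field, whether the sub-data hash is already in that field's set."""
    return {field: h(value) in hash_sets[field] for field, value in group.items()}

same_a = matches({"A": "a1", "B": "b2"})
# The value "a1" under field B is compared only with set B, so it does not match.
crossed = matches({"A": "a9", "B": "a1"})
```

The second call shows why the comparison is scoped to a field: equal values under different identifications are not treated as duplicates.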
In this embodiment, the write operation of the data group to be synchronized is performed according to the comparison result of the hash values, that is, the data group to be synchronized is written into the corresponding data space; all the data groups to be synchronized in the data to be synchronized are then traversed to complete the data synchronization. Because the data groups to be synchronized are written in sequence and a corresponding write time can be generated for each, the data within a data space is ordered, so the data can subsequently be retrieved from the data space accurately.
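One way to keep this ordering explicit is to stamp each written group with a monotonically increasing sequence number. The sketch below is an assumption, not something the source prescribes: it uses a counter in place of a wall-clock write time and uses `heapq.merge` to show that two ordered data spaces can be combined without losing write order.

```python
import heapq
import itertools

_seq = itertools.count()  # stand-in for the write time generated on each write

def stamp(group):
    """Attach the next sequence number to a data group as it is written."""
    return (next(_seq), group)

space_x = [stamp(("a1", "b1"))]                       # sequence 0
space_y = [stamp(("a2", "b2")), stamp(("a2", "b3"))]  # sequences 1 and 2
space_x.append(stamp(("a1", "b4")))                   # sequence 3

# Each space is already sorted by sequence, so a streaming merge suffices.
merged = list(heapq.merge(space_x, space_y, key=lambda entry: entry[0]))
order = [seq for seq, _ in merged]
```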
In an embodiment of the present disclosure, the step S102, that is, calculating a hash value of the sub data to be synchronized in the data group to be synchronized, and performing a write operation on the data group to be synchronized according to a comparison between the hash value of the sub data to be synchronized and a hash value in a corresponding identification hash value set, further includes:
and storing the hash value of the subdata to be synchronized into a corresponding identification hash value set, and enabling the hash value to point to a data space written into the data group to be synchronized.
In order to compare the hash values of the subsequent data to be synchronized to determine whether there is duplicate data, in this embodiment, after the write operation of the data group to be synchronized is performed according to the comparison result of the hash values, the hash values of the sub-data to be synchronized, on which the write operation has been performed, are also stored in the hash value set of the corresponding identifier. In addition, in order to clarify the storage location of the data that has been subjected to the write operation, after the hash value of the sub-data to be synchronized that has been subjected to the write operation is stored in the hash value set of the corresponding identifier, the hash value is also pointed to the data space written in the data group to be synchronized, and a corresponding relationship between the hash value and the data space is established.
In an embodiment of the present disclosure, the step S102, that is, calculating a hash value of the sub data to be synchronized in the data group to be synchronized, and performing a write operation on the data group to be synchronized according to a comparison between the hash value of the sub data to be synchronized and a hash value in a corresponding identification hash value set, includes the following steps:
calculating the hash value of the subdata to be synchronized in the data group to be synchronized;
comparing the hash value of the subdata to be synchronized with the hash value in the corresponding identification hash value set;
if a target hash value corresponding to the hash value of the subdata to be synchronized exists in the corresponding identification hash value set, writing the data group to be synchronized into the data space pointed to by the target hash value;
and if the corresponding identification hash value set does not have a target hash value corresponding to the hash value of the subdata to be synchronized, creating a new data space, and writing the data group to be synchronized into the new data space.
In this embodiment, when the data group to be synchronized is written, the hash value of each sub-data to be synchronized in the data group is calculated first. The hash value of each sub-data is then compared with the hash values in the hash value set of the corresponding identifier. If a target hash value corresponding to the hash value of the sub-data exists in that set, repeatability or correlation exists between the sub-data to be synchronized and the stored data, and the data group to be synchronized can be written into the data space pointed to by the target hash value. If no such target hash value exists, there is no repeatability or correlation between the sub-data to be synchronized and the stored data; a new data space needs to be created, and the data group to be synchronized is written into it.
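The two top-level branches above (a matching hash exists, or none exists) can be sketched as follows. This is a simplified illustration, not the claimed implementation: `write_group`, `hash_sets` and `data_spaces` are hypothetical names, SHA-256 is an assumed hash function, and the case in which the matches point to several different data spaces is deliberately left out.

```python
import hashlib
from itertools import count

def field_hash(value: str) -> str:
    # SHA-256 is an assumption; the disclosure does not name a hash function.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

hash_sets = {"A": {}, "B": {}}   # field -> {hash value: id of the data space it points to}
data_spaces = {}                 # data space id -> data groups in write order
_next_id = count(1)

def write_group(group: dict) -> int:
    """Write one data group and return the id of the data space it was written to."""
    hashes = {field: field_hash(value) for field, value in group.items()}
    hits = {hash_sets[f][h] for f, h in hashes.items() if h in hash_sets[f]}
    if len(hits) > 1:
        # Merging several data spaces is outside the scope of this sketch.
        raise NotImplementedError("matches point to more than one data space")
    if hits:
        space_id = hits.pop()            # a target hash value exists: reuse its data space
    else:
        space_id = next(_next_id)        # no target hash value: create a new data space
        data_spaces[space_id] = []
    data_spaces[space_id].append(group)
    for field, h in hashes.items():      # store the hashes and point them to the space
        hash_sets[field][h] = space_id
    return space_id

first = write_group({"A": "a1", "B": "b1"})
second = write_group({"A": "a2", "B": "b2"})
third = write_group({"A": "a2", "B": "b3"})
print(first, second, third)  # 1 2 2
```

The third group lands in the second data space because hash(a2) is already present in the set for field A.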
Further, in an embodiment of the present disclosure, if a target hash value corresponding to the hash value of the sub-data to be synchronized exists in the corresponding identification hash value set, the step of writing the data group to be synchronized into a data space pointed by the target hash value includes the following steps:
if target hash values corresponding to the hash values of all sub-data to be synchronized exist in the corresponding identification hash value set, acquiring one or more data spaces pointed to by the target hash values, merging the data spaces, writing the data groups to be synchronized into a merged data space, and enabling the target hash values and the hash values pointed to the one or more data spaces to point to the merged data space;
if a target hash value corresponding to the hash values of two or more sub-data to be synchronized exists in the corresponding identification hash value set, acquiring one or more data spaces pointed by the target hash value, merging the data spaces, writing the data group to be synchronized into a merged data space, enabling the target hash value and the hash values pointed to the one or more data spaces to point to the merged data space, writing the hash values of the sub-data to be synchronized, which do not have the corresponding target hash value, into the hash value set of the corresponding identification, and enabling the hash values of the sub-data to be synchronized, which are written into the hash value set of the corresponding identification, to point to the merged data space;
if a target hash value corresponding to the hash value of only one sub-data to be synchronized exists in the corresponding identification hash value set, acquiring the target data space pointed to by the target hash value, writing the data group to be synchronized into the target data space, pointing the target hash value to the target data space, writing the hash values of the sub-data to be synchronized that have no corresponding target hash value into the hash value sets of the corresponding identifiers, and pointing the hash values so written to the target data space.
In this embodiment, the data group to be synchronized is written differently depending on how the hash values of its sub-data correspond to the hash values in the hash value sets of the corresponding identifiers. Specifically:
If target hash values corresponding to the hash values of all sub-data to be synchronized exist in the corresponding identification hash value sets, that is, the hash value of every sub-data to be synchronized finds a corresponding target hash value in the hash value set of its identifier, then all the sub-data to be synchronized are repeated in or correlated with the stored data. In this case no new data space is needed, and the data group to be synchronized only needs to be written into the data space pointed to by the target hash values. However, since there may be more than one sub-data to be synchronized and therefore more than one target hash value, the target hash values may point to one or more data spaces. If they point to a single target data space, the data group to be synchronized can be written directly into that target data space, with the target hash values continuing to point to it. If they point to two or more target data spaces, those target data spaces are merged to obtain a merged data space, the data group to be synchronized is written into the merged data space, and finally the target hash values, together with all other hash values that previously pointed to the data spaces being merged, are pointed to the merged data space.
If a target hash value corresponding to the hash values of two or more sub-data to be synchronized exists in the corresponding identification hash value set, that is, the hash values of part of the sub-data to be synchronized can find the corresponding target hash value in the corresponding identification hash value set, which indicates that repeatability or correlation exists between part of the sub-data to be synchronized and stored data, a new data space is not needed, and the data group to be synchronized only needs to be directly written into the data space pointed by the target hash value. Similar to the above case, since there may be more than one sub data to be synchronized and there may be more than one target hash value, there may be one or more data spaces pointed to by the target hash value. If the target data space pointed by the target hash value is one, the data group to be synchronized can be directly written into the target data space, and the target hash value is pointed to the target data space; however, if there are two or more target data spaces pointed to by the target hash values, the two or more target data spaces are merged to obtain a merged data space, the data group to be synchronized is written into the merged data space, and finally the target hash values and the hash values pointed to the one or more data spaces before are all pointed to the merged data space. 
Different from the above case, because only some of the hash values of the sub-data to be synchronized find corresponding target hash values in the hash value sets of the corresponding identifiers, the hash values of the sub-data that have no corresponding target hash value need to be written into the hash value sets of the corresponding identifiers, so that the hash values of subsequent data to be synchronized can be compared against them to determine whether duplicate data exists. The hash values so written are pointed to the target data space or merged data space into which the data group to be synchronized was written.
If only a target hash value corresponding to the hash value of one sub data to be synchronized exists in the corresponding identification hash value set, that is, only the hash value of one sub data to be synchronized can find the corresponding target hash value in the corresponding identification hash value set, which indicates that repeatability or correlation exists between the sub data to be synchronized and stored data, at this time, a new data space is not needed, and only the data group to be synchronized needs to be directly written into the target data space pointed by the target hash value, and the target hash value is pointed to the target data space. However, in order to compare the hash values of the subsequent data to be synchronized to determine whether there is duplicate data, the hash values of the sub-data to be synchronized, which do not have a corresponding target hash value, need to be written into the hash value sets of the corresponding identifiers, and the hash values of the sub-data to be synchronized, which are written into the hash value sets of the corresponding identifiers, are pointed to the target data space.
That is to say, as long as a target hash value corresponding to the hash value of any sub-data to be synchronized exists in the hash value set of the corresponding identifier, no new data space needs to be created: the data group to be synchronized is written into the data space pointed to by the target hash value, the hash values of the sub-data that have no corresponding target hash value are written into the hash value sets of the corresponding identifiers, and the correspondence between the relevant hash values and the data space is established.
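The merge step in the cases above, where the target hash values point to two or more data spaces, can be sketched in isolation as follows. The helper `merge_spaces` is hypothetical. The sketch assumes that each data space stores (sequence number, data group) pairs in write order, which is one possible way to realize the write ordering the disclosure mentions, and it uses placeholder strings such as "h(a1)" instead of real hash values.

```python
from heapq import merge as merge_sorted

def merge_spaces(space_ids, data_spaces, hash_sets, new_id):
    """Merge the given data spaces into a new one and redirect every hash that pointed to them."""
    old = set(space_ids)
    # Each space holds (sequence_number, group) pairs in write order, so a k-way
    # merge on the sequence number keeps the combined space ordered.
    runs = [data_spaces.pop(i) for i in sorted(old)]
    data_spaces[new_id] = list(merge_sorted(*runs, key=lambda entry: entry[0]))
    for pointers in hash_sets.values():
        for h, target in list(pointers.items()):
            if target in old:
                pointers[h] = new_id
    return new_id

# State after (a1, b1), (a2, b2), (a2, b3); "h(a1)" etc. stand in for real hash values.
hash_sets = {
    "A": {"h(a1)": 1, "h(a2)": 2},
    "B": {"h(b1)": 1, "h(b2)": 2, "h(b3)": 2},
}
data_spaces = {
    1: [(1, ("a1", "b1"))],
    2: [(2, ("a2", "b2")), (3, ("a2", "b3"))],
}

# (a1, b2) matches h(a1) -> space 1 and h(b2) -> space 2, so the two spaces are merged.
merged = merge_spaces([1, 2], data_spaces, hash_sets, new_id=3)
data_spaces[merged].append((4, ("a1", "b2")))
```

Redirecting every hash that pointed to an old space, not only the matched ones, is what keeps later groups containing b1, a2 or b3 routed to the merged space.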
Further, in an embodiment of the present disclosure, if there is no target hash value corresponding to the hash value of the sub-data to be synchronized in the corresponding identification hash value set, the step of creating a new data space and writing the data set to be synchronized into the new data space includes the following steps:
if the target hash value corresponding to the hash value of the subdata to be synchronized does not exist in the corresponding identification hash value set, a new data space is created, the data group to be synchronized is written into the new data space, the hash value of the subdata to be synchronized is written into the corresponding identification hash value set, and the hash value of the subdata to be synchronized points to the new data space.
Unlike the previous embodiment, in this embodiment no target hash value corresponding to the hash value of any sub-data to be synchronized exists in the hash value sets of the corresponding identifiers, so there is no target data space, pointed to by a target hash value, into which the data can be written directly. In this case, a new data space needs to be created and the data group to be synchronized written into it. The hash values of all the sub-data to be synchronized are then written into the hash value sets of the corresponding identifiers, so that the hash values of subsequent data to be synchronized can be compared against them to determine whether duplicate data exists, and these hash values are pointed to the new data space.
That is, if there is no target hash value corresponding to the hash value of the sub-data to be synchronized in the corresponding identification hash value set, a new data space needs to be created, the hash value of the sub-data to be synchronized is written into the corresponding identification hash value set, and a corresponding relationship between the hash value of the sub-data to be synchronized and the new data space is established.
In an embodiment of the present disclosure, in the step S102, that is, calculating a hash value of the sub data to be synchronized in the data group to be synchronized, and before the step of performing the write operation on the data group to be synchronized according to a comparison between the hash value of the sub data to be synchronized and a hash value in the corresponding identification hash value set, the method further includes:
and if the hash value set corresponding to the identification of the subdata to be synchronized does not exist, creating the hash value set corresponding to the identification of the subdata to be synchronized.
When the data synchronization method is first executed, or when the previously stored data does not include data corresponding to certain identification information, the hash value set corresponding to the identifier of the sub-data to be synchronized does not exist. In that case, the hash value set needs to be created to store the hash values of the sub-data to be synchronized with that identifier.
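Creating a hash value set on first use can be sketched as a small lookup helper. The name `hash_set_for` is hypothetical, and the placeholder string "h(a1)" stands in for a real hash value.

```python
hash_sets = {}  # field (identifier) -> {hash value: id of the data space it points to}

def hash_set_for(field: str) -> dict:
    """Return the hash value set for a field, creating an empty one if none exists yet."""
    return hash_sets.setdefault(field, {})

# The set for field A does not exist at first; it is created by the first access.
hash_set_for("A")["h(a1)"] = 1
```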
In an embodiment of the present disclosure, the method further comprises the steps of:
and responding to the received data concurrent read-write operation request, and executing data concurrent read-write operation based on one or more data spaces written in the data group to be synchronized.
It is mentioned above that data sets to be synchronized having a repetition or correlation will be written into the same data space, while data sets to be synchronized having no repetition or correlation will be written into different data spaces, i.e. there will be one or more mutually independent data spaces. Therefore, after receiving the data concurrent read-write operation request, the data concurrent read-write operation can be executed based on the one or more mutually independent data spaces, so that the quality and the speed of data read-write are improved.
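A minimal sketch of concurrent processing over mutually independent data spaces follows, using a thread pool from the Python standard library. The function `apply_space` and the contents of `data_spaces` are assumptions for illustration: each space is assumed to hold (sequence number, data group) pairs in write order, and the actual write to a target store is replaced by recording the entries.

```python
from concurrent.futures import ThreadPoolExecutor

def apply_space(space_id, groups):
    """Apply one data space's groups strictly in write order (here they are only recorded)."""
    applied = []
    for seq, group in groups:
        applied.append((seq, group))  # stand-in for the real read or write operation
    return space_id, applied

# Two independent data spaces; the ids and contents are illustrative.
data_spaces = {
    3: [(1, ("a1", "b1")), (2, ("a2", "b2")), (3, ("a2", "b3")), (4, ("a1", "b2"))],
    4: [(5, ("a5", "b5"))],
}

# Spaces run concurrently with each other; order is kept inside each space.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(lambda item: apply_space(*item), data_spaces.items()))
```

Since no data group in one space shares a field value with a group in another, the spaces can be processed in any interleaving without conflict, while the sequential loop inside `apply_space` preserves the order of interdependent groups.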
The overall flow of the technical solution of the present disclosure is described below using the above example. As described above, assume that the data to be synchronized includes the following data groups to be synchronized: (a1, b1), (a2, b2), (a2, b3), (a1, b2), (a5, b5), and so on, where each data group includes two sub-data to be synchronized corresponding to field A and field B respectively, and that the hash value set A corresponding to field A and the hash value set B corresponding to field B are both initially empty. When the data synchronization operation is performed, the first data group (a1, b1) is extracted, and the hash values hash(a1) and hash(b1) of its sub-data a1 and b1 are calculated. The value hash(a1) is compared with the hash values in hash value set A, and hash(b1) with those in hash value set B; neither set contains a corresponding hash value. A new data space Bucket1 is therefore created, the data group (a1, b1) is written into Bucket1, hash(a1) and hash(b1) are written into hash value set A and hash value set B respectively, and hash(a1) and hash(b1) are pointed to Bucket1, as shown in fig. 2.
Then, the second data group (a2, b2) is extracted, and the hash values hash(a2) and hash(b2) of its sub-data a2 and b2 are calculated. The value hash(a2) is compared with the hash values in hash value set A, and hash(b2) with those in hash value set B; neither set contains a corresponding hash value. A new data space Bucket2 is therefore created, the data group (a2, b2) is written into Bucket2, hash(a2) and hash(b2) are written into hash value set A and hash value set B respectively, and hash(a2) and hash(b2) are pointed to Bucket2, as shown in fig. 3.
Then, the third data group (a2, b3) is extracted, and the hash values hash(a2) and hash(b3) of its sub-data a2 and b3 are calculated. The value hash(a2) is compared with the hash values in hash value set A, and hash(b3) with those in hash value set B. Hash value set A contains a hash value corresponding to hash(a2), but hash value set B contains none corresponding to hash(b3). No new data space needs to be created; instead, the data space Bucket2 pointed to by hash(a2) in hash value set A is obtained, the data group (a2, b3) is written into Bucket2, hash(b3), which had no corresponding hash value, is written into hash value set B, and hash(b3) is pointed to Bucket2, as shown in fig. 4.
Then, the fourth data group (a1, b2) is extracted, and the hash values hash(a1) and hash(b2) of its sub-data a1 and b2 are calculated. The value hash(a1) is compared with the hash values in hash value set A, and hash(b2) with those in hash value set B. Hash value set A contains a hash value corresponding to hash(a1), and hash value set B contains a hash value corresponding to hash(b2). No new data space needs to be created; instead, the data space Bucket1 pointed to by hash(a1) in hash value set A and the data space Bucket2 pointed to by hash(b2) in hash value set B are obtained and merged into a merged data space Bucket3, the data group (a1, b2) is written into Bucket3, and hash(a1) and hash(b2), together with hash(b1), hash(a2) and hash(b3), which previously pointed to Bucket1 or Bucket2, are all pointed to Bucket3, as shown in fig. 5.
The remaining data groups to be synchronized are processed according to the same steps, which completes the synchronization of the data to be synchronized.
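The walkthrough above can be reproduced end to end with a compact Python sketch. It is an illustration under stated assumptions rather than the disclosed implementation: the class `Synchronizer` and its attributes are hypothetical, SHA-256 is an assumed hash function, integer ids stand for Bucket1, Bucket2 and so on, and a running sequence number stands in for the write time.

```python
import hashlib
from itertools import count

class Synchronizer:
    def __init__(self, fields):
        self.hash_sets = {f: {} for f in fields}  # field -> {hash value: data space id}
        self.spaces = {}                          # data space id -> [(seq, group)]
        self._ids = count(1)                      # ids stand for Bucket1, Bucket2, ...
        self._seq = count(1)                      # write order, standing in for write time

    @staticmethod
    def _hash(value):
        # SHA-256 is an assumption; the disclosure does not name a hash function.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    def write(self, group):
        hashes = {f: self._hash(v) for f, v in zip(self.hash_sets, group)}
        hit = sorted({self.hash_sets[f][h] for f, h in hashes.items()
                      if h in self.hash_sets[f]})
        if len(hit) == 1:
            target = hit[0]                       # exactly one existing space: reuse it
        else:
            # No hit: create an empty space. Several hits: create the merged space.
            target = next(self._ids)
            merged = [e for sid in hit for e in self.spaces.pop(sid)]
            self.spaces[target] = sorted(merged, key=lambda e: e[0])
            for pointers in self.hash_sets.values():
                for h, sid in list(pointers.items()):
                    if sid in hit:
                        pointers[h] = target      # redirect hashes of merged spaces
        self.spaces[target].append((next(self._seq), group))
        for f, h in hashes.items():
            self.hash_sets[f][h] = target         # store new hashes, point them to target
        return target

    def sync(self, groups):
        """Traverse all data groups and return the space id each one was written to."""
        return [self.write(g) for g in groups]

sync = Synchronizer(["A", "B"])
trace = sync.sync([("a1", "b1"), ("a2", "b2"), ("a2", "b3"), ("a1", "b2"), ("a5", "b5")])
print(trace)  # [1, 2, 2, 3, 4]
```

The first four entries of the trace correspond to Bucket1, Bucket2, Bucket2 and the merged Bucket3 in the walkthrough. The fifth group shares no value with earlier groups, so under this sketch it receives a data space of its own.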
It should be noted that the above description takes the case of two fields as an example; the case of three or more fields or pieces of identification information can be derived in the same way and is not described again in this disclosure.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 6 shows a block diagram of a data synchronization apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 6, the data synchronization apparatus includes:
an obtaining module 601, configured to obtain data to be synchronized, where the data to be synchronized includes at least one group of data groups to be synchronized, and the data groups to be synchronized include sub data to be synchronized corresponding to different identifiers;
a writing module 602 configured to calculate hash values of the sub data to be synchronized in the data group to be synchronized, and perform writing operation on the data group to be synchronized according to comparison between the hash values of the sub data to be synchronized and hash values in the corresponding identification hash value sets;
and a traversing module 603 configured to traverse the data group to be synchronized in the data to be synchronized, so as to complete data synchronization.
As mentioned above, with the development of network and data technologies, operations such as data reading and writing and incremental data synchronization often need to be performed on databases or other data storage components with large storage capacity. The incremental data synchronization methods adopted in the prior art either can guarantee only the order of data columns, with performance too poor to meet users' requirements for read-write quality and speed, or abandon the order between columns entirely, which easily causes data conflicts and inconsistency. In particular, as data volumes grow ever larger, a data synchronization method is needed that both preserves the ordering of interdependent data and provides grouped concurrency.
In view of the above problems, in this embodiment, a data synchronization apparatus is proposed, which effectively achieves high-speed and high-quality synchronization of incremental data by calculating hash values of sub-data to be synchronized in a data group to be synchronized, and by means of comparison with hash values in a preset hash value set having corresponding identifications, and design of a data space and correspondence between the data space and the hash values in the hash value set. The technical scheme can not only ensure the sequence of the interdependent data, but also realize the grouping concurrency of the data, thereby meeting the requirements of users on the data reading and writing and the data synchronization quality and speed.
In an embodiment of the present disclosure, the data synchronization apparatus may be implemented as a computer, a computing device, an electronic device, a server, a service cluster, or the like, which performs a synchronization operation on data.
In an embodiment of the present disclosure, the data to be synchronized refers to data that needs to be synchronized, read and written. For example, the data to be synchronized may be one or several rows of data in a database having one or more fields. In this example, different columns of the database correspond to different fields; put differently, different columns have different identification information. In other examples, the identification information may carry other meanings. Each row of data in the database may be regarded as a data group composed of data corresponding to different fields or different identification information. That is, in this example, the data to be synchronized includes at least one data group to be synchronized, and each data group includes sub-data to be synchronized corresponding to different identifiers. Assume that a database has only two columns of data, each corresponding to a different field or piece of identification information. The data to be synchronized then includes multiple data groups to be synchronized, each including sub-data corresponding to two different identifiers, for example: (a1, b1), (a2, b2), (a2, b3), (a1, b2), (a5, b5), and so on. Each data group includes data of two different fields: the data group (a1, b1) includes data a1 of field A and data b1 of field B, the data group (a2, b2) includes data a2 of field A and data b2 of field B, and so on.
For convenience of description, the technical solution of the present disclosure is explained below using the data groups to be synchronized in this example.
In an embodiment of the present disclosure, the hash value set refers to a set that stores the hash values of stored data and against which the hash values of the data to be synchronized are compared to determine whether duplicate data exists. Its purpose is to make it convenient to detect repeatability or correlation between newly read or written data to be synchronized and the stored data. Here, correlation is defined as follows: given three data tuples a, b and c, if a, b and c can be linked into one set through equal keywords, then a, b and c are correlated; conversely, if the data in tuples a and b are completely different, so that no conflict and no mutual influence arises regardless of the order in which they are executed, then a and b are not correlated. In this embodiment, the number of hash value sets equals the number of pieces of identification information. In the above example, the number of fields is 2, so the number of hash value sets is also 2. It should be noted that hash values are compared only within the same field. In the above example, the data in a data group to be synchronized belong to fields A and B, and the hash value sets include a hash value set A corresponding to field A and a hash value set B corresponding to field B. For a data group to be synchronized (a1, b1), the hash values hash(a1) and hash(b1) of its sub-data are calculated, hash(a1) is compared with the hash values in hash value set A, and hash(b1) is compared with the hash values in hash value set B.
In this embodiment, the data group to be synchronized is written according to the result of the hash value comparison, that is, it is written into the corresponding data space; data synchronization is complete once all data groups in the data to be synchronized have been traversed. Because the data groups are written in sequence, and a corresponding write time can be generated for each write, the data within a data space are ordered, so they can subsequently be retrieved from the data space accurately.
In an embodiment of the present disclosure, the writing module 602 may be further configured to:
and storing the hash value of the subdata to be synchronized into a corresponding identification hash value set, and enabling the hash value to point to a data space written into the data group to be synchronized.
In order to compare the hash values of the subsequent data to be synchronized to determine whether there is duplicate data, in this embodiment, after the write operation of the data group to be synchronized is performed according to the comparison result of the hash values, the hash values of the sub-data to be synchronized, on which the write operation has been performed, are also stored in the hash value set of the corresponding identifier. In addition, in order to clarify the storage location of the data that has been subjected to the write operation, after the hash value of the sub-data to be synchronized that has been subjected to the write operation is stored in the hash value set of the corresponding identifier, the hash value is also pointed to the data space written in the data group to be synchronized, and a corresponding relationship between the hash value and the data space is established.
In an embodiment of the present disclosure, the writing module 602 may be configured to:
calculating the hash value of the subdata to be synchronized in the data group to be synchronized;
comparing the hash value of the subdata to be synchronized with the hash value in the corresponding identification hash value set;
if a target hash value corresponding to the hash value of the subdata to be synchronized exists in the corresponding identification hash value set, writing the data group to be synchronized into the data space pointed to by the target hash value;
and if the corresponding identification hash value set does not have a target hash value corresponding to the hash value of the subdata to be synchronized, creating a new data space, and writing the data group to be synchronized into the new data space.
In this embodiment, when the data group to be synchronized is written, the hash value of each sub-data to be synchronized in the data group is calculated first. The hash value of each sub-data is then compared with the hash values in the hash value set of the corresponding identifier. If a target hash value corresponding to the hash value of the sub-data exists in that set, repeatability or correlation exists between the sub-data to be synchronized and the stored data, and the data group to be synchronized can be written into the data space pointed to by the target hash value. If no such target hash value exists, there is no repeatability or correlation between the sub-data to be synchronized and the stored data; a new data space needs to be created, and the data group to be synchronized is written into it.
Further, in an embodiment of the present disclosure, the part that writes the data group to be synchronized into the data space pointed to by the target hash value, when a target hash value corresponding to the hash value of the sub-data to be synchronized exists in the corresponding identification hash value set, may be configured to:
if target hash values corresponding to the hash values of all sub-data to be synchronized exist in the corresponding identification hash value set, acquiring one or more data spaces pointed to by the target hash values, merging the data spaces, writing the data groups to be synchronized into a merged data space, and enabling the target hash values and the hash values pointed to the one or more data spaces to point to the merged data space;
if a target hash value corresponding to the hash values of two or more sub-data to be synchronized exists in the corresponding identification hash value set, acquiring one or more data spaces pointed by the target hash value, merging the data spaces, writing the data group to be synchronized into a merged data space, enabling the target hash value and the hash values pointed to the one or more data spaces to point to the merged data space, writing the hash values of the sub-data to be synchronized, which do not have the corresponding target hash value, into the hash value set of the corresponding identification, and enabling the hash values of the sub-data to be synchronized, which are written into the hash value set of the corresponding identification, to point to the merged data space;
if a target hash value corresponding to the hash value of only one sub-data to be synchronized exists in the corresponding identification hash value set, acquiring the target data space pointed to by the target hash value, writing the data group to be synchronized into the target data space, pointing the target hash value to the target data space, writing the hash values of the sub-data to be synchronized that have no corresponding target hash value into the hash value sets of the corresponding identifications, and pointing the hash values so written to the target data space.
In this embodiment, the data group to be synchronized is written differently depending on how the hash values of its sub-data to be synchronized correspond to the hash values in the corresponding identification hash value sets. Specifically:
If target hash values corresponding to the hash values of all the sub-data to be synchronized exist in the corresponding identification hash value sets, that is, the hash value of every sub-data to be synchronized finds a corresponding target hash value, then all the sub-data to be synchronized have repetition or correlation with the stored data. No new data space is needed; the data group to be synchronized only needs to be written into the data space pointed to by the target hash values. However, because there may be more than one sub-data to be synchronized, and therefore more than one target hash value, the target hash values may point to one or more data spaces. If they point to a single target data space, the data group to be synchronized is written directly into that target data space, and the target hash values point to it. If they point to two or more target data spaces, those target data spaces are merged to obtain a merged data space, the data group to be synchronized is written into the merged data space, and finally the target hash values and all hash values that previously pointed to those data spaces are made to point to the merged data space.
If target hash values corresponding to the hash values of two or more sub-data to be synchronized exist in the corresponding identification hash value sets, that is, the hash values of part of the sub-data to be synchronized find corresponding target hash values, then part of the sub-data to be synchronized have repetition or correlation with the stored data. Again no new data space is needed; the data group to be synchronized only needs to be written into the data space pointed to by the target hash values. As in the case above, because there may be more than one sub-data to be synchronized and more than one target hash value, the target hash values may point to one or more data spaces. If they point to a single target data space, the data group to be synchronized is written directly into that target data space, and the target hash values point to it. If they point to two or more target data spaces, those target data spaces are merged to obtain a merged data space, the data group to be synchronized is written into the merged data space, and finally the target hash values and all hash values that previously pointed to those data spaces are made to point to the merged data space.
Unlike the case above, only some of the hash values of the sub-data to be synchronized find corresponding target hash values in the corresponding identification hash value sets. So that the hash values of subsequent data to be synchronized can be compared to determine whether duplicate data exists, the hash values of the sub-data to be synchronized that have no corresponding target hash value must be written into the hash value sets of the corresponding identifications. The hash values so written are made to point to the target data space or merged data space into which the data group to be synchronized was written.
If a target hash value corresponding to the hash value of only one sub-data to be synchronized exists in the corresponding identification hash value set, that is, the hash value of only one sub-data to be synchronized finds a corresponding target hash value, then that sub-data to be synchronized has repetition or correlation with the stored data. No new data space is needed; the data group to be synchronized only needs to be written into the target data space pointed to by the target hash value, and the target hash value points to that target data space. However, so that the hash values of subsequent data to be synchronized can be compared to determine whether duplicate data exists, the hash values of the sub-data to be synchronized that have no corresponding target hash value must be written into the hash value sets of the corresponding identifications and made to point to the target data space.
That is to say, as long as a target hash value corresponding to the hash value of any sub-data to be synchronized exists in the corresponding identification hash value set, no new data space needs to be created. The data group to be synchronized is written into the data space pointed to by the target hash value, the hash values of the sub-data to be synchronized that have no corresponding target hash value are written into the corresponding identification hash value sets, and the correspondence between the relevant hash values and the data space is established.
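The matched cases described above can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the function name `write_on_hit`, the dictionary layout, the integer data space ids, and the use of SHA-256 are all assumptions made for illustration, since the text names no hash algorithm or storage structure.

```python
import hashlib


def h(value: str) -> str:
    """Hash one piece of sub-data (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def write_on_hit(group, hash_sets, spaces):
    """Write `group` when at least one of its sub-data hashes is already known.

    group:     dict mapping identification -> sub-data value
    hash_sets: dict mapping identification -> {hash value: data space id}
    spaces:    dict mapping data space id -> list of stored groups
    Returns the id of the data space that received the group.
    """
    hashes = {ident: h(value) for ident, value in group.items()}

    # Data spaces pointed to by the target hash values that were found.
    hit_ids = sorted({
        hash_sets[ident][hv]
        for ident, hv in hashes.items()
        if hv in hash_sets.get(ident, {})
    })
    if not hit_ids:
        raise LookupError("no target hash value; a new data space is needed")

    target = hit_ids[0]
    # Two or more data spaces: merge them and re-point every affected hash.
    for other in hit_ids[1:]:
        spaces[target].extend(spaces.pop(other))
        for mapping in hash_sets.values():
            for hv, sid in list(mapping.items()):
                if sid == other:
                    mapping[hv] = target

    spaces[target].append(group)

    # Register hashes that had no target so later groups can match them.
    for ident, hv in hashes.items():
        hash_sets.setdefault(ident, {})[hv] = target
    return target
```

When the hashes of a group point to two different data spaces, this sketch merges both into the lower-numbered one and re-points every affected hash value. The linear scan over all hash values is chosen for brevity rather than efficiency.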
Further, in an embodiment of the present disclosure, the part of the writing module that creates a new data space and writes the data group to be synchronized into the new data space, when no target hash value corresponding to the hash value of the sub-data to be synchronized exists in the corresponding identification hash value set, may be configured to:
if the target hash value corresponding to the hash value of the subdata to be synchronized does not exist in the corresponding identification hash value set, a new data space is created, the data group to be synchronized is written into the new data space, the hash value of the subdata to be synchronized is written into the corresponding identification hash value set, and the hash value of the subdata to be synchronized points to the new data space.
Unlike the previous embodiment, in this embodiment no target hash value corresponding to the hash value of the sub-data to be synchronized exists in the corresponding identification hash value set, so there is also no target data space, pointed to by a target hash value, into which the data can be written directly. A new data space must therefore be created and the data group to be synchronized written into it. The hash values of all the sub-data to be synchronized are then written into the hash value sets of the corresponding identifications and made to point to the new data space, so that the hash values of subsequent data to be synchronized can be compared to determine whether duplicate data exists.
That is, if there is no target hash value corresponding to the hash value of the sub-data to be synchronized in the corresponding identification hash value set, a new data space needs to be created, the hash value of the sub-data to be synchronized is written into the corresponding identification hash value set, and a corresponding relationship between the hash value of the sub-data to be synchronized and the new data space is established.
In an embodiment of the present disclosure, the apparatus further includes, preceding the writing module 602:
a creating module configured to create a hash value set corresponding to the identification of the sub-data to be synchronized if no hash value set corresponding to that identification exists.
When the data synchronization method is first carried out, or when the previously stored data does not involve data corresponding to certain identification information, the hash value set corresponding to the identification of the sub-data to be synchronized does not exist. In that case, the hash value set corresponding to that identification must be created in order to store the hash values of the sub-data to be synchronized for the corresponding identification.
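The two "create when absent" steps described above, a new data space when no target hash value exists and a new hash value set when none exists for an identification, might be sketched as follows. The name `write_on_miss`, the dictionary layout, the integer data space ids, and SHA-256 are illustrative assumptions and do not come from the disclosure.

```python
import hashlib


def h(value: str) -> str:
    """Hash one piece of sub-data (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def write_on_miss(group, hash_sets, spaces):
    """Write `group` when none of its sub-data hashes is known yet.

    group:     dict mapping identification -> sub-data value
    hash_sets: dict mapping identification -> {hash value: data space id}
    spaces:    dict mapping data space id -> list of stored groups
    Returns the id of the newly created data space.
    """
    hashes = {ident: h(value) for ident, value in group.items()}

    # Creating module: make the hash value set for an identification if absent.
    for ident in hashes:
        hash_sets.setdefault(ident, {})

    if any(hv in hash_sets[ident] for ident, hv in hashes.items()):
        raise ValueError("a target hash value exists; reuse its data space")

    # Create a new data space and write the group into it.
    space_id = max(spaces, default=-1) + 1
    spaces[space_id] = [group]

    # Record every hash so later groups can be compared against it.
    for ident, hv in hashes.items():
        hash_sets[ident][hv] = space_id
    return space_id
```

Starting from empty `hash_sets` and `spaces`, the first call creates one hash value set per identification and the first data space. Each later call with entirely unseen sub-data adds another independent data space.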
In an embodiment of the present disclosure, the apparatus further includes:
a read-write module configured to, in response to a received concurrent data read-write operation request, execute concurrent data read-write operations based on the one or more data spaces into which the data groups to be synchronized have been written.
As mentioned above, data groups to be synchronized that have repetition or correlation are written into the same data space, while data groups to be synchronized that have no repetition or correlation are written into different data spaces, so that one or more mutually independent data spaces exist. Therefore, after a concurrent data read-write operation request is received, concurrent data read-write operations can be executed on the one or more mutually independent data spaces, which improves the quality and speed of data reading and writing.
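One way to picture concurrent operation on mutually independent data spaces is to guard each data space with its own lock, so that requests touching different data spaces never wait on one another. The sketch below is only an illustration under that assumption; the class `DataSpaces`, the request tuple format, and the thread pool size are hypothetical and are not specified by the disclosure.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DataSpaces:
    """Independent data spaces, each protected by its own lock."""

    def __init__(self, spaces):
        self._spaces = spaces
        self._locks = {sid: threading.Lock() for sid in spaces}

    def read(self, sid):
        # Return a copy so callers never see a list that is being modified.
        with self._locks[sid]:
            return list(self._spaces[sid])

    def append(self, sid, group):
        # Only writers to the same data space contend for this lock.
        with self._locks[sid]:
            self._spaces[sid].append(group)


def run_concurrently(store, requests):
    """Execute ("read", sid) and ("append", sid, group) requests in parallel."""

    def handle(request):
        op, sid, *rest = request
        if op == "read":
            return store.read(sid)
        return store.append(sid, *rest)

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(handle, requests))
```

The sketch shows the locking structure only. In CPython, threads do not speed up pure in-memory work, so any throughput benefit would depend on the data spaces being backed by real I/O.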
It should be noted that the description above takes two fields as an example. Where there are three or more fields or pieces of identification information, the same reasoning applies, and the details are not repeated in this disclosure.
The embodiment of the present disclosure also discloses an electronic device, which includes a memory and a processor, wherein
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to perform any of the method steps described above.
FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing a data synchronization method according to an embodiment of the present disclosure.
As shown in FIG. 7, the computer system 700 includes a processing unit 701 that can execute various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The processing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read from it is installed into the storage section 708 as necessary. The processing unit 701 may be implemented as a CPU, a GPU, a TPU, an FPGA, an NPU, or other processing units.
In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the data synchronization method. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the disclosed embodiment also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present disclosure.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept. One example is a technical solution formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (16)

1. A method of data synchronization, comprising:
acquiring data to be synchronized, wherein the data to be synchronized comprises at least one group of data groups to be synchronized, and the data groups to be synchronized comprise sub data to be synchronized corresponding to different identifications;
calculating the hash value of the subdata to be synchronized in the data group to be synchronized, and executing write-in operation on the data group to be synchronized according to the comparison between the hash value of the subdata to be synchronized and the hash value in the corresponding identification hash value set;
and traversing the data group to be synchronized in the data to be synchronized to complete data synchronization.
2. The method of claim 1, wherein the calculating the hash value of the sub-data to be synchronized in the data set to be synchronized, and performing a write operation on the data set to be synchronized according to the comparison between the hash value of the sub-data to be synchronized and the hash value in the corresponding set of identified hash values, further comprises:
and storing the hash value of the subdata to be synchronized into a corresponding identification hash value set, and enabling the hash value to point to a data space written into the data group to be synchronized.
3. The method of claim 1, wherein the calculating the hash value of the sub-data to be synchronized in the data group to be synchronized, and performing a write operation on the data group to be synchronized according to the comparison between the hash value of the sub-data to be synchronized and the hash value in the corresponding identification hash value set comprises:
calculating the hash value of the subdata to be synchronized in the data group to be synchronized;
comparing the hash value of the subdata to be synchronized with the hash value in the corresponding identification hash value set;
if a target hash value corresponding to the hash value of the subdata to be synchronized exists in the corresponding identification hash value set, writing the data set to be synchronized into a data space pointed by the target hash value;
and if the corresponding identification hash value set does not have a target hash value corresponding to the hash value of the subdata to be synchronized, creating a new data space, and writing the data group to be synchronized into the new data space.
4. The method of claim 3, wherein if a target hash value corresponding to the hash value of the sub-data to be synchronized exists in the corresponding set of identification hash values, writing the set of data to be synchronized into a data space pointed to by the target hash value, includes:
if target hash values corresponding to the hash values of all sub-data to be synchronized exist in the corresponding identification hash value set, acquiring one or more data spaces pointed to by the target hash values, merging the data spaces, writing the data groups to be synchronized into a merged data space, and enabling the target hash values and the hash values pointed to the one or more data spaces to point to the merged data space;
if a target hash value corresponding to the hash values of two or more sub-data to be synchronized exists in the corresponding identification hash value set, acquiring one or more data spaces pointed by the target hash value, merging the data spaces, writing the data group to be synchronized into a merged data space, enabling the target hash value and the hash values pointed to the one or more data spaces to point to the merged data space, writing the hash values of the sub-data to be synchronized, which do not have the corresponding target hash value, into the hash value set of the corresponding identification, and enabling the hash values of the sub-data to be synchronized, which are written into the hash value set of the corresponding identification, to point to the merged data space;
if a target hash value corresponding to the hash value of the subdata to be synchronized exists in the corresponding identification hash value set, a target data space pointed by the target hash value is obtained, the data group to be synchronized is written into the target data space, the target hash value is pointed to the target data space, the hash value of the subdata to be synchronized, which does not correspond to the target hash value, is written into the hash value set of the corresponding identification, and the hash value of the subdata to be synchronized, which is written into the hash value set of the corresponding identification, is pointed to the target data space.
5. The method according to claim 3 or 4, wherein if there is no target hash value corresponding to the hash value of the sub-data to be synchronized in the corresponding identification hash value set, creating a new data space, and writing the data set to be synchronized into the new data space, includes:
if the target hash value corresponding to the hash value of the subdata to be synchronized does not exist in the corresponding identification hash value set, a new data space is created, the data group to be synchronized is written into the new data space, the hash value of the subdata to be synchronized is written into the corresponding identification hash value set, and the hash value of the subdata to be synchronized points to the new data space.
6. The method according to any one of claims 1 to 5, wherein the calculating the hash value of the sub data to be synchronized in the data set to be synchronized, and according to the comparison between the hash value of the sub data to be synchronized and the hash value in the corresponding identification hash value set, before performing the write operation on the data set to be synchronized, the method further comprises:
and if the hash value set corresponding to the identification of the subdata to be synchronized does not exist, creating the hash value set corresponding to the identification of the subdata to be synchronized.
7. The method of any of claims 1-6, further comprising:
and responding to the received data concurrent read-write operation request, and executing data concurrent read-write operation based on one or more data spaces written in the data group to be synchronized.
8. A data synchronization apparatus, comprising:
an acquisition module configured to acquire data to be synchronized, wherein the data to be synchronized comprises at least one group of data groups to be synchronized, and the data groups to be synchronized comprise sub data to be synchronized corresponding to different identifications;
the writing module is configured to calculate a hash value of the sub data to be synchronized in the data group to be synchronized, and perform writing operation on the data group to be synchronized according to comparison between the hash value of the sub data to be synchronized and a hash value in a corresponding identification hash value set;
and the traversing module is configured to traverse the data group to be synchronized in the data to be synchronized to complete data synchronization.
9. The apparatus of claim 8, the write module further configured to:
and storing the hash value of the subdata to be synchronized into a corresponding identification hash value set, and enabling the hash value to point to a data space written into the data group to be synchronized.
10. The apparatus of claim 8, the write module configured to:
calculating the hash value of the subdata to be synchronized in the data group to be synchronized;
comparing the hash value of the subdata to be synchronized with the hash value in the corresponding identification hash value set;
if a target hash value corresponding to the hash value of the subdata to be synchronized exists in the corresponding identification hash value set, writing the data set to be synchronized into a data space pointed by the target hash value;
and if the corresponding identification hash value set does not have a target hash value corresponding to the hash value of the subdata to be synchronized, creating a new data space, and writing the data group to be synchronized into the new data space.
11. The apparatus of claim 10, wherein if a target hash value corresponding to the hash value of the sub-data to be synchronized exists in the corresponding set of identification hash values, the portion of the data set to be synchronized that is written into the data space to which the target hash value points is configured to:
if target hash values corresponding to the hash values of all sub-data to be synchronized exist in the corresponding identification hash value set, acquiring one or more data spaces pointed to by the target hash values, merging the data spaces, writing the data groups to be synchronized into a merged data space, and enabling the target hash values and the hash values pointed to the one or more data spaces to point to the merged data space;
if a target hash value corresponding to the hash values of two or more sub-data to be synchronized exists in the corresponding identification hash value set, acquiring one or more data spaces pointed by the target hash value, merging the data spaces, writing the data group to be synchronized into a merged data space, enabling the target hash value and the hash values pointed to the one or more data spaces to point to the merged data space, writing the hash values of the sub-data to be synchronized, which do not have the corresponding target hash value, into the hash value set of the corresponding identification, and enabling the hash values of the sub-data to be synchronized, which are written into the hash value set of the corresponding identification, to point to the merged data space;
if a target hash value corresponding to the hash value of the subdata to be synchronized exists in the corresponding identification hash value set, a target data space pointed by the target hash value is obtained, the data group to be synchronized is written into the target data space, the target hash value is pointed to the target data space, the hash value of the subdata to be synchronized, which does not correspond to the target hash value, is written into the hash value set of the corresponding identification, and the hash value of the subdata to be synchronized, which is written into the hash value set of the corresponding identification, is pointed to the target data space.
12. The apparatus according to claim 10 or 11, wherein the portion that creates a new data space and writes the data set to be synchronized into the new data space if there is no target hash value corresponding to the hash value of the sub-data to be synchronized in the corresponding identification hash value set is configured to:
if the target hash value corresponding to the hash value of the subdata to be synchronized does not exist in the corresponding identification hash value set, a new data space is created, the data group to be synchronized is written into the new data space, the hash value of the subdata to be synchronized is written into the corresponding identification hash value set, and the hash value of the subdata to be synchronized points to the new data space.
13. The apparatus according to any of claims 8-12, wherein the write module is preceded by:
and the creating module is configured to create a hash value set corresponding to the identifier of the sub data to be synchronized if the hash value set corresponding to the identifier of the sub data to be synchronized does not exist.
14. The apparatus of any of claims 8-13, further comprising:
and the read-write module is configured to respond to the received data concurrent read-write operation request and execute data concurrent read-write operation based on one or more data spaces written in the data group to be synchronized.
15. An electronic device comprising a memory and a processor; wherein
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-7.
16. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the method steps of any of claims 1-7.
CN202010729788.5A 2020-07-27 2020-07-27 Data synchronization method and device, electronic equipment and computer readable storage medium Active CN113297321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010729788.5A CN113297321B (en) 2020-07-27 2020-07-27 Data synchronization method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010729788.5A CN113297321B (en) 2020-07-27 2020-07-27 Data synchronization method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113297321A true CN113297321A (en) 2021-08-24
CN113297321B CN113297321B (en) 2022-04-26

Family

ID=77318243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010729788.5A Active CN113297321B (en) 2020-07-27 2020-07-27 Data synchronization method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113297321B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426611A (en) * 2012-01-13 2012-04-25 广州从兴电子开发有限公司 Database synchronization method and device thereof
CN106980680A (en) * 2017-03-30 2017-07-25 联想(北京)有限公司 Date storage method and storage device
CN107423436A (en) * 2017-08-04 2017-12-01 郑州云海信息技术有限公司 A kind of method migrated for online data between distinct type data-base
US20190205429A1 (en) * 2018-01-03 2019-07-04 Salesforce.Com, Inc. Data validation for data record migrations
CN110928952A (en) * 2019-11-28 2020-03-27 北京艾摩瑞策科技有限公司 Data synchronization method and device based on block chain
CN111061740A (en) * 2019-12-17 2020-04-24 北京软通智慧城市科技有限公司 Data synchronization method, equipment and storage medium

Also Published As

Publication number Publication date
CN113297321B (en) 2022-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40057468; Country of ref document: HK)
GR01 Patent grant