CN117033512A - Data synchronization method, apparatus, device, medium, and program product - Google Patents


Info

Publication number
CN117033512A
Authority
CN
China
Legal status
Pending
Application number
CN202310994154.6A
Other languages
Chinese (zh)
Inventor
党震宇
张志海
李俊谦
李昊溟
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310994154.6A
Publication of CN117033512A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data synchronization method, which may be applied to the field of cloud computing technology, the field of distributed technology, the field of financial technology, or other related fields. The method comprises: setting one of m parks as the primary park for an access application and setting the other n parks as n backup parks for the access application; writing data of the access application into the service cluster of the primary park to generate a first data file; synchronizing the data of the access application to the service clusters of the n backup parks to generate n second data files; checking the consistency of each of the n second data files with the first data file; and, in response to the ith second data file of the n second data files being inconsistent with the first data file, performing a resynchronization operation on the ith second data file based on the first data file. The method ensures the consistency of the data in every park, improves the disaster recovery and backup capability of the platform, and enhances data security.

Description

Data synchronization method, apparatus, device, medium, and program product
Technical Field
The present disclosure relates to the field of cloud computing technology and the field of distributed technology, and more particularly, to a data synchronization method, apparatus, device, medium, and program product.
Background
Object storage service is an object-based mass storage service that provides customers with massive, secure, highly reliable, and low-cost data storage capacity. Modern organizations need to create and analyze large amounts of unstructured data, such as photos, videos, emails, web pages, and audio files.
In object storage, disaster recovery and backup capability is one of the important metrics of a system; the term covers both disaster tolerance and backup. Disaster tolerance means building two or more IT systems with the same functions in the same city or in different locations, which monitor each other's health and can switch functions between them, so that when the system at one site stops working unexpectedly, the entire application system can be switched to another site and continue to function normally. Backup means that a user makes one or more copies of the data generated by an application to enhance the security of the data.
To improve disaster recovery capability, most file systems currently use dual-park storage, i.e., a backup of all data is stored in a second park. Because of the particular requirements of the financial industry, the storage platform must provide read and write services around the clock (7×24 hours). Moreover, although data synchronization mechanisms exist between parks while the platform is running normally, they cannot fully guarantee the eventual consistency of the data across the centers.
Disclosure of Invention
In view of the foregoing, according to a first aspect of the present disclosure, an embodiment of the present disclosure provides a data synchronization method applied to an object storage platform, where the object storage platform includes service clusters deployed in m parks, m being a positive integer greater than or equal to 2. The method includes: in response to a request from an access application to access the object storage platform, setting one of the m parks as the primary park for the access application and setting the other n parks of the m parks as n backup parks for the access application, wherein n is a positive integer greater than or equal to 1 and less than or equal to m-1; in response to a write operation of the access application, writing data of the access application into the service cluster of the primary park to generate a first data file; synchronizing the data of the access application to the service clusters of the n backup parks to generate n second data files; checking, by a file checking service, the consistency of each of the n second data files with the first data file; and, in response to the ith second data file of the n second data files being inconsistent with the first data file, performing a resynchronization operation on the ith second data file based on the first data file, wherein i is a positive integer greater than or equal to 1 and less than or equal to n.
According to some exemplary embodiments, checking, by the file checking service, the consistency of each of the n second data files with the first data file specifically includes: acquiring the size of a file block in the first data file and the size of the corresponding file block in the ith second data file; comparing the size of the file block in the first data file with the size of the file block in the ith second data file; and, in response to the size of the file block in the first data file being inconsistent with the size of the file block in the ith second data file, determining that the ith second data file is inconsistent with the first data file.
According to some exemplary embodiments, acquiring the size of the file block in the first data file and the size of the file block in the ith second data file specifically includes: continuously scanning, by a file checking service process deployed on the service cluster of the primary park, the closed file blocks under a first storage directory to acquire the size of the file block in the first data file, wherein the first storage directory is the storage directory in the service cluster of the primary park used for storing data of the access application; and continuously scanning, by a file checking service process deployed on the service cluster of the ith backup park, the closed file blocks under a second storage directory to acquire the size of the file block in the ith second data file, wherein the second storage directory is the storage directory in the service cluster of the ith backup park used for storing data of the access application.
According to some exemplary embodiments, checking, by the file checking service, the consistency of each of the n second data files with the first data file further specifically includes: in response to the size of the file block in the first data file being consistent with the size of the file block in the ith second data file, acquiring the size of each object of the file block in the first data file and the size of each object of the file block in the ith second data file; comparing the size of each object of the file block in the first data file with the size of the corresponding object of the file block in the ith second data file; and determining that the ith second data file is inconsistent with the first data file in response to the size of at least one object of the file block in the first data file being inconsistent with the size of the corresponding object of the file block in the ith second data file.
According to some exemplary embodiments, performing the resynchronization operation on the ith second data file based on the first data file includes: when the size of the jth file block in the first data file is inconsistent with the size of the jth file block in the ith second data file, deleting the jth file block in the ith second data file and copying the jth file block in the first data file to the storage location corresponding to the jth file block in the ith second data file.
According to some exemplary embodiments, after the consistency of each of the n second data files with the first data file is checked, the method further includes: writing information about the file blocks found to be inconsistent into a database; and performing the resynchronization operation on the ith second data file based on the first data file includes: periodically scanning the database by a recovery service to obtain the information about the file blocks found to be inconsistent; and performing the resynchronization operation during an idle period of the service cluster according to the obtained file block information.
According to some exemplary embodiments, checking, by the file checking service, the consistency of each of the n second data files with the first data file further specifically includes: acquiring the size of an index file block in the first data file and the size of an index file block in the ith second data file; comparing the size of the index file block in the first data file with the size of the index file block in the ith second data file; and, in response to the size of the index file block in the first data file being inconsistent with the size of the index file block in the ith second data file, determining that the ith second data file is inconsistent with the first data file.
According to some exemplary embodiments, the method further includes: in response to the amount of data in the first data file being greater than a preset data amount threshold, starting a plurality of file checking services and a plurality of recovery services to perform the checking and the resynchronization operation concurrently.
According to a second aspect of the present disclosure, there is also provided a data synchronization apparatus applied to an object storage platform, the object storage platform including service clusters deployed in m parks, m being a positive integer greater than or equal to 2, the apparatus comprising:
a setting module, configured to: in response to a request from an access application to access the object storage platform, set one of the m parks as the primary park for the access application and set the other n parks of the m parks as n backup parks for the access application, wherein n is a positive integer greater than or equal to 1 and less than or equal to m-1;
a writing module, configured to: in response to a write operation of the access application, write data of the access application into the service cluster of the primary park to generate a first data file;
a synchronization module, configured to: synchronize the data of the access application to the service clusters of the n backup parks to generate n second data files;
a file checking module, configured to: check, by a file checking service, the consistency of each of the n second data files with the first data file; and
a resynchronization module, configured to: in response to the ith second data file of the n second data files being inconsistent with the first data file, perform a resynchronization operation on the ith second data file based on the first data file, wherein i is a positive integer greater than or equal to 1 and less than or equal to n.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method as described above.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One or more of the above embodiments have the following advantages or benefits: the method provided by the present disclosure synchronizes data across multiple centers, which ensures the eventual consistency of data among all parks (centers), so that any center can take over the service at any time; this significantly improves the disaster recovery and backup capability of the platform and enhances data security.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a data synchronization method, apparatus, device, medium and program product according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a data synchronization method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a block diagram of a data synchronization device according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of an electronic device for implementing a data synchronization method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression such as "at least one of A, B, and C" is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
In the technical solutions of the present disclosure, the acquisition, storage, and application of users' personal information all comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
First, technical terms described herein are explained and illustrated as follows.
kafka: a high-throughput distributed message queue system. It follows the producer-consumer model, preserves first-in-first-out (FIFO) ordering of messages within a partition, is designed not to lose data itself, and by default deletes data after 7 days.
It should be noted that, the data synchronization method and the data synchronization device provided by the embodiment of the disclosure may be applied to the fields of cloud computing and distributed technology, and may also be applied to the financial field.
Fig. 1 schematically illustrates an application scenario diagram of a data synchronization method, apparatus, device, medium and program product according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software (by way of example only), may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the data synchronization method provided by the embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the data synchronization apparatus provided by the embodiments of the present disclosure may generally be disposed in the server 105. The data synchronization method provided by the embodiments of the present disclosure may also be performed by a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the data synchronization apparatus provided by the embodiments of the present disclosure may also be disposed in a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Fig. 2 schematically illustrates a flow chart for a data synchronization method according to an embodiment of the disclosure.
Embodiments of the present disclosure provide a data synchronization method, and a flowchart of the method may be seen in fig. 2, specifically including operations S1-S5.
In operation S1, in response to a request from an access application to access the object storage platform, one of the m parks is set as the primary park for the access application, and the other n parks of the m parks are set as n backup parks for the access application, where n is a positive integer greater than or equal to 1 and less than or equal to m-1.
In operation S2, in response to a write operation of the access application, data of the access application is written into the service cluster of the primary park to generate a first data file.
In operation S3, the data of the access application is synchronized to the service clusters of the n backup parks to generate n second data files.
In operation S4, each of the n second data files is checked for consistency with the first data file by a file check service.
In operation S5, in response to the ith second data file of the n second data files not conforming to the first data file, performing a resynchronization operation on the ith second data file based on the first data file, wherein i is a positive integer of 1 or more and n or less.
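As a rough illustration of operations S1 and S2, the following Python sketch assigns roles to the parks for one access application and rejects writes outside the primary park. The names `ParkAssignment`, `assign_parks`, and `route_write` are hypothetical, parks are modeled as plain strings, and defaulting to all remaining parks as backups (n = m-1) is an assumption; the disclosure only requires 1 ≤ n ≤ m-1.

```python
from dataclasses import dataclass


@dataclass
class ParkAssignment:
    """Hypothetical record of the role each park plays for one access application."""
    app: str
    primary: str
    backups: list


def assign_parks(app, parks, primary, backups=None):
    """S1: set `primary` as the primary park and n other parks as backups (1 <= n <= m-1)."""
    m = len(parks)
    if m < 2:
        raise ValueError("the platform needs at least m = 2 parks")
    if primary not in parks:
        raise ValueError(f"unknown park: {primary}")
    if backups is None:
        # Assumed default: every other park backs up this application (n = m - 1).
        backups = [p for p in parks if p != primary]
    n = len(backups)
    valid = (
        1 <= n <= m - 1
        and len(set(backups)) == n          # no park listed twice
        and primary not in backups          # the primary cannot back itself up
        and set(backups) <= set(parks)      # only known parks
    )
    if not valid:
        raise ValueError("backups must be 1..m-1 distinct known parks other than the primary")
    return ParkAssignment(app, primary, list(backups))


def route_write(assignment, park):
    """S2: the access application's writes are accepted only by its primary park."""
    if park != assignment.primary:
        raise PermissionError(f"{assignment.app} writes only to {assignment.primary}")
    return park
```

Passing an explicit `backups` list lets an application use fewer than m-1 backup parks, which the disclosure allows.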
In the method provided by the embodiments of the present disclosure, for the m parks owned by the object storage platform (m being a positive integer greater than or equal to 2), service clusters are first deployed in the m parks, including a storage service and a kafka cluster service. When an access application is onboarded, the primary park is set according to the requirements of the access application, and the remaining n parks are set as backup parks, where n is a positive integer greater than or equal to 1 and less than or equal to m-1. Designating a primary park means that the write operations of the access application take place only in the primary park, where the first data file is generated. A replier process then connects to the kafka cluster, creates a plurality of consumer groups, synchronizes the data of the access application to the backup parks, and generates the second data files in the backup parks. The file checking service checks whether each of the n second data files is consistent with the first data file; if the ith second data file is not, a resynchronization operation is performed on it. The method provided by the present disclosure synchronizes data across multiple centers and, while ensuring the data consistency of the multi-center object storage platform, provides high availability and strong disaster recovery capability. Because inconsistent data files are repaired by resynchronization, later investment of manpower and material resources, and the corresponding cost, can be reduced. Furthermore, by building on a distributed system infrastructure and making full use of the characteristics of the clusters, the data written by the access application is stored in multiple copies, which improves the backup capability of the storage platform.
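The replier path can be sketched, under simplifying assumptions, with an in-memory stand-in for the kafka topic, so that the example does not depend on any particular kafka client library. `InMemoryLog` and `replicate` are hypothetical names. Each backup park gets its own consumer group with its own committed offset, so a slow park only falls behind without blocking the others, and records are replayed in FIFO order.

```python
from collections import defaultdict


class InMemoryLog:
    """Stand-in for a kafka topic: an append-only list read in FIFO order."""

    def __init__(self):
        self.records = []
        self.offsets = defaultdict(int)  # committed offset per consumer group

    def produce(self, record):
        self.records.append(record)

    def poll(self, group, max_records=100):
        """Return up to `max_records` records after the group's committed offset."""
        start = self.offsets[group]
        return self.records[start:start + max_records]

    def commit(self, group, count):
        self.offsets[group] += count


def replicate(log, backup_stores):
    """Hypothetical replier pass: one consumer group per backup park."""
    for park, store in backup_stores.items():
        group = f"replier-{park}"
        while True:
            batch = log.poll(group)
            if not batch:
                break
            for path, data in batch:
                store[path] = data  # replaying in order reproduces the primary's writes
            log.commit(group, len(batch))  # commit only after applying the batch
```

Committing the offset only after a batch has been applied gives at-least-once delivery: a replier that crashes mid-batch re-applies some records on restart, which is harmless here because each record simply overwrites a path with its data. In a real kafka deployment, FIFO order holds only within a partition, so records for the same file would need to share a partition (for example, by using the file path as the message key).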
Further, operation S4 specifically includes S41-S43.
In operation S41, a size of a file block in the first data file and a size of a file block in the ith second data file are acquired.
In operation S42, the size of the file block in the first data file is compared with the size of the file block in the ith second data file.
In operation S43, it is determined that the ith second data file is inconsistent with the first data file in response to the size of the file block in the first data file being inconsistent with the size of the file block in the ith second data file.
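Operations S41 to S43 reduce to comparing two maps of block sizes. The sketch below is one possible shape for that comparison, assuming each park's scan yields a dictionary from a block identifier (for example, its path relative to the application's storage directory) to its size in bytes; `compare_block_sizes` and `is_consistent` are hypothetical names.

```python
def compare_block_sizes(primary_sizes, backup_sizes):
    """S42: return the ids of blocks whose size differs in the ith backup park.

    The primary park is the reference, so a block missing from the backup
    also counts as a mismatch.
    """
    return sorted(
        block for block, size in primary_sizes.items()
        if backup_sizes.get(block) != size
    )


def is_consistent(primary_sizes, backup_sizes):
    """S43: the ith second data file is inconsistent if any block size differs."""
    return not compare_block_sizes(primary_sizes, backup_sizes)
```

Blocks that exist only in the backup are ignored in this sketch, since the primary park is treated as the complete reference.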
In the embodiments of the present disclosure, even after the primary park and the backup parks have been deployed and the corresponding data files have been generated, data inconsistencies may still occur. Therefore, based on the primary park information set for the access application, the data in the primary park is taken to be complete and accurate (i.e., if a data inconsistency occurs, the primary park must hold more data than the backup park). In this case, the file checking service (file-locker) takes the data stored in the primary park as the reference, scans the HDFS file blocks written by the user, and determines whether their sizes are consistent. When checking the consistency of a data file, the file checking service first obtains the sizes of its file blocks and then compares them, which improves both the efficiency and the accuracy of the check.
Further, operation S41 specifically includes S411-S412.
In operation S411, a file checking service process deployed on the service cluster of the primary park continuously scans the closed file blocks under a first storage directory to obtain the sizes of the file blocks in the first data file, where the first storage directory is the storage directory in the service cluster of the primary park used for storing data of the access application.
In operation S412, the file blocks that have been closed under the second storage directory are continuously scanned by the file checking service process deployed on the service cluster of the ith backup park to obtain the size of the file blocks in the ith second data file, where the second storage directory is a storage directory in the service cluster of the ith backup park for storing the data of the access application.
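A single pass of the scan in operations S411 and S412 might look like the following. The `BlockStatus` listing is a hypothetical stand-in for what a checker would obtain from the cluster's file system metadata. The sketch keeps only closed blocks under the given storage directory and keys them by their path relative to that directory, so that results from the primary park and the ith backup park (whose storage directories differ) can be matched block by block. A real checker would repeat this pass continuously.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class BlockStatus:
    """Hypothetical listing entry for one file block."""
    path: str
    size: int
    closed: bool


def scan_closed_blocks(listing, storage_dir):
    """One scan pass: sizes of closed blocks under `storage_dir`, keyed by relative path.

    Blocks still open for writing are skipped because their size can still grow.
    """
    root = PurePosixPath(storage_dir)
    sizes = {}
    for entry in listing:
        path = PurePosixPath(entry.path)
        if entry.closed and root in path.parents:
            sizes[str(path.relative_to(root))] = entry.size
    return sizes
```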
In the embodiments provided by the present disclosure, scanning only the file blocks that have already been closed under the storage directory yields complete, final block sizes (a block that is still open for writing may continue to grow) and does not affect the other operations of the file checking service.
In the embodiment provided by the present disclosure, operation S4 further specifically includes S44-S46.
In operation S44, in response to the size of the file block in the first data file being identical to the size of the file block in the i-th second data file, the size of each object of the file block in the first data file and the size of each object of the file block in the i-th second data file are acquired.
In operation S45, the sizes of the respective objects of the file blocks in the first data file and the sizes of the respective objects of the file blocks in the ith second data file are compared, respectively.
In operation S46, it is determined that the ith second data file is inconsistent with the first data file in response to the size of the at least one object of the file block in the first data file being inconsistent with the size of the at least one object of the file block in the ith second data file.
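The two-level check of operations S41 to S46 can be sketched as follows, assuming each file block is described by the list of objects it stores together with their sizes (a hypothetical representation; `check_block` is likewise a hypothetical name). The object-level step matters because blocks of identical total size can still differ: a block holding objects of 10 and 20 bytes and a block holding two 15-byte objects are both 30 bytes long.

```python
def check_block(primary_objects, backup_objects):
    """Check one file block, taking the primary park as the reference.

    Each argument is a list of (object_key, size_in_bytes) pairs for the objects
    stored in the block. Returns (verdict, mismatched_object_keys).
    """
    # S41-S43: the cheap block-level size comparison comes first.
    if sum(size for _, size in primary_objects) != sum(size for _, size in backup_objects):
        return "block_size_mismatch", []
    # S44-S46: equal block sizes can still hide per-object differences.
    backup_sizes = dict(backup_objects)
    bad = [key for key, size in primary_objects if backup_sizes.get(key) != size]
    return ("object_size_mismatch", bad) if bad else ("consistent", [])
```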
Even when a file block has the same size in the primary and backup parks, the objects stored by the user may still have become inconsistent during file transfer. In this case, a file object checking service, hdsf-object-locker, performs object-level checks on the primary and backup records inserted into the database, taking the primary park as the reference, i.e., it checks whether each object of the file block stored in the primary park is consistent with the corresponding object stored in the backup park. Comparing the sizes of the individual objects in addition to the sizes of the file blocks detects erroneous data more efficiently and accurately than checking block sizes alone, and also helps with repairing the erroneous data later.
Further, operation S5 specifically includes S51.
In operation S51, when the size of the jth file block in the first data file is inconsistent with the size of the jth file block in the ith second data file, deleting the jth file block in the ith second data file, and copying the jth file block in the first data file to a storage location corresponding to the jth file block in the ith second data file.
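Operation S51 can be sketched with local directories standing in for the application's storage directories in the two parks; this is an assumption for illustration, since the clusters described here store blocks in HDFS. `resync_block` is a hypothetical helper that deletes the jth block in the backup and then copies the primary's block to the same relative location.

```python
import shutil
from pathlib import Path


def resync_block(primary_dir, backup_dir, block_rel_path):
    """S51: replace the jth block of the backup with the primary park's copy.

    Returns True if the backup block now has the same size as the primary block.
    """
    src = Path(primary_dir) / block_rel_path
    dst = Path(backup_dir) / block_rel_path
    if dst.exists():
        dst.unlink()  # delete the incomplete block in the backup park
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy the complete block from the primary park
    return dst.stat().st_size == src.stat().st_size
```

Deleting first and then copying, as in the described embodiment, briefly leaves the backup without the block. A variant that copies to a temporary file and then renames it over the old block with `os.replace` would avoid that window on file systems where the rename is atomic.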
In the embodiments provided by the present disclosure, if an inconsistency is found, the incomplete file block in the backup park (which holds less data) is deleted first, and the complete file block in the primary park is then copied to the backup park. HBase index file blocks may be copied from the primary park to the backup park in a similar manner. Because the file blocks of the backup parks are repaired as part of the comparison process, eventual consistency of data among the parks (centers) can be ensured, so that any center can take over the service at any time; this significantly improves the disaster recovery and backup capability of the platform and enhances data security.
In an embodiment provided by the present disclosure, after the consistency of each of the n second data files with the first data file is checked, the method further includes writing information about the file blocks found to be inconsistent into a database. Performing the resynchronization operation on the ith second data file based on the first data file then includes: periodically scanning the database by a recovery service to obtain the information about the file blocks found to be inconsistent; and performing the resynchronization operation during an idle period of the service cluster according to the obtained file block information.
Recording inconsistent file block information in a database, having the recovery service fetch the list of file blocks to be repaired from the database, and performing the resynchronization during idle periods frees up some data storage space, reduces processing cost, and centralizes processing, which improves synchronization efficiency; the approach is also general and flexible.
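The database-backed repair queue could be sketched with Python's built-in `sqlite3` module. The table layout, the `pending`/`done` status values, and the `is_idle` and `resync` callables are all hypothetical. A recovery service would call `recovery_pass` on a timer, and repairs stop as soon as the cluster is no longer idle, so the remaining tasks wait for the next scan.

```python
import sqlite3


def open_repair_db(path=":memory:"):
    """Hypothetical table of file blocks the checker found inconsistent."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS repair_task ("
        " backup_park TEXT NOT NULL,"
        " block TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " PRIMARY KEY (backup_park, block))"
    )
    return db


def record_inconsistency(db, backup_park, block):
    """File checking service side: (re)mark a block as pending repair, without duplicates."""
    db.execute(
        "INSERT OR REPLACE INTO repair_task (backup_park, block, status) VALUES (?, ?, 'pending')",
        (backup_park, block),
    )
    db.commit()


def recovery_pass(db, is_idle, resync):
    """One scheduled scan by the recovery service; repairs only while the cluster is idle.

    Returns the number of blocks repaired in this pass.
    """
    rows = db.execute(
        "SELECT backup_park, block FROM repair_task"
        " WHERE status = 'pending' ORDER BY backup_park, block"
    ).fetchall()
    done = 0
    for park, block in rows:
        if not is_idle():
            break  # the remaining tasks wait for the next scheduled pass
        resync(park, block)
        db.execute(
            "UPDATE repair_task SET status = 'done' WHERE backup_park = ? AND block = ?",
            (park, block),
        )
        done += 1
    db.commit()
    return done
```

Using `INSERT OR REPLACE` on the (backup park, block) key means that a block reported again, even one repaired earlier, is simply marked pending once more instead of creating a duplicate task.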
Further, operation S4 also specifically includes S47-S49.
In operation S47, the size of the index file block in the first data file and the size of the index file block in the ith second data file are acquired.
In operation S48, the size of the index file block in the first data file is compared with the size of the index file block in the ith second data file.
In operation S49, in response to the size of the index file block in the first data file not being identical to the size of the index file block in the ith second data file, it is determined that the ith second data file is not identical to the first data file.
In the embodiments provided by the present disclosure, the sizes of the index file blocks in the first data file and in the ith second data file are also acquired and compared. An index file is a data structure built to speed up searches and is widely used in practice in search engines, databases, and similar fields; common index file structures include B+ trees, hash tables, and inverted indexes, each with its own characteristics and application scenarios. Including the index file blocks in the check in this embodiment helps effectively improve search speed and efficiency.
In an embodiment of the present disclosure, the method further includes, in response to the amount of data in the first data file being greater than a preset data amount threshold, starting a plurality of file checking services and a plurality of recovery services, and concurrently performing the checking and the resynchronizing operations.
Data files whose data amount exceeds the set data amount threshold are checked and resynchronized concurrently. In addition, a particular file checking process can be assigned to check the file blocks under a specific directory. That process continuously checks closed HDFS files (files closed for more than 3 minutes) 24 hours a day, which ensures the overall efficiency of the file checking process.
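The threshold-driven concurrency and the 3-minute closing rule can be illustrated with the sketch below. It is hypothetical. `ClosedFile`, `check_concurrently`, and `max_workers` are invented names, and threads from `concurrent.futures.ThreadPoolExecutor` stand in for separate file checking service instances. The close time is assumed to be known for each file, and the data amount is taken as the total size of all files in the first data file.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
CLOSE_AGE_S = 3 * 60  # check only files closed for more than 3 minutes


@dataclass
class ClosedFile:
    path: str
    size: int         # bytes
    closed_at: float  # epoch seconds when the file was closed


def files_ready_for_check(files: List[ClosedFile], now: float) -> List[ClosedFile]:
    """Keep only files whose close time is more than CLOSE_AGE_S ago."""
    return [f for f in files if now - f.closed_at > CLOSE_AGE_S]


def check_concurrently(files: List[ClosedFile],
                       check_one: Callable[[ClosedFile], T],
                       data_threshold: int,
                       max_workers: int = 8,
                       now: Optional[float] = None) -> Tuple[int, List[T]]:
    """Run check_one over ready files, using several workers only when the
    total data amount exceeds data_threshold. Returns (workers used, results)."""
    now = time.time() if now is None else now
    total = sum(f.size for f in files)  # data amount in the first data file
    workers = max_workers if total > data_threshold else 1
    ready = files_ready_for_check(files, now)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check_one, ready))  # results keep input order
    return workers, results
```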
Fig. 3 schematically shows a block diagram of a data synchronization apparatus according to an embodiment of the present disclosure.
As shown in fig. 3, the data synchronization apparatus 300 according to this embodiment includes a setting module 310, a writing module 320, a synchronization module 330, a file checking module 340, and a resynchronization module 350.
The setting module 310 is configured to, in response to a request from an access application to access the object storage platform, set one of the m parks as the home park for the access application and set the other n parks of the m parks as n standby parks for the access application, where n is a positive integer greater than or equal to 1 and less than or equal to m-1.
The writing module 320 is configured to write the data of the access application into the service cluster of the home park in response to a write operation of the access application, so as to generate a first data file.
The synchronization module 330 is configured to synchronize the data of the access application to the service clusters of the n standby parks, so as to generate n second data files.
The file checking module 340 is configured to check, by a file checking service, the consistency of each of the n second data files with the first data file.
The resynchronization module 350 is configured to, in response to the ith second data file being inconsistent with the first data file, perform a resynchronization operation on the ith second data file based on the first data file, where i is a positive integer greater than or equal to 1 and less than or equal to n.
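The following sketch shows how modules 310 to 350 fit together. It models each park's service cluster as an in-memory dictionary that maps a block name to its bytes. The class `DataSyncApparatus` and its method names are hypothetical. Consistency is judged only by block size, as in the embodiments above. The sketch illustrates the control flow only and is not the apparatus itself.

```python
from typing import Dict, List, Optional


class DataSyncApparatus:
    """Hypothetical in-memory model of modules 310-350.

    Each park's service cluster is a dict mapping a block name to its bytes;
    real clusters, networking, and failures are not modeled.
    """

    def __init__(self, parks: List[str]) -> None:
        if len(parks) < 2:
            raise ValueError("the platform needs m >= 2 parks")
        self.clusters: Dict[str, Dict[str, bytes]] = {p: {} for p in parks}
        self.home: Optional[str] = None
        self.standby: List[str] = []

    def set_parks(self, home: str, n: int) -> None:
        """Setting module 310: choose the home park and n standby parks."""
        others = [p for p in self.clusters if p != home]
        if not 1 <= n <= len(others):
            raise ValueError("n must satisfy 1 <= n <= m - 1")
        self.home, self.standby = home, others[:n]

    def write(self, block: str, data: bytes) -> None:
        """Writing module 320: write into the home park's cluster (first data file)."""
        self.clusters[self.home][block] = data

    def sync(self) -> None:
        """Synchronization module 330: copy home data to each standby park."""
        for park in self.standby:
            self.clusters[park].update(self.clusters[self.home])

    def check(self, park: str) -> List[str]:
        """File checking module 340: blocks that are missing or differ in size."""
        first, second = self.clusters[self.home], self.clusters[park]
        return [b for b, data in first.items()
                if b not in second or len(second[b]) != len(data)]

    def resync(self, park: str, blocks: List[str]) -> None:
        """Resynchronization module 350: replace stale blocks with the home copy."""
        for b in blocks:
            self.clusters[park].pop(b, None)                      # delete stale copy
            self.clusters[park][b] = self.clusters[self.home][b]  # copy from home park
```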
In an embodiment of the present disclosure, the file checking module 340 is specifically configured to: acquire the size of a file block in the first data file and the size of a file block in the ith second data file; compare the size of the file block in the first data file with the size of the file block in the ith second data file; and, in response to the size of the file block in the first data file not being consistent with the size of the file block in the ith second data file, determine that the ith second data file is not consistent with the first data file.
In embodiments of the present disclosure, the file checking module 340 may also be configured to perform two scans. First, it continuously scans the closed file blocks under a first storage directory through a file checking service process deployed on the service cluster of the home park, to acquire the size of the file blocks in the first data file, where the first storage directory is the storage directory used for storing the data of the access application in the service cluster of the home park. Second, it continuously scans the closed file blocks under a second storage directory through a file checking service process deployed on the service cluster of the ith standby park, to acquire the size of the file blocks in the ith second data file, where the second storage directory is the storage directory used for storing the data of the access application in the service cluster of the ith standby park.
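Acquiring block sizes by scanning closed files under a storage directory might look like the sketch below. It rests on an explicit assumption: a local directory stands in for the HDFS storage directory, and a file whose last modification is more than 3 minutes old (the window mentioned earlier) is treated as closed. A real deployment would query the cluster's own file system API instead. The function `scan_closed_blocks` is hypothetical. A file checking service process would call it repeatedly to scan continuously.

```python
import os
import time
from typing import Dict, Optional


def scan_closed_blocks(storage_dir: str, min_closed_age_s: float = 180.0,
                       now: Optional[float] = None) -> Dict[str, int]:
    """Return {relative path: size in bytes} for files under storage_dir whose
    last modification is older than min_closed_age_s.

    Assumption: file mtime stands in for the HDFS file-close time.
    """
    now = time.time() if now is None else now
    sizes: Dict[str, int] = {}
    for root, _dirs, files in os.walk(storage_dir):
        for name in files:
            full = os.path.join(root, name)
            st = os.stat(full)
            if now - st.st_mtime > min_closed_age_s:  # treated as closed
                sizes[os.path.relpath(full, storage_dir)] = st.st_size
    return sizes
```

Running the same scan on the home park's first storage directory and on the ith standby park's second storage directory yields two size maps keyed by relative path. Those maps can then be compared block by block.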
In embodiments of the present disclosure, the file checking module 340 may also be configured to: in response to the size of the file block in the first data file being consistent with the size of the file block in the ith second data file, acquire the size of each object of the file block in the first data file and the size of each object of the file block in the ith second data file; compare the size of each object of the file block in the first data file with the size of each object of the file block in the ith second data file; and, in response to the size of at least one object of the file block in the first data file not being consistent with the size of the corresponding object of the file block in the ith second data file, determine that the ith second data file is not consistent with the first data file.
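The two-level check compares block sizes first and compares per-object sizes only when the block sizes match. It can be sketched as follows. The model is an assumption: a file block is represented as a mapping from object key to object size in bytes, and the block size is taken to be the sum of its object sizes. `blocks_consistent` is a hypothetical name.

```python
from typing import Dict

# Hypothetical model: a file block maps object key -> object size in bytes.
Block = Dict[str, int]


def block_size(block: Block) -> int:
    """Assumed block size: the sum of the sizes of its objects."""
    return sum(block.values())


def blocks_consistent(first: Block, second: Block) -> bool:
    """Level 1 compares block sizes; level 2 runs only if they match."""
    if block_size(first) != block_size(second):
        return False                      # level 1: block sizes differ
    if first.keys() != second.keys():
        return False                      # an object exists on one side only
    return all(first[k] == second[k] for k in first)  # level 2: per-object sizes
```

The cheap block-level comparison filters out most mismatches. The per-object pass catches cases such as two objects whose sizes changed in opposite directions and left the block total unchanged.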
In an embodiment of the present disclosure, the resynchronization module 350 is specifically configured to handle a size mismatch in the jth file block. When the size of the jth file block in the first data file is inconsistent with the size of the jth file block in the ith second data file, it deletes the jth file block in the ith second data file and copies the jth file block in the first data file to the storage location corresponding to the jth file block in the ith second data file.
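The delete-then-copy repair of the jth file block can be sketched with local files standing in for the storage locations in the two service clusters. This stand-in is an assumption, and the function `resync_blocks` is hypothetical. Indices in the code are 0-based, whereas the text counts j from 1. A block missing on the standby side is also treated as needing repair, which goes slightly beyond the text.

```python
import os
import shutil
from typing import List


def resync_blocks(first_paths: List[str], second_paths: List[str]) -> List[int]:
    """Repair every j where the jth block of the ith second data file differs in
    size from (or is missing compared with) the jth block of the first data file.

    Returns the 0-based indices j that were repaired.
    """
    repaired = []
    for j, (src, dst) in enumerate(zip(first_paths, second_paths)):
        if not os.path.exists(dst) or os.path.getsize(src) != os.path.getsize(dst):
            if os.path.exists(dst):
                os.remove(dst)         # delete the inconsistent jth block
            shutil.copyfile(src, dst)  # copy the jth block from the home park
            repaired.append(j)
    return repaired
```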
In embodiments of the present disclosure, the resynchronization module 350 may also be configured to write information on the file blocks found to be inconsistent into a database. Performing the resynchronization operation on the ith second data file based on the first data file then includes periodically scanning the database through the recovery service to obtain the information on the file blocks found to be inconsistent, and performing the resynchronization operation during an idle period of the service cluster according to the obtained file block information.
In embodiments of the present disclosure, the file checking module 340 may also be configured to: acquire the size of an index file block in the first data file and the size of an index file block in the ith second data file; compare the size of the index file block in the first data file with the size of the index file block in the ith second data file; and, in response to the size of the index file block in the first data file not being consistent with the size of the index file block in the ith second data file, determine that the ith second data file is not consistent with the first data file.
In embodiments of the present disclosure, the resynchronization module 350 may also be configured to, in response to the amount of data in the first data file being greater than a preset data amount threshold, start a plurality of file checking services and a plurality of recovery services to perform the checking and resynchronization operations concurrently.
Fig. 4 schematically illustrates a block diagram of an electronic device of a data synchronization method according to an embodiment of the disclosure.
As shown in fig. 4, an electronic device 400 according to an embodiment of the present invention includes a processor 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. The processor 401 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 401 may also include on-board memory for caching purposes. Processor 401 may include a single processing unit or multiple processing units for performing the different actions of the method flow in accordance with an embodiment of the invention.
In the RAM 403, various programs and data necessary for the operation of the electronic device 400 are stored. The processor 401, the ROM 402, and the RAM 403 are connected to each other by a bus 404. The processor 401 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 402 and/or the RAM 403. Note that the program may be stored in one or more memories other than the ROM 402 and the RAM 403. The processor 401 may also perform various operations of the method flow according to an embodiment of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the invention, the electronic device 400 may further comprise an input/output (I/O) interface 405, which is also connected to the bus 404. The electronic device 400 may also include one or more of the following components connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output portion 407 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 408 including a hard disk or the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.
According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, the computer-readable storage medium may include ROM 402 and/or RAM 403 and/or one or more memories other than ROM 402 and RAM 403 described above.
Embodiments of the present invention also include a computer program product comprising a computer program that contains program code for performing the methods shown in the flowcharts. When the computer program product runs on a computer system, the program code causes the computer system to carry out the methods provided by embodiments of the present invention.
The above-described functions defined in the system/apparatus of the embodiment of the present invention are performed when the computer program is executed by the processor 401. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may be transmitted and distributed over a network medium in the form of signals, downloaded and installed via the communication portion 409, and/or installed from the removable medium 411. The program code contained in the computer program may be transmitted over any appropriate network medium, including but not limited to wireless media, wired media, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 409 and/or installed from the removable medium 411. The above-described functions defined in the system of the embodiment of the present invention are performed when the computer program is executed by the processor 401. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
According to embodiments of the present disclosure, the program code of the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.

Claims (12)

1. A data synchronization method applied to an object storage platform, the object storage platform comprising service clusters deployed in m parks, m being a positive integer greater than or equal to 2, characterized in that the method comprises:
setting one park of the m parks as a main park for the access application and setting the other n parks of the m parks as n standby parks for the access application in response to the requirement of the access application accessing the object storage platform, wherein n is a positive integer greater than or equal to 1 and less than or equal to m-1;
writing data of the access application into a service cluster of the home park in response to a write operation of the access application to generate a first data file;
synchronizing the data of the access application to the service clusters of the n backup parks to generate n second data files;
checking, by a file checking service, consistency of each of the n second data files with the first data file; and
in response to an ith second data file of the n second data files being inconsistent with the first data file, performing a resynchronization operation on the ith second data file based on the first data file, wherein i is a positive integer greater than or equal to 1 and less than or equal to n.
2. The method according to claim 1, wherein said checking, by a file checking service, the consistency of each of said n second data files with said first data file, in particular comprises:
acquiring the size of a file block in the first data file and the size of a file block in the ith second data file;
comparing the size of the file block in the first data file with the size of the file block in the ith second data file; and
in response to the size of the file block in the first data file not being consistent with the size of the file block in the ith second data file, determining that the ith second data file is not consistent with the first data file.
3. The method according to claim 2, wherein the acquiring the size of the file block in the first data file and the size of the file block in the ith second data file specifically comprises:
continuously scanning file blocks which are closed under a first storage directory through a file checking service process deployed on a service cluster of the main park to acquire the size of the file blocks in the first data file, wherein the first storage directory is a storage directory used for storing data of the access application in the service cluster of the main park; and
continuously scanning the closed file blocks under a second storage directory through a file checking service process deployed on the service cluster of the ith backup park to acquire the size of the file blocks in the ith second data file, wherein the second storage directory is a storage directory used for storing the data of the access application in the service cluster of the ith backup park.
4. A method according to any of claims 1-3, wherein said checking, by a file checking service, the consistency of each of said n second data files with said first data file, further specifically comprises:
acquiring the size of each object of the file block in the first data file and the size of each object of the file block in the ith second data file in response to the size of the file block in the first data file being consistent with the size of the file block in the ith second data file;
comparing the size of each object of the file block in the first data file with the size of each object of the file block in the ith second data file; and
in response to the size of at least one object of the file block in the first data file not being consistent with the size of the corresponding object of the file block in the ith second data file, determining that the ith second data file is not consistent with the first data file.
5. The method of claim 4, wherein performing a resynchronization operation on the ith second data file based on the first data file comprises:
deleting the j-th file block in the i-th second data file when the size of the j-th file block in the first data file is inconsistent with the size of the j-th file block in the i-th second data file, and copying the j-th file block in the first data file to a storage position corresponding to the j-th file block in the i-th second data file.
6. The method of any of claims 1-3 and 5, wherein after said checking for consistency of each of the n second data files with the first data file, the method further comprises: writing information on the file blocks found to be inconsistent into a database;
the performing a resynchronization operation on the ith second data file based on the first data file comprises: periodically scanning the database through a recovery service to obtain the information on the file blocks found to be inconsistent; and performing the resynchronization operation during an idle period of the service cluster according to the obtained file block information.
7. A method according to claim 2 or 3, wherein said checking, by a file checking service, the consistency of each of said n second data files with said first data file, further comprises in particular:
acquiring the size of an index file block in the first data file and the size of an index file block in the ith second data file;
comparing the size of the index file block in the first data file with the size of the index file block in the ith second data file; and
in response to the size of the index file block in the first data file not being consistent with the size of the index file block in the ith second data file, determining that the ith second data file is not consistent with the first data file.
8. The method of claim 6, wherein the method further comprises: in response to the data amount in the first data file being greater than a preset data amount threshold, starting a plurality of file checking services and a plurality of recovery services to perform the checking and the resynchronization operation concurrently.
9. A data synchronization device applied to an object storage platform, the object storage platform including service clusters deployed in m parks, m being a positive integer greater than or equal to 2, the device comprising:
a setting module configured to, in response to a requirement of an access application to access the object storage platform, set one park of the m parks as a main park for the access application and set the other n parks of the m parks as n standby parks for the access application, wherein n is a positive integer greater than or equal to 1 and less than or equal to m-1;
a writing module configured to write data of the access application into a service cluster of the home park in response to a write operation of the access application to generate a first data file;
a synchronization module configured to synchronize the data of the access application to the service clusters of the n backup parks to generate n second data files;
a file checking module configured to check, by a file checking service, consistency of each of the n second data files with the first data file; and
a resynchronization module configured to, in response to an ith second data file of the n second data files being inconsistent with the first data file, perform a resynchronization operation on the ith second data file based on the first data file, wherein i is a positive integer greater than or equal to 1 and less than or equal to n.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202310994154.6A 2023-08-08 2023-08-08 Data synchronization method, apparatus, device, medium, and program product Pending CN117033512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310994154.6A CN117033512A (en) 2023-08-08 2023-08-08 Data synchronization method, apparatus, device, medium, and program product

Publications (1)

Publication Number Publication Date
CN117033512A true CN117033512A (en) 2023-11-10

Family

ID=88629418

Country Status (1)

Country Link
CN (1) CN117033512A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination