CN116756236A - Data synchronization method, device, equipment and storage medium - Google Patents


Info

Publication number
CN116756236A
Authority
CN
China
Prior art keywords
data
storage
disaster recovery
data table
synchronization
Prior art date
Legal status
Pending
Application number
CN202310627244.1A
Other languages
Chinese (zh)
Inventor
周允
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202310627244.1A
Publication of CN116756236A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1451 Management of the data involved in backup or backup restore by selection of backup contents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data synchronization method, apparatus, device and storage medium, which can be applied in the financial field or other fields. When the disaster recovery environment side synchronizes data from the production environment side, it obtains the basic information table and the storage instance table of the production environment side data tables. The disaster recovery environment side can create a disaster recovery data synchronization table that records information about the data tables already synchronized. Comparing the disaster recovery data synchronization table with the storage instance table identifies the data tables that have not yet been synchronized (the data tables to be synchronized). The basic information table is then queried to obtain the storage path of each data table to be synchronized; this path holds the table's underlying storage files. The disaster recovery environment side obtains the underlying storage files and stores them in a target storage path, thereby synchronizing the data tables. Because the underlying storage files of the production environment side are copied directly to the disaster recovery environment side, data synchronization is achieved without exporting and importing the data, which improves synchronization efficiency.

Description

Data synchronization method, device, equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data synchronization method, apparatus, device, and storage medium.
Background
With the rapid development of the information age, big data assets have an increasingly important impact on enterprise competitiveness, and big data systems sit at the core of enterprise-level data architectures. Once a big data system fails, upstream and downstream systems, and even enterprise services, can suffer significant losses. Enterprises therefore back up important data synchronously using disaster recovery measures; when the original data is lost or corrupted, data services can continue from the backup, which prevents the entire data service process from being paralyzed.
The current disaster recovery scheme exports the data to be backed up from the database, transfers the exported data to the disaster recovery environment, and then imports it into the data tables there. Exporting and importing data is time-consuming, which reduces the efficiency of data synchronization.
Disclosure of Invention
In view of the above, the present application provides a data synchronization method, apparatus, device and storage medium, so as to improve the efficiency of data synchronization.
In a first aspect, the present application provides a data synchronization method, where the method is applied to a disaster recovery environment end, and the method includes:
acquiring a basic information table and a storage instance table of the production environment side data tables, wherein the storage instance table stores log records of the data tables;
comparing a disaster recovery data synchronization table with the storage instance table to obtain a data table to be synchronized, wherein the disaster recovery data synchronization table records information about data tables that have already been synchronized;
querying the basic information table to obtain a storage path of the data table to be synchronized; and
acquiring the underlying storage file corresponding to the storage path, storing it in a target storage path, and synchronizing the data table to be synchronized through the underlying storage file, wherein the target storage path matches the storage path.
In one possible implementation, the basic information table includes: at least one of a database name, a data table name, a storage path of a data table, and a date field;
the storage instance table includes: at least one of a data table name and a creation date;
the disaster recovery data synchronization table comprises: at least one of a data table name, a creation date, a synchronization date, and a synchronization status.
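The three tables above can be modeled as simple records. The following is a minimal Python sketch; the class and field names (`BasicInfoRow`, `storage_path`, `sync_status`, and so on) and the status strings are hypothetical, and the sketch assumes each table carries all of the optional fields listed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BasicInfoRow:
    """A row of the production-side basic information table (hypothetical fields)."""
    db_name: str
    table_name: str
    storage_path: str   # HDFS location of the table's underlying storage files
    date_field: str     # partition/date column, e.g. "create_date"


@dataclass(frozen=True)
class StorageInstanceRow:
    """A log record in the storage instance table."""
    table_name: str
    create_date: date


@dataclass
class SyncRecord:
    """A row of the DR-side disaster recovery data synchronization table."""
    table_name: str
    create_date: date
    sync_date: Optional[date] = None
    sync_status: str = "PENDING"  # hypothetical values: "SUCCESS", "FAILED"


# Illustrative rows; the database, table, and host names are made up.
info = BasicInfoRow("dw", "trade_detail",
                    "hdfs://prod-nn:8020/user/hive/warehouse/dw.db/trade_detail",
                    "create_date")
log = StorageInstanceRow("trade_detail", date(2023, 5, 30))
rec = SyncRecord("trade_detail", date(2023, 5, 30), date(2023, 5, 31), "SUCCESS")
```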
In one possible implementation, comparing the disaster recovery data synchronization table with the storage instance table to obtain the data table to be synchronized includes:
comparing the data table names and creation dates in the disaster recovery data synchronization table with those in the storage instance table, taking a data table name that exists in the storage instance table but not in the disaster recovery data synchronization table as a target data table name, and taking a creation date that exists in the storage instance table but not in the disaster recovery data synchronization table as a target creation date; and
determining the data table to be synchronized based on the target data table name and the data table name corresponding to the target creation date.
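The comparison amounts to a set difference on (data table name, creation date) pairs. A minimal sketch follows, assuming both tables have been loaded as lists of dicts with hypothetical keys `table_name` and `create_date`. Records whose status is failure still count as present in this comparison; the retry handling for them is a separate step.

```python
from datetime import date


def find_tables_to_sync(instance_rows, sync_rows):
    """Return sorted (table_name, create_date) pairs that appear in the
    storage instance table but have no entry in the DR sync table."""
    synced = {(r["table_name"], r["create_date"]) for r in sync_rows}
    logged = {(r["table_name"], r["create_date"]) for r in instance_rows}
    return sorted(logged - synced)


instance = [
    {"table_name": "trade_detail", "create_date": date(2023, 5, 1)},
    {"table_name": "trade_detail", "create_date": date(2023, 5, 2)},
    {"table_name": "cust_info", "create_date": date(2023, 5, 2)},
]
sync = [{"table_name": "trade_detail", "create_date": date(2023, 5, 1),
         "sync_status": "SUCCESS"}]
pending = find_tables_to_sync(instance, sync)
# [("cust_info", date(2023, 5, 2)), ("trade_detail", date(2023, 5, 2))]
```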
In one possible implementation, the method further includes:
acquiring, from the disaster recovery data synchronization table, each data table name whose synchronization status is synchronization failure, together with the creation date corresponding to that data table name;
querying the basic information table for the storage path of the data table to be processed that corresponds to the failed data table name and its creation date; and
acquiring and storing the underlying storage file corresponding to the storage path of the data table to be processed, and synchronizing the data table to be processed through that underlying storage file.
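The retry path can be sketched the same way. This assumes, hypothetically, that each creation date maps to a Hive-style partition directory `<storage_path>/<date_field>=<YYYY-MM-DD>` under the table's storage path, and that failed records carry the status string `"FAILED"`; neither detail is fixed by the text above.

```python
from datetime import date


def failed_sync_paths(sync_rows, basic_info_rows):
    """For each FAILED record in the DR sync table, look up the table's storage
    path in the basic information table and build the partition directory."""
    info_by_table = {r["table_name"]: r for r in basic_info_rows}
    result = []
    for rec in sync_rows:
        if rec["sync_status"] != "FAILED":
            continue
        info = info_by_table.get(rec["table_name"])
        if info is None:
            continue  # table not described in the basic information table
        part = f'{info["date_field"]}={rec["create_date"].isoformat()}'
        result.append((rec["table_name"], rec["create_date"],
                       f'{info["storage_path"]}/{part}'))
    return result


basic = [{"table_name": "trade_detail",
          "storage_path": "/user/hive/warehouse/dw.db/trade_detail",
          "date_field": "create_date"}]
sync = [
    {"table_name": "trade_detail", "create_date": date(2023, 5, 2), "sync_status": "FAILED"},
    {"table_name": "trade_detail", "create_date": date(2023, 5, 1), "sync_status": "SUCCESS"},
]
retry = failed_sync_paths(sync, basic)
```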
In one possible implementation, acquiring the basic information table and the storage instance table of the production environment side data tables includes:
creating, on the disaster recovery environment side, a disaster recovery basic information table and a disaster recovery storage instance table, wherein the disaster recovery basic information table has the same structure as the basic information table, and the disaster recovery storage instance table has the same structure as the storage instance table; and
acquiring the data of the basic information table and storing it in the disaster recovery basic information table, and acquiring the data of the storage instance table and storing it in the disaster recovery storage instance table;
and comparing the disaster recovery data synchronization table with the storage instance table to obtain the data table to be synchronized includes:
comparing the disaster recovery data synchronization table with the disaster recovery storage instance table to obtain the data table to be synchronized.
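When the production tables are mirrored on the disaster recovery side, the comparison can run as a SQL anti-join against the local mirror. The sketch below uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for both environments; the table names `storage_instance`, `dr_storage_instance`, and `dr_data_sync` are hypothetical, and in practice the production table would live on a separate cluster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the production-side storage instance table.
conn.execute("CREATE TABLE storage_instance (table_name TEXT, create_date TEXT)")
conn.executemany(
    "INSERT INTO storage_instance VALUES (?, ?)",
    [("trade_detail", "2023-05-01"), ("trade_detail", "2023-05-02"),
     ("cust_info", "2023-05-02")],
)

# DR mirror with the same structure (empty copy), then load the production rows.
conn.execute("CREATE TABLE dr_storage_instance AS SELECT * FROM storage_instance WHERE 0")
conn.execute("INSERT INTO dr_storage_instance SELECT * FROM storage_instance")

# DR-side synchronization table with one already-synchronized entry.
conn.execute("CREATE TABLE dr_data_sync "
             "(table_name TEXT, create_date TEXT, sync_date TEXT, sync_status TEXT)")
conn.execute("INSERT INTO dr_data_sync VALUES "
             "('trade_detail', '2023-05-01', '2023-05-02', 'SUCCESS')")


def pending_tables(conn):
    """Rows in the DR storage instance mirror with no matching sync record."""
    return conn.execute(
        """
        SELECT DISTINCT i.table_name, i.create_date
        FROM dr_storage_instance AS i
        LEFT JOIN dr_data_sync AS s
          ON s.table_name = i.table_name AND s.create_date = i.create_date
        WHERE s.table_name IS NULL
        ORDER BY i.table_name, i.create_date
        """
    ).fetchall()
```

The `LEFT JOIN ... WHERE s.table_name IS NULL` pattern keeps exactly the instance rows that have no partner in the synchronization table, which is the same set difference expressed in SQL.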
In one possible implementation, the data table is a Hive table, the storage path is a Hadoop Distributed File System (HDFS) path, and the underlying storage file is an Optimized Row Columnar (ORC) file.
In one possible implementation, acquiring and storing the underlying storage file corresponding to the storage path includes:
copying the ORC file corresponding to the HDFS path to the target storage path by using a distributed copy command.
In a second aspect, the present application provides a data synchronization apparatus, the apparatus comprising:
a first acquisition unit, configured to acquire a basic information table and a storage instance table of the production environment side data tables, wherein the storage instance table stores log records of the data tables;
a second acquisition unit, configured to compare a disaster recovery data synchronization table with the storage instance table to obtain a data table to be synchronized, wherein the disaster recovery data synchronization table records information about synchronized data tables;
a third acquisition unit, configured to query the basic information table to obtain a storage path of the data table to be synchronized; and
a synchronization unit, configured to acquire the underlying storage file corresponding to the storage path, store it in a target storage path, and synchronize the data table to be synchronized through the underlying storage file, wherein the target storage path matches the storage path.
In a third aspect, the present application provides a data synchronization device, the device comprising: a memory and a processor;
the memory is used for storing related program codes;
the processor is configured to invoke the program code to execute the data synchronization method according to any implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program for executing the data synchronization method according to any one of the implementation manners of the first aspect.
As can be seen from the above, the present application has the following beneficial effects:
When the disaster recovery environment side synchronizes data from the production environment side, it obtains the basic information table and the storage instance table of the production environment side data tables. The basic information table contains basic information about each data table, and the storage instance table stores log records of the data tables, for example, that data was added to a certain data table at a certain time. To record which production environment side data tables have already been synchronized, the disaster recovery environment side can create a disaster recovery data synchronization table; before synchronization, comparing this table with the storage instance table identifies the data tables that have not yet been synchronized. The basic information table is then queried to obtain the storage path of each data table to be synchronized, where that table's underlying storage files reside. After the disaster recovery environment side obtains these underlying storage files and stores them in a target storage path that matches the production environment side storage path, the data table to be synchronized is synchronized through those files. Because the method copies the underlying storage files of the data tables to be synchronized from the production environment side to the target storage path on the disaster recovery environment side, it achieves data synchronization without exporting and importing data, which improves synchronization efficiency.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them.
FIG. 1 is a schematic diagram of data synchronization according to an embodiment of the present application;
FIG. 2 is a flowchart of a data synchronization method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a data synchronization apparatus according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data synchronization device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are merely exemplary implementations rather than all implementations of the application. Other embodiments obtained by those skilled in the art by combining the embodiments of the application without creative effort also fall within the scope of the application.
When data services are in use, important data is usually backed up by disaster recovery measures so that the loss or failure of the original data does not paralyze the entire data service. The current disaster recovery scheme exports the data to be backed up from the database, transfers the exported data to the disaster recovery environment, and then imports it into the data tables there. Exporting and importing data is time-consuming, which reduces the efficiency of data synchronization.
On this basis, the embodiments of the present application provide a data synchronization method that improves the efficiency of data synchronization. The disaster recovery environment side obtains the basic information table (basic information about each data table) and the storage instance table (log records of the data tables) from the production environment side, and compares the storage instance table with its own disaster recovery data synchronization table, which records the data tables already synchronized, to identify the data tables that still need synchronization. It then queries the basic information table for the storage path holding the underlying storage files of each such table, copies those files to a matching target storage path on the disaster recovery environment side, and thereby synchronizes the tables through the underlying storage files, without exporting and importing the data.
To aid understanding of the method provided by the embodiments of the present application, it is described below with reference to an application scenario.
Referring to fig. 1, fig. 1 is a schematic diagram of data synchronization according to an embodiment of the present application.
In this application scenario, the production environment side and the disaster recovery environment side can be deployed on two different terminal devices or servers that are connected through a network for data exchange. Optionally, both environments may store data using the distributed system infrastructure Hadoop.
To prevent failure or loss of production environment side data from disrupting data services, the production environment side data needs to be synchronized into the data tables of the disaster recovery environment side in advance, on a regular schedule. If the production environment side data then runs into problems, the synchronized data on the disaster recovery environment side can be used to continue providing data services. In the embodiments of the present application, to enable synchronization, the data tables created on the disaster recovery environment side have the same structure as those on the production environment side.
In one possible implementation, the data tables in which the production environment side stores data may be Hive tables. Hive is a Hadoop-based data warehouse tool used for extracting, transforming, and loading data, and it can store, query, and analyze large-scale data stored in Hadoop. To enable synchronization, the disaster recovery environment side creates Hive tables with the same structure as the production environment side Hive tables to hold the synchronized data.
For ease of understanding, the process by which the disaster recovery environment side synchronizes data from the production environment side is described below with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a flowchart of a data synchronization method according to an embodiment of the present application.
The method may comprise the steps of:
S201: Acquire the basic information table and the storage instance table of the production environment side data tables.
The basic information table records basic information about the data tables, including the database name, data table name, storage path of the data table, a date field, and the like. The date field can serve as the partition field of the data table, for example the creation date field "create_date". The storage instance table stores log records of the data tables, including data table names, creation dates, and the like, i.e., a record of each processing operation on a data table. For example, it may record that a field of a certain data table was modified at a certain time, or that data was added to a certain data table at a certain time. From the storage instance table, the specific operations performed on each data table at a given time can be determined, and the most recently processed data tables can be identified by time.
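Because each processing operation is logged with a date, the storage instance table can answer which tables were processed most recently. A minimal sketch, assuming rows are dicts with hypothetical keys `table_name` and `create_date`, and using a simple "on or after a cutoff date" rule as an illustrative notion of "recent":

```python
from datetime import date


def recently_processed(instance_rows, since):
    """Map each table name to its latest create_date on or after `since`."""
    latest = {}
    for r in instance_rows:
        d = r["create_date"]
        if d < since:
            continue  # older than the cutoff
        name = r["table_name"]
        if name not in latest or d > latest[name]:
            latest[name] = d
    return latest


rows = [
    {"table_name": "trade_detail", "create_date": date(2023, 5, 1)},
    {"table_name": "trade_detail", "create_date": date(2023, 5, 3)},
    {"table_name": "cust_info", "create_date": date(2023, 4, 20)},
]
recent = recently_processed(rows, since=date(2023, 5, 1))
```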
In one possible implementation, after the disaster recovery environment side is connected to the production environment side through the network, it can request access to the production environment side basic information table and storage instance table over the network, obtain their data, and then perform data synchronization based on that data.
Alternatively, the disaster recovery environment side can create in advance a disaster recovery basic information table with the same structure as the production environment side basic information table, and a disaster recovery storage instance table with the same structure as the production environment side storage instance table. When the disaster recovery environment side accesses the production environment side basic information table over the network, the data of that table can be stored in the disaster recovery basic information table; likewise, when it accesses the production environment side storage instance table, that data can be stored in the disaster recovery storage instance table.
S202: Compare the disaster recovery data synchronization table with the storage instance table to obtain the data table to be synchronized.
The disaster recovery environment side can create a disaster recovery data synchronization table to store information about synchronized data tables, including the name of each synchronized data table, its creation date, its synchronization status, and the like; the synchronization status may be success or failure.
Because the disaster recovery data synchronization table stores information about synchronized data tables, comparing it with the production environment side storage instance table reveals the production environment side data tables that have not been synchronized. Specifically, the data table names and creation dates in the disaster recovery data synchronization table are compared with those in the storage instance table: a data table name that exists in the storage instance table but not in the disaster recovery data synchronization table is a target data table name, and a creation date that exists in the storage instance table but not in the disaster recovery data synchronization table is a target creation date. In other words, if the disaster recovery data synchronization table does not contain a production environment side data table name, the data table with that name has not been synchronized. If the disaster recovery data synchronization table contains the data table name but not a creation date that the production environment side has for that table, the data for that creation date has not been synchronized either. The data table to be synchronized can then be determined from the target data table names and the data table names corresponding to the target creation dates.
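The two cases just described (a table name missing entirely versus a known table missing a creation date) can be separated explicitly, which may help when logging why a table is being synchronized. A sketch under the same hypothetical dict-based row layout; the labels `"new_table"` and `"new_date"` are illustrative only.

```python
from datetime import date


def classify_unsynced(instance_rows, sync_rows):
    """Split unsynchronized (table_name, create_date) pairs into tables never
    synchronized and known tables with a newly logged creation date."""
    synced_names = {r["table_name"] for r in sync_rows}
    synced_pairs = {(r["table_name"], r["create_date"]) for r in sync_rows}
    result = {"new_table": [], "new_date": []}
    seen = set()
    for r in instance_rows:
        key = (r["table_name"], r["create_date"])
        if key in synced_pairs or key in seen:
            continue  # already synchronized, or a duplicate log record
        seen.add(key)
        bucket = "new_date" if r["table_name"] in synced_names else "new_table"
        result[bucket].append(key)
    return result


instance = [
    {"table_name": "trade_detail", "create_date": date(2023, 5, 1)},
    {"table_name": "trade_detail", "create_date": date(2023, 5, 2)},
    {"table_name": "trade_detail", "create_date": date(2023, 5, 2)},
    {"table_name": "cust_info", "create_date": date(2023, 5, 2)},
]
sync = [{"table_name": "trade_detail", "create_date": date(2023, 5, 1)}]
groups = classify_unsynced(instance, sync)
```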
S203: Query the basic information table to acquire the storage path of the data table to be synchronized.
After the data table to be synchronized is determined, its storage path can be found by looking up the storage path recorded for that data table in the basic information table.
Optionally, when the data table is a Hive table, it can be stored in the Hadoop Distributed File System (HDFS), which is used in the Hadoop architecture, and the storage path of the Hive table is an HDFS path. The storage path indicates where the underlying storage files of the Hive table are stored. When a data table is created in the database and its data is stored, corresponding underlying storage files are generated to persist the data table; this process is not the focus of the present application and is not described in detail here.
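One simple reading of a "matching" target storage path is the same HDFS path on the disaster recovery cluster's NameNode. The sketch below assumes that interpretation; the host names `prod-nn` and `dr-nn` and port 8020 are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_target_path(prod_path: str, dr_authority: str) -> str:
    """Keep the path component of a production HDFS URI and swap in the
    disaster recovery cluster's NameNode authority (host:port)."""
    parts = urlsplit(prod_path)
    # Default to the hdfs scheme when the production path has none.
    return urlunsplit((parts.scheme or "hdfs", dr_authority, parts.path, "", ""))


target = to_target_path(
    "hdfs://prod-nn:8020/user/hive/warehouse/dw.db/trade_detail", "dr-nn:8020")
```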
Optionally, the underlying storage files of a Hive table may be Optimized Row Columnar (ORC) files or plain text (TEXT) files. This embodiment uses ORC files as an example, but the format of the underlying storage files does not affect the implementation of this solution.
S204: Acquire the underlying storage file corresponding to the storage path, store it in the target storage path, and synchronize the data table to be synchronized through the underlying storage file.
Using the storage path of the data table to be synchronized, the corresponding underlying storage files are acquired and stored in the target storage path on the disaster recovery environment side. The target storage path matches the production environment side storage path and indicates where the underlying storage files of the corresponding disaster recovery environment side data table are stored. That is, to synchronize production environment side data, the disaster recovery environment side creates a data table with the same structure as the production environment side data table to be synchronized and specifies for it a target storage path that matches the storage path of the data table to be synchronized. After the acquired underlying storage files are stored in the target storage path, the disaster recovery environment side data table is refreshed, and the content of the data table to be synchronized is thereby synchronized to it.
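For a partitioned Hive table, one common way to refresh the disaster recovery table after the files land is to register the new partition with `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION ...` (alternatively, `MSCK REPAIR TABLE` rescans all partition directories). The text does not say which mechanism it uses, so the helper below is a hypothetical sketch that only builds the HiveQL string; executing it is left out, and the identifiers are assumed to be trusted (no quoting or escaping is done).

```python
from datetime import date


def add_partition_sql(db: str, table: str, date_field: str,
                      create_date: date, target_table_path: str) -> str:
    """Build a HiveQL statement that registers a copied partition directory.

    Assumes Hive-style partition directories named <date_field>=<YYYY-MM-DD>.
    """
    value = create_date.isoformat()
    location = f"{target_table_path}/{date_field}={value}"
    return (f"ALTER TABLE {db}.{table} ADD IF NOT EXISTS "
            f"PARTITION ({date_field}='{value}') LOCATION '{location}'")


sql = add_partition_sql("dw", "trade_detail", "create_date", date(2023, 5, 2),
                        "hdfs://dr-nn:8020/user/hive/warehouse/dw.db/trade_detail")
```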
In one possible implementation, the ORC files under the HDFS path can be copied to the target storage path with a distributed copy command. For example, the distributed copy command may be the distcp command; given the production environment side HDFS path, the ORC file names, the target storage path, and similar information, it copies the ORC files.
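The copy step can be wrapped as a command builder. The sketch assumes the standard `hadoop distcp` CLI with its `-update` option (copy only files that are missing or differ at the target) or `-overwrite`; the paths and the ORC file name `000000_0` are hypothetical, and the command is only built here, not executed.

```python
import shlex
from typing import List, Optional, Sequence


def build_distcp(src_dir: str, dst_dir: str,
                 orc_files: Optional[Sequence[str]] = None,
                 overwrite: bool = False) -> List[str]:
    """Build an argv list for `hadoop distcp`.

    If orc_files is given, each named file under src_dir is listed as a
    source; otherwise the whole directory is the source. With -update or
    -overwrite, the contents of a source directory land directly in dst_dir.
    """
    sources = [f"{src_dir}/{name}" for name in orc_files] if orc_files else [src_dir]
    mode = "-overwrite" if overwrite else "-update"
    return ["hadoop", "distcp", mode, *sources, dst_dir]


cmd = build_distcp(
    "hdfs://prod-nn:8020/user/hive/warehouse/dw.db/trade_detail/create_date=2023-05-02",
    "hdfs://dr-nn:8020/user/hive/warehouse/dw.db/trade_detail/create_date=2023-05-02",
    orc_files=["000000_0"],
)
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Building an argument list rather than a shell string avoids quoting problems when the list is later passed to `subprocess.run`.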
After the data table to be synchronized has been synchronized, its name, creation date, synchronization status, and the like can be recorded in the disaster recovery data synchronization table. In one possible implementation, the data table names whose synchronization status is synchronization failure, together with their creation dates, can also be obtained from the disaster recovery data synchronization table. The basic information table is then queried for the data table to be processed that corresponds to each failed data table name and its creation date, and for the storage path of that data table. The underlying storage files at that storage path are acquired and stored, and the data table to be processed is synchronized through them. That is, after the storage path of the data table to be processed is determined, the corresponding underlying storage files are stored in the matching storage path, which belongs to a data table with the same structure as the data table to be processed; refreshing that data table then synchronizes the data of the data table to be processed into it.
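The bookkeeping described above can be sketched as a loop that attempts each copy and writes the outcome back to the synchronization table, so that failed entries can be picked up by a later retry. The copy function is injected (in practice it might run the distcp command and the table refresh); the dict-based sync table and the `"SUCCESS"`/`"FAILED"` labels are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, date]


def sync_and_record(pending: List[Tuple[str, date, str]],
                    copy_fn: Callable[[str], None],
                    sync_table: Dict[Key, dict],
                    today: date) -> None:
    """Copy each pending table/partition and record its synchronization status."""
    for table_name, create_date, path in pending:
        try:
            copy_fn(path)          # expected to raise on failure
            status = "SUCCESS"
        except Exception:
            status = "FAILED"      # left for a later retry pass
        sync_table[(table_name, create_date)] = {
            "sync_date": today,
            "sync_status": status,
        }


def fake_copy(path: str) -> None:
    """Stand-in for a real copy; fails for any path containing 'broken'."""
    if "broken" in path:
        raise RuntimeError(f"copy failed: {path}")


table: Dict[Key, dict] = {}
sync_and_record(
    [("trade_detail", date(2023, 5, 2), "/wh/trade_detail/create_date=2023-05-02"),
     ("cust_info", date(2023, 5, 2), "/wh/broken/create_date=2023-05-02")],
    fake_copy, table, today=date(2023, 5, 3))
```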
Copying the underlying storage files of the data table to be synchronized from the production environment side to the target storage path on the disaster recovery environment side achieves data synchronization without exporting and importing data, which improves synchronization efficiency.
Based on the above method embodiment, an embodiment of the present application further provides a data synchronization apparatus. Referring to FIG. 3, FIG. 3 is a schematic diagram of a data synchronization apparatus according to an embodiment of the present application.
The apparatus 300 includes:
a first obtaining unit 301, configured to obtain a basic information table and a storage instance table of a production environment side data table, where the storage instance table is used to store a log record of the data table;
a second obtaining unit 302, configured to compare a disaster recovery data synchronization table with the storage instance table, and obtain a data table to be synchronized, where the disaster recovery data synchronization table is used to represent synchronized data table information;
a third obtaining unit 303, configured to query the basic information table to obtain a storage path of the data table to be synchronized;
and a synchronization unit 304, configured to acquire the bottom storage file corresponding to the storage path, store it under a target storage path that matches the storage path, and synchronize the data table to be synchronized through the bottom storage file.
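To show how units 301 to 304 fit together, here is a hedged Python sketch of one synchronization round. The dictionary row layout, the status strings and the copy_fn callback (which in a Hive/HDFS deployment might wrap a distributed copy) are assumptions for illustration, and the production tables are taken as already fetched by unit 301.

```python
from datetime import date
from typing import Callable, Optional


def run_sync(
    instance_table: list[tuple[str, str]],
    sync_table: list[dict],
    path_by_table: dict[str, str],
    copy_fn: Callable[[str], None],
    today: Optional[str] = None,
) -> list[dict]:
    """Wire the four units together for one synchronization round.

    instance_table: (table_name, creation_date) rows from the storage instance table
    sync_table: rows of the DR data synchronization table (extended in place)
    path_by_table: table_name -> storage path, taken from the basic information table
    copy_fn: copies the bottom storage files under a path and raises on failure
    """
    today = today or date.today().isoformat()
    # Unit 302: any (name, creation date) already recorded, including failed
    # ones, is skipped here; failures are left to the separate retry path.
    recorded = {(r["table_name"], r["creation_date"]) for r in sync_table}
    new_records = []
    for name, created in instance_table:
        if (name, created) in recorded:
            continue
        try:
            path = path_by_table[name]   # unit 303: storage path lookup
            copy_fn(path)                # unit 304: copy bottom storage files
            status = "SUCCESS"
        except Exception:
            # A missing path or a failed copy is recorded as a failure.
            status = "FAILED"
        new_records.append({"table_name": name, "creation_date": created,
                            "sync_date": today, "sync_status": status})
    sync_table.extend(new_records)
    return new_records
```

Recording a FAILED row instead of aborting the round keeps one bad table from blocking the others, and gives the failed-synchronization handling described for the method something to pick up later.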
In one possible implementation, the basic information table includes: at least one of a database name, a data table name, a storage path of a data table, and a date field; the storage instance table includes: at least one of a data table name and a creation date; the disaster recovery data synchronization table comprises: at least one of a data table name, a creation date, a synchronization date, and a synchronization status.
In one possible implementation, the second obtaining unit 302 is specifically configured to: compare the data table names and creation dates in the disaster recovery data synchronization table with those in the storage instance table; take a data table name that exists in the storage instance table but not in the disaster recovery data synchronization table as a target data table name, and a creation date that exists in the storage instance table but not in the disaster recovery data synchronization table as a target creation date; and determine the data table to be synchronized as the data table identified by the target data table name and the target creation date.
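The comparison performed by unit 302 amounts to an anti-join on the pair (data table name, creation date). A table that was synchronized yesterday can still have a new instance today, so comparing names alone would miss it. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the actual metadata database; the table names, column names and sample rows are hypothetical.

```python
import sqlite3


def tables_to_sync(conn):
    """Return (table_name, creation_date) pairs present in the storage instance
    table but absent from the disaster recovery data synchronization table."""
    query = """
        SELECT si.table_name, si.creation_date
        FROM storage_instance AS si
        LEFT JOIN dr_data_sync AS ds
          ON ds.table_name = si.table_name
         AND ds.creation_date = si.creation_date
        WHERE ds.table_name IS NULL      -- no matching sync record
        ORDER BY si.table_name, si.creation_date
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE storage_instance (table_name TEXT, creation_date TEXT);
    CREATE TABLE dr_data_sync (table_name TEXT, creation_date TEXT,
                               sync_date TEXT, sync_status TEXT);
    INSERT INTO storage_instance VALUES
        ('t_trade', '2023-05-29'), ('t_trade', '2023-05-30'), ('t_cust', '2023-05-30');
    INSERT INTO dr_data_sync VALUES ('t_trade', '2023-05-29', '2023-05-29', 'SUCCESS');
""")
print(tables_to_sync(conn))
# [('t_cust', '2023-05-30'), ('t_trade', '2023-05-30')]
```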
In one possible implementation, the second obtaining unit 302 is further configured to obtain the name of a data table whose synchronization status in the disaster recovery data synchronization table is synchronization failure, and the creation date corresponding to that data table name;
the third obtaining unit 303 is further configured to query, in the basic information table, the storage path of the data table to be processed that corresponds to the failed data table name and its creation date;
the synchronization unit 304 is further configured to obtain and store a bottom storage file corresponding to a storage path of the to-be-processed data table, and synchronize the to-be-processed data table through the bottom storage file corresponding to the storage path of the to-be-processed data table.
In one possible implementation, the first obtaining unit 301 is specifically configured to: create, at the disaster recovery environment end, a disaster recovery basic information table with the same structure as the basic information table and a disaster recovery storage instance table with the same structure as the storage instance table; acquire the data of the basic information table and store it in the disaster recovery basic information table; and acquire the data of the storage instance table and store it in the disaster recovery storage instance table;
the second obtaining unit 302 is specifically configured to compare the disaster recovery data synchronization table with the disaster recovery storage instance table to obtain the data table to be synchronized.
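Mirroring the production metadata into same-structure disaster recovery tables can also be sketched with sqlite3. Here ATTACH simulates the production-side database, and the table names and sample row are hypothetical. In SQLite, CREATE TABLE ... AS SELECT ... WHERE 0 copies only the column layout, not constraints or indexes; in Hive, CREATE TABLE ... LIKE ... would be the usual way to copy a table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the production-side metadata database.
conn.execute("ATTACH DATABASE ':memory:' AS prod")
conn.executescript("""
    CREATE TABLE prod.basic_info (database_name TEXT, table_name TEXT,
                                  storage_path TEXT, date_field TEXT);
    CREATE TABLE prod.storage_instance (table_name TEXT, creation_date TEXT);
    INSERT INTO prod.basic_info VALUES
        ('bank_db', 't_trade', '/warehouse/bank_db.db/t_trade', 'dt');
    INSERT INTO prod.storage_instance VALUES ('t_trade', '2023-05-30');
""")


def mirror_table(conn, src, dst):
    """Create dst with the same columns as src, then copy src's rows into it.

    Table names are interpolated directly, so they must be trusted constants.
    """
    # Step 1: an empty DR table with the source's column layout.
    conn.execute(f"CREATE TABLE {dst} AS SELECT * FROM {src} WHERE 0")
    # Step 2: copy the production rows into the DR table.
    conn.execute(f"INSERT INTO {dst} SELECT * FROM {src}")
    conn.commit()


mirror_table(conn, "prod.basic_info", "dr_basic_info")
mirror_table(conn, "prod.storage_instance", "dr_storage_instance")
print(conn.execute("SELECT * FROM dr_storage_instance").fetchall())
# [('t_trade', '2023-05-30')]
```

The later comparison then reads only the local DR copies, so the production metadata store is queried once per round rather than once per table.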
In one possible implementation, the data table is a Hive table, the storage path is a Hadoop Distributed File System (HDFS) path, and the underlying storage file is an Optimized Row Columnar (ORC) file.
In one possible implementation, the synchronization unit 304 is configured to copy the ORC file corresponding to the HDFS path to the target storage path by using a distributed copy command.
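The copy step for the Hive/HDFS case can be sketched as building a command line for hadoop distcp, Hadoop's distributed copy tool. The sketch assumes that the production and disaster recovery clusters use the same warehouse directory layout, so a matching target path differs from the source only in the NameNode host and port; the host names and paths are hypothetical. With -update, DistCp copies the contents of the source directory into the target directory instead of nesting it, so the same partition path lines up on both sides.

```python
from urllib.parse import urlsplit, urlunsplit


def target_path(src_path: str, dr_namenode: str) -> str:
    """Map a production hdfs:// path to the same path on the DR cluster.

    Assumption: both clusters share one directory layout, so matching paths
    differ only in the NameNode authority (host:port).
    """
    parts = urlsplit(src_path)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// path, got {src_path!r}")
    return urlunsplit((parts.scheme, dr_namenode, parts.path, "", ""))


def distcp_command(src_path: str, dr_namenode: str) -> list[str]:
    """Build (but do not run) a hadoop distcp invocation for one path."""
    # -update copies only files that are missing or differ at the target.
    return ["hadoop", "distcp", "-update", src_path,
            target_path(src_path, dr_namenode)]


cmd = distcp_command(
    "hdfs://prod-nn:8020/warehouse/bank_db.db/t_trade/dt=2023-05-30", "dr-nn:8020")
# On a host with the Hadoop client installed: subprocess.run(cmd, check=True)
```

For a partitioned Hive table, the new partition usually still has to be registered on the disaster recovery side (for example with MSCK REPAIR TABLE) before queries see the copied ORC files; this corresponds to refreshing the data table in the method description.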
Based on the above method embodiment and device embodiment, an embodiment of the present application further provides data synchronization equipment, which is described below with reference to the accompanying drawings.
Referring to fig. 4, fig. 4 is a schematic diagram of data synchronization equipment according to an embodiment of the present application.
The equipment 400 includes: a memory 401 and a processor 402;
the memory 401 is configured to store related program code;
the processor 402 is configured to invoke the program code to perform the data synchronization method described in the above method embodiment.
In addition, an embodiment of the present application further provides a computer-readable storage medium for storing a computer program that performs the data synchronization method described in the above method embodiment.
It should be noted that the data synchronization method, device, equipment and storage medium provided by the present application can be used in the financial field or in any other field, for example the field of data processing technology. These are merely examples and do not limit the fields in which the data synchronization method, device, equipment and storage medium provided by the present application can be applied.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and identical or similar parts may be cross-referenced between embodiments. In particular, the system or apparatus embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments. The apparatus embodiments described above are merely illustrative. Units or modules described as separate components may or may not be physically separate, and components shown as units or modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the units or modules may be selected according to actual needs to achieve the objectives of the embodiment. Those of ordinary skill in the art can understand and implement the present application without creative effort.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or a similar expression means any combination of the listed items, including any single item or any combination of plural items. For example, at least one of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural.
It is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data synchronization method, applied to a disaster recovery environment end, the method comprising:
acquiring a basic information table and a storage instance table of a production environment end data table, wherein the storage instance table is used for storing log records of the data table;
comparing the disaster recovery data synchronization table with the storage instance table to obtain a data table to be synchronized, wherein the disaster recovery data synchronization table is used for representing synchronized data table information;
querying the basic information table to acquire a storage path of the data table to be synchronized;
and acquiring a bottom storage file corresponding to the storage path, storing the bottom storage file under a target storage path, and synchronizing the data table to be synchronized through the bottom storage file, wherein the target storage path matches the storage path.
2. The method of claim 1, wherein the basic information table comprises: at least one of a database name, a data table name, a storage path of a data table, and a date field;
the storage instance table includes: at least one of a data table name and a creation date;
the disaster recovery data synchronization table comprises: at least one of a data table name, a creation date, a synchronization date, and a synchronization status.
3. The method of claim 2, wherein the comparing the disaster recovery data synchronization table with the storage instance table to obtain the data table to be synchronized comprises:
comparing the data table names and creation dates in the disaster recovery data synchronization table with the data table names and creation dates in the storage instance table, taking a data table name that exists in the storage instance table but not in the disaster recovery data synchronization table as a target data table name, and taking a creation date that exists in the storage instance table but not in the disaster recovery data synchronization table as a target creation date;
and determining the data table to be synchronized as the data table identified by the target data table name and the target creation date.
4. The method according to claim 2, wherein the method further comprises:
acquiring the name of a data table whose synchronization status in the disaster recovery data synchronization table is synchronization failure, and the creation date corresponding to that data table name;
querying, in the basic information table, a storage path of a data table to be processed that corresponds to the failed data table name and its creation date;
and acquiring and storing the bottom storage file corresponding to the storage path of the data table to be processed, and synchronizing the data table to be processed through the bottom storage file corresponding to the storage path of the data table to be processed.
5. The method of claim 1, wherein the obtaining the basic information table and the storage instance table of the production environment side data table comprises:
creating a disaster recovery basic information table and a disaster recovery storage instance table of the disaster recovery environment end, wherein the disaster recovery basic information table has the same structure as the basic information table, and the disaster recovery storage instance table has the same structure as the storage instance table;
acquiring data of the basic information table and storing the data into the disaster recovery basic information table, and acquiring data of the storage instance table and storing the data into the disaster recovery storage instance table;
and the comparing the disaster recovery data synchronization table with the storage instance table to obtain the data table to be synchronized comprises:
and comparing the disaster recovery data synchronization table with the disaster recovery storage instance table to acquire the data table to be synchronized.
6. The method of any one of claims 1 to 5, wherein the data table is a Hive table, the storage path is a Hadoop Distributed File System (HDFS) path, and the underlying storage file is an Optimized Row Columnar (ORC) file.
7. The method of claim 6, wherein the obtaining and storing the underlying storage file corresponding to the storage path comprises:
copying the ORC file corresponding to the HDFS path to the target storage path by using a distributed copy command.
8. A data synchronization device, the device comprising:
the first acquisition unit is used for acquiring a basic information table and a storage instance table of the production environment side data table, wherein the storage instance table is used for storing log records of the data table;
the second acquisition unit is used for comparing the disaster recovery data synchronization table with the storage instance table to acquire a data table to be synchronized, wherein the disaster recovery data synchronization table is used for representing synchronized data table information;
a third obtaining unit, configured to query the basic information table to obtain a storage path of the data table to be synchronized;
and a synchronization unit, configured to acquire the bottom storage file corresponding to the storage path, store it under a target storage path that matches the storage path, and synchronize the data table to be synchronized through the bottom storage file.
9. Data synchronization equipment, the equipment comprising: a memory and a processor;
the memory is used for storing related program codes;
the processor is configured to invoke the program code to perform the data synchronization method of any of claims 1 to 7.
10. A computer readable storage medium for storing a computer program for executing the data synchronization method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number: CN202310627244.1A; Publication: CN116756236A (en); Priority Date: 2023-05-30; Filing Date: 2023-05-30; Title: Data synchronization method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202310627244.1A; Publication: CN116756236A (en); Priority Date: 2023-05-30; Filing Date: 2023-05-30; Title: Data synchronization method, device, equipment and storage medium

Publications (1)

Publication Number: CN116756236A (en); Publication Date: 2023-09-15

Family ID: 87958048

Family Applications (1)

Application Number: CN202310627244.1A; Status: Pending; Publication: CN116756236A (en); Priority Date: 2023-05-30; Filing Date: 2023-05-30; Title: Data synchronization method, device, equipment and storage medium

Country Status (1)

CN (1): CN116756236A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination