CN113449043A - Data synchronization method and device, computer equipment and storage medium - Google Patents

Data synchronization method and device, computer equipment and storage medium

Info

Publication number
CN113449043A
Authority
CN
China
Prior art keywords
data
operation log
target
target database
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110822595.9A
Other languages
Chinese (zh)
Inventor
孙建涛
苏杭
郭斌
王瑞
付英伟
王一朝
黄煜东
于军
尹瑛
白云
杨劲
刘莹君
李成钢
余清华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
People's Liberation Army 61932 Troops
Original Assignee
People's Liberation Army 61932 Troops
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by People's Liberation Army 61932 Troops
Priority to CN202110822595.9A
Publication of CN113449043A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity

Abstract

The application relates to a data synchronization method and apparatus, a computer device, and a storage medium. The method comprises: acquiring an operation log corresponding to an upper layer data structure of a target database, wherein the operation log records data operation information synchronized with the operation log of a source database; parsing the operation log and determining operation data in the operation log; and updating data in a bottom layer data structure of the target database according to the operation data, so that the updated data in the target database is consistent with the data in the source database. The method synchronizes the target database with the source database and improves the accuracy of data processing in the target database.

Description

Data synchronization method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of data warehouse technologies, and in particular, to a data synchronization method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of internet technology, data plays a crucial role in daily work and life, and more and more manufacturers, enterprises, and other organizations choose to build data warehouses to store and manage it. A data warehouse (also called a target database) extracts data from source databases, keeps itself synchronized with them, and then supports data processing and applications based on the data it holds.
The traditional way for a data warehouse to synchronize data is incremental extraction: newly added data in a source database is extracted so that the source database and the target database stay synchronized. For data deleted in the source database, the target database can identify what to delete only by relying on a deletion record provided by the source database.
However, source databases and the target database are often not managed in a unified manner, and many source databases cannot provide deletion records to the target database. As a result, data already deleted at the source remains in the target database, the two sides fall out of sync, and data processing results become inaccurate.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a data synchronization method, apparatus, computer device and storage medium for solving the above technical problems.
A method of data synchronization, the method comprising:
acquiring an operation log corresponding to an upper layer data structure of a target database, wherein the operation log records data operation information synchronized with the operation log of a source database;
parsing the operation log and determining operation data in the operation log;
and updating data in a bottom layer data structure of the target database according to the operation data, wherein the updated data in the target database is consistent with the data in the source database.
In one embodiment, before obtaining the operation log corresponding to the upper layer data structure of the target database, the method further includes:
monitoring data updating operation in a source database, and executing the same data updating operation in an upper layer data structure of a target database through a data synchronization tool;
and writing the data updating operation into an operation log of an upper layer data structure of the target database.
In one embodiment, the obtaining an operation log corresponding to an upper layer data structure of a target database includes:
monitoring the file writing process of each operation log in the upper layer data structure of the target database;
and when the size of the operation log is kept unchanged in a preset time range, determining that the writing of the operation log is finished, and acquiring the operation log.
In one embodiment, the parsing the operation log and determining operation data in the operation log includes:
analyzing the operation log, and filtering the operation log according to a target operation identifier to obtain a target operation log;
and performing format conversion on the target operation log to obtain an operation log file in a target format, and determining operation data in the target format in the operation log file in the target format.
In one embodiment, the updating the data in the target database bottom layer data structure according to the operation data includes:
in the partition table, determining a target partition divided by a time identifier corresponding to the timestamp information, and storing the operation data to the target partition; the target partition also stores all data of the target database bottom layer data structure at the moment of the time identification record;
determining update data in all data of the target partition according to the operation data;
and updating the data in the underlying data structure of the target database based on the determined updating data.
In one embodiment, the method further comprises:
and storing the operation data into an update record table in a bottom layer data structure of the target database, wherein the update record table is used for recording data update information of the target database.
A data synchronization apparatus, the apparatus comprising:
the acquisition module is used for acquiring an operation log corresponding to an upper layer data structure of a target database, wherein the operation log records data operation information synchronized with the operation log of a source database;
the parsing module is used for parsing the operation log and determining operation data in the operation log;
and the data synchronization module is used for updating data in a bottom layer data structure of the target database according to the operation data, wherein the updated data in the target database is consistent with the data in the source database.
In one embodiment, the apparatus further comprises:
the monitoring module is used for monitoring data updating operation in the source database and executing the same data updating operation in an upper layer data structure of the target database through a data synchronization tool;
and the updating module is used for writing the data updating operation into an operation log of an upper layer data structure of the target database.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an operation log corresponding to an upper layer data structure of a target database, wherein the operation log records data operation information synchronized with the operation log of a source database;
parsing the operation log and determining operation data in the operation log;
and updating data in a bottom layer data structure of the target database according to the operation data, wherein the updated data in the target database is consistent with the data in the source database.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring an operation log corresponding to an upper layer data structure of a target database, wherein the operation log records data operation information synchronized with the operation log of a source database;
parsing the operation log and determining operation data in the operation log;
and updating data in a bottom layer data structure of the target database according to the operation data, wherein the updated data in the target database is consistent with the data in the source database.
According to the data synchronization method, apparatus, computer device, and storage medium described above, an operation log corresponding to an upper layer data structure of a target database is acquired, wherein the operation log records data operation information synchronized with the operations in a source database; the operation log is parsed, and operation data corresponding to a target operation in the operation log is determined; and data in a bottom layer data structure of the target database is updated according to the operation data, so that the data in the target database is consistent with the source database. With this method, the operation log keeps the data of the upper layer data structure of the target database synchronized in real time with the data of the source database. By monitoring the operation log of the upper layer data structure and updating the data in the bottom layer data structure based on the operation data in that log, the source database and the target database are synchronized, which ensures the accuracy of data processing results in applications that use the target database.
Drawings
FIG. 1 is a flow diagram illustrating a method for data synchronization in one embodiment;
FIG. 2 is a flow diagram of operation log synchronization between a target database and a source database, under an embodiment;
FIG. 3 is a flowchart illustrating the steps for obtaining an oplog of an upper-level data structure of a target database in one embodiment;
FIG. 4 is a flow diagram of the steps for determining operational data in an oplog in one embodiment;
FIG. 5 is a flowchart of the steps for performing a target database data update based on operational data in one embodiment;
FIG. 6 is a flow diagram that illustrates an example of a method for data synchronization in one embodiment;
FIG. 7 is a block diagram showing the structure of a data synchronization apparatus according to an embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Before the technical solutions of the embodiments of the present application are described in detail, the technical background on which they are based is described. In the field of data warehouse technology, a data warehouse can periodically extract incremental data from a plurality of source databases according to a preset sampling period, process the incremental data, and store the processed data in a uniform format in the bottom layer of the data warehouse. When data processing and applications are performed based on the data in the data warehouse, the standardized data in the uniform format is called directly. However, the data warehouse can extract only the incremental data of a source database. For data subject to other kinds of operations in the source database, such as deleted data, the corresponding deletion cannot be carried out in the data warehouse based on the extracted incremental data. Noise data such as deleted records therefore remains in the data warehouse, and the results of data processing and applications based on that data become inaccurate. Against this background, the applicant found, through long-term research and development and the collection, demonstration, and verification of experimental data, that the source databases either do not record deleted data or record it in inconsistent ways. How to determine the data deleted from the source databases, so as to synchronize the data warehouse, has therefore become a difficult problem in urgent need of a solution. It should also be noted that both the finding that deleted data in the source databases needs to be determined and the technical solutions described in the following embodiments required a great deal of creative work by the applicant.
In an embodiment, as shown in FIG. 1, a data synchronization method is provided. This embodiment is illustrated by applying the method to a server; it is to be understood that the method may also be applied to a terminal, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 101, obtaining an operation log corresponding to an upper layer data structure of a target database.
The operation log in the upper layer data structure of the target database records data operation information synchronized with the operation log of the source database.
In implementation, an HBase data warehouse (also referred to as the target database) is deployed in a distributed system; HBase is a distributed, column-oriented open source database. The data warehouse includes a plurality of data layers (also referred to as data structures) for processing and storing data. For example, a typical data warehouse includes three layers: a source layer, a processing layer, and a market layer. To make the relationship among the layers easier to understand without limiting the number of layers in the data warehouse, this embodiment describes the logical relationship among the layers in terms of an upper layer data structure, a middle layer data structure, and a bottom layer data structure, where each data structure is not limited to a single layer of the data warehouse.
Any server node on which the distributed target database is deployed uses a data synchronization tool to synchronize the operation log of the source database with the operation log in the upper layer data structure of the database on that node, so that the data in the source database is consistent with the data in the upper layer data structure of the target database. It then only remains to store the data of the upper layer data structure in the bottom layer of the target database. The operation log of the upper layer data structure in the target database is therefore obtained, and information is passed from the upper layer data structure to the bottom layer data structure by parsing that log.
Optionally, the plurality of source databases may include MySQL databases, Oracle databases, and the like; the number and type of source databases are not limited in the embodiments of the present application.
Step 102, analyzing the operation log, and determining operation data in the operation log.
In implementation, any server node (also referred to as a data node) in the distributed system parses the operation log (WAL, Write-Ahead Log) in the upper layer data structure of the target database and determines the corresponding operation data in the operation log. The WAL is the log in which a RegionServer (data node) of the HBase target database records the content of operations such as data insertion and deletion.
Step 103, updating data in the bottom layer data structure of the target database according to the operation data, wherein the updated data in the target database is consistent with the data in the source database.
In implementation, the distributed system establishes, according to the determined operation data, an association between the operation data and the original data stored in the bottom layer data structure of the target database, and updates the original data based on that association, so that the updated data in the target database is consistent with the data in the source database.
In the data synchronization method above, an operation log corresponding to the upper layer data structure of the target database is obtained, wherein the operation log records data operation information synchronized with the operation log of the source database; the operation log is parsed, and the operation data in it is determined; and the data in the bottom layer data structure of the target database is updated according to the operation data, so that the updated data in the target database is consistent with the data in the source database. With this method, the operation log in the upper layer data structure of the target database is kept consistent with the operation log of the source database, and the data in the bottom layer data structure of the target database is updated based on the operation data in that log. The application data in the bottom layer data structure of the target database is thereby synchronized with the source database, which improves the accuracy of data processing and applications in the target database.
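Steps 101 to 103 can be illustrated with a minimal Python sketch. It is not the implementation described in the application: the comma-separated log line format, the `Operation` record, and the in-memory dictionary standing in for the bottom layer data structure are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operation:
    """One parsed operation (hypothetical record layout)."""
    kind: str                    # "put" or "delete" (assumed labels)
    row_key: str
    value: Optional[str] = None


def parse_operation_log(log_lines):
    """Step 102: turn raw log lines into Operation records.

    Assumes each line looks like "kind,row_key[,value]".
    """
    operations = []
    for line in log_lines:
        kind, row_key, *rest = line.split(",")
        operations.append(Operation(kind, row_key, rest[0] if rest else None))
    return operations


def apply_operations(bottom_layer, operations):
    """Step 103: update the bottom layer so it matches the source."""
    for op in operations:
        if op.kind == "delete":
            bottom_layer.pop(op.row_key, None)
        elif op.kind == "put":
            bottom_layer[op.row_key] = op.value
    return bottom_layer


def synchronize(log_lines, bottom_layer):
    """Steps 101-103: the acquired log lines are parsed, then applied."""
    return apply_operations(bottom_layer, parse_operation_log(log_lines))
```

In this sketch a deletion recorded in the log removes the matching row from the bottom layer, which is the case that plain incremental extraction cannot handle.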
In one embodiment, as shown in FIG. 2, every data operation in the upper layer data structure of the target database is triggered in order to stay consistent with the plurality of source databases. This synchronization takes place before the operation log in the upper layer data structure is obtained and before the data in the upper layer data structure is made consistent with the data in the bottom layer data structure. Therefore, before step 101, the method further includes the following steps:
Step 201, monitoring a data update operation in the source database, and executing the same data update operation in the upper layer data structure of the target database through a data synchronization tool.
In implementation, the data stored in the upper layer data structure of the target database is kept synchronized with the plurality of source databases by a real-time data synchronization tool (referred to simply as a data synchronization tool). Specifically, when a data update operation (e.g., a data deletion) occurs in any source database, the source operation log of that database records the operation and its corresponding data operation information. The data synchronization tool reads the source operation log, identifies the operation data information of the data update operation, and triggers the upper layer data structure of the target database (i.e., the data warehouse at the target end) to perform the same data update operation. For example, after identifying a deletion in the source database, the tool triggers the target database to perform the same deletion. The target database then updates the operation log in its upper layer data structure according to the deletion, so that the operation log in the upper layer data structure stays synchronized in real time with the operation log of the source database.
The data updating operation may be a replacement data updating operation or a deletion data updating operation, and the type of the data updating operation is not limited in the embodiment of the present application.
For example, Canal (the name means a waterway or pipeline) is mainly used to parse the incremental logs of a source database, provide incremental data subscription and consumption, and perform real-time data synchronization. OGG (Oracle GoldenGate) is log-based structured data replication software that can capture, transform, and deliver large volumes of transaction data in real time, and thereby synchronize the source database with the upper layer data structure of the target database. Different source databases are matched with different data synchronization tools. For example, if the source database is a MySQL database, the Canal data synchronization tool can be used to monitor data update operations and synchronize data; if the source database is an Oracle database, the OGG data synchronization tool can be used.
Step 202, writing the data updating operation into an operation log of an upper layer data structure of the target database.
In practice, a logging mechanism in the target database upper layer data structure writes the operation data information resulting from any data update operations that occur in the target database upper layer data structure into an operation log file of the upper layer data structure.
In this embodiment, the data updating operation occurring in the source database is synchronized to the target database through the data synchronization tool, and a data updating record mechanism of the source database does not need to be managed uniformly, so that the data in the source database is consistent with the data in the target database.
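The flow of steps 201 and 202 can be sketched as follows. This is only an illustration under stated assumptions: the `UpperLayer` class, the dictionary-shaped log entries, and the `replicate` function are hypothetical stand-ins and do not reflect the actual interfaces of Canal, OGG, or HBase.

```python
import json


class UpperLayer:
    """Hypothetical stand-in for the upper layer data structure."""

    def __init__(self):
        self.rows = {}
        self.operation_log = []  # stands in for the operation log file

    def execute(self, op):
        # Step 201: perform the same update that occurred at the source.
        if op["type"] == "delete":
            self.rows.pop(op["row"], None)
        else:
            self.rows[op["row"]] = op["value"]
        # Step 202: record the update in the upper layer operation log.
        self.operation_log.append(json.dumps(op, sort_keys=True))


def replicate(source_log_entries, upper_layer):
    """Replay every update read from the source operation log."""
    for entry in source_log_entries:
        upper_layer.execute(entry)
```

Because every replayed update is also appended to the upper layer's own log, a later stage can learn about deletions from that log alone, without asking the source database for a deletion record.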
In an embodiment, as shown in FIG. 3, monitoring with the data synchronization tool as described in the foregoing embodiment keeps the data in the source database consistent with the data in the upper layer data structure of the target database. It then suffices to synchronize the data in the upper layer data structure of the target database with the data in the bottom layer data structure, so that the application data in the target database is consistent with the data in the source database. The specific processing procedure of step 101 includes:
Step 301, monitoring the file writing process of each operation log in the upper layer data structure of the target database.
In implementation, a distributed system includes a plurality of server nodes (a master node (master) and data nodes (RegionServer)), and the target database (i.e., the data warehouse) is deployed jointly across these nodes; any server node can serve as storage space in the data warehouse for storing and processing data. When a user needs to synchronize the data of the target database, the operation log in the upper layer data structure of the target database is first downloaded to the local server for local parsing and processing. Because the operation log is written in real time and, under its file paging mechanism, is written one file at a time, each operation log file is processed once it has been completely written. The server node therefore runs a shell script to monitor the writing of operation logs in the upper layer data structure of the target database.
Step 302, when the size of the operation log is kept unchanged within a preset time range, determining that the writing of the operation log is completed, and acquiring the operation log.
In implementation, when the distributed system monitors the writing of an operation log, it uses changes in the file size as the criterion for whether writing has finished. When the size of the operation log stays unchanged for a preset time range, the system determines that writing is complete and obtains the finished operation log for subsequent parsing; otherwise, it continues to wait for the operation log file to be updated.
Optionally, the operation log in the upper layer data structure of the target database may be obtained from the WALs operation log directory, which stores operation logs in chronological order.
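The size-stability check of steps 301 and 302 can be sketched in Python. The embodiment itself uses a shell script; the function name `wait_until_stable` and its parameters (`quiet_seconds` for the preset time range, `poll_interval`, `timeout`) are assumptions introduced here for illustration.

```python
import os
import time


def wait_until_stable(path, quiet_seconds=5.0, poll_interval=1.0, timeout=None):
    """Return True once the file size has stayed unchanged for quiet_seconds.

    Returns False if an optional timeout (in seconds) expires first.
    """
    last_size = os.path.getsize(path)
    stable_since = time.monotonic()
    deadline = None if timeout is None else stable_since + timeout
    while True:
        now = time.monotonic()
        if now - stable_since >= quiet_seconds:
            return True          # size unchanged for the whole window
        if deadline is not None and now >= deadline:
            return False         # gave up waiting
        time.sleep(poll_interval)
        size = os.path.getsize(path)
        if size != last_size:
            # The log is still being written: restart the quiet window.
            last_size = size
            stable_since = time.monotonic()
```

A caller would invoke this once per operation log file and hand the file to the parsing stage only after the function returns True.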
In one embodiment, as shown in fig. 4, the specific processing procedure of step 102 includes:
Step 401, parsing the operation log, and filtering the operation log according to the target operation identifier to obtain a target operation log.
In implementation, the distributed system parses the operation logs that have been downloaded locally and, according to a target screening condition, identifies those that carry the target operation identifier, where the target operation identifier can represent the complete data operation information of the target operation. The operation logs are then filtered by the target operation identifier to obtain the target operation log that is ultimately used for data synchronization. For example, to find deleted data, the operation logs corresponding to deletion operations are identified among all locally downloaded operation logs according to the data operation information associated with deletion, and are selected as the target operation log.
Optionally, for an operation log in the operation log directory that has already been downloaded locally, the distributed system may delete the same log from the directory cache to reduce cache pressure.
Step 402, performing format conversion on the target operation log to obtain an operation log file in the target format, and determining operation data in the target format in the operation log file in the target format.
In implementation, a preconfigured Python script service runs in the distributed system to convert the format of the target operation log. The filtered WAL (write-ahead log) is first converted into a standard JSON (JavaScript Object Notation) file, and the JSON file is then converted into a CSV (Comma-Separated Values) file that the target database can use. The operation data in CSV format is identified and extracted from the CSV file, with a vlen value of 0 as the key criterion.
In this embodiment, the target operation logs are screened from all the operation logs through the target operation identifiers, and the operation data in the target format is determined by analyzing and converting the format of the target operation logs, so that the operation data in the operation logs is determined and extracted, and the operation data is used for updating the original data in the underlying data structure of the target database according to the operation data.
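The filtering and format conversion of steps 401 and 402 can be sketched with Python's standard `json` and `csv` modules. The flat JSON record layout and the field names other than `vlen` are assumptions; the real structure of a converted WAL file may differ. For brevity the sketch filters on `vlen` while the data is still JSON and then writes CSV.

```python
import csv
import io
import json

# Hypothetical column set; only "vlen" is named in the description.
FIELDS = ["row", "family", "qualifier", "timestamp", "vlen"]


def extract_operation_data(json_text, target_vlen=0):
    """Keep only the records whose vlen equals the target value."""
    records = json.loads(json_text)
    return [r for r in records if r.get("vlen") == target_vlen]


def to_csv(records):
    """Serialize the selected records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Fields not listed in `FIELDS` are dropped by `extrasaction="ignore"`, so the CSV keeps a fixed column set regardless of what else the JSON records carry.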
In an embodiment, as shown in fig. 5, the operation data carries timestamp information, a partition table of an underlying data structure of the target database includes at least one partition partitioned based on a time identifier, and the specific processing procedure of step 103 includes:
Step 501, in the partition table, determining the target partition identified by the time identifier that corresponds to the timestamp information, and storing the operation data in the target partition (Hive). The target partition also stores the full data of the bottom layer data structure of the target database as of the time recorded by the time identifier.
In implementation, each layer of data structure of the target database is configured with a partition table for storing data. The partition table includes at least one partition divided by time identifier (a partition may also be referred to as a partition sub-table of the partition table). For example, with a time identifier in units of days (24 hours), the table is divided into one sub-table per day. During actual data processing in the middle layer or bottom layer data structure of the target database, a new sub-table can be created each day to store the newly added source database data extracted on that day.
Furthermore, because the operation data determined from the operation log carries timestamp information, the target partition (also referred to as the target partition sub-table) is determined in the partition table of the bottom layer data structure of the target database according to the association between the timestamp information and the corresponding time identifier. Specifically, if the operation log of the previous day (for example, from 0:00 on July 8) is acquired and parsed, the operation data determined from it carries the timestamp information of 0:00 on July 8, and this operation data is stored in the target partition of the current date (July 9) according to the time association between the timestamp information and the current date. In addition, the target partition stores the current full data corresponding to the current date, which is obtained by merging the data newly added on the current date with the full data of the previous day in the target database. At this point, the target partition identified by the time identifier of the current date holds both the full data of the current date and the operation data.
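One way to read the time association in the example above is that operation data from a given day lands in the partition of the following day. The sketch below encodes that reading as an explicit assumption; the millisecond timestamp unit, the UTC time zone, and the `YYYYMMDD` partition naming are also assumptions, not details given in the application.

```python
from datetime import datetime, timedelta, timezone


def target_partition(timestamp_ms):
    """Map an operation timestamp to a daily partition name.

    Assumption: operation data logged on day D is stored in the
    partition of day D + 1, as in the July 8 / July 9 example.
    """
    log_day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).date()
    return (log_day + timedelta(days=1)).strftime("%Y%m%d")
```

Under these assumptions, a timestamp at 0:00 on July 8 maps to the partition named for July 9.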
Step 502, determining, according to the operation data, the update data within the full data of the target partition.
In implementation, an association between the current full data in the target partition and the operation data is established according to the operation data, and the data to be updated within the current full data of the target partition is then determined based on that association.
Step 503, updating the data in the underlying data structure of the target database based on the determined update data.
In implementation, after the update data within the current full data of the underlying data structure of the target database is determined, the data in that structure is updated by the corresponding update operation. For example, if the determined update data is data to be deleted, it is deleted from the current full data stored in the underlying data structure of the target database, which yields the updated data of the underlying data structure.
In this embodiment, in the partition table of the underlying data structure of the target database, the update data within the full data of the target partition is identified based on the operation data, and the corresponding update is applied to obtain the updated full data of the underlying data structure, which ensures the timeliness of data synchronization of the target database.
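Steps 502 and 503 can be illustrated with a small in-memory sketch. It assumes that every record has a primary key named `id` and that each operation record has an `op` field and a `row` field; these names, and the handling of operations other than deletion, are hypothetical, since the description only names deletion explicitly.

```python
def apply_operations(full_data, operations, key="id"):
    """Return a new snapshot of full_data with the operations applied.

    The association between operation data and full data is made on `key`.
    """
    snapshot = {row[key]: dict(row) for row in full_data}
    for op in operations:
        k = op["row"][key]
        if op["op"] == "delete":
            # Remove the matching row, if it is present in the full data.
            snapshot.pop(k, None)
        else:
            # Assumption: any other operation replaces or adds the row.
            snapshot[k] = dict(op["row"])
    return list(snapshot.values())


full = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
ops = [
    {"op": "delete", "row": {"id": 2}},
    {"op": "update", "row": {"id": 1, "name": "a2"}},
]
updated = apply_operations(full, ops)
```

The function returns a new list rather than mutating its input, so the previous day's full data remains available for comparison.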
In one embodiment, the method further comprises: storing the operation data in an update record table in the underlying data structure of the target database, wherein the update record table is used for recording data update information of the target database.
In implementation, consider the operation data of a data update operation on a source database, for example the deleted data corresponding to a data deletion operation. Because the multiple source databases are each managed independently by their data producers, not every source database backs up the operation data of its data update operations. In the embodiment of the present application, an update record table is therefore provided in the target database, and the operation data corresponding to a data update operation (for example, a data deletion operation) is stored in this update record table in the underlying data structure of the target database. The target database can then replicate data according to the information recorded in the update record table.
Optionally, the target database may separately restore the deleted data recorded in the update record table, so as to implement a backup function of the deleted data.
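The role of the update record table as a backup of deleted data can be sketched with SQLite from the Python standard library standing in for the target database. The table names `full_data` and `update_record`, their columns, and both helper functions are hypothetical. As a simplification, the sketch copies the row from the full data instead of taking it from the parsed operation log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE full_data (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE update_record (id INTEGER, payload TEXT, op TEXT, op_time TEXT);
    INSERT INTO full_data VALUES (1, 'a'), (2, 'b');
""")


def delete_with_record(conn, row_id, op_time):
    """Copy the row into update_record, then delete it from full_data."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO update_record "
            "SELECT id, payload, 'delete', ? FROM full_data WHERE id = ?",
            (op_time, row_id),
        )
        conn.execute("DELETE FROM full_data WHERE id = ?", (row_id,))


def restore_deleted(conn, row_id):
    """Re-insert a previously deleted row from the update record table."""
    with conn:
        conn.execute(
            "INSERT INTO full_data "
            "SELECT id, payload FROM update_record "
            "WHERE id = ? AND op = 'delete'",
            (row_id,),
        )


delete_with_record(conn, 2, "2021-07-08T00:00:00")
```

Calling `restore_deleted(conn, 2)` afterwards puts the row back into `full_data`, which corresponds to the optional restore described above.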
In this embodiment, by establishing an association between the operation data and the update data in the underlying data structure of the target database, the full data in that structure is updated to obtain the updated full data, so that the data in both the underlying data structure and the upper-layer data structure of the target database is consistent with the data in the source database.
In one embodiment, as shown in fig. 6, an example of a data synchronization method is provided, and the specific processing steps of the example are as follows:
Step 601, synchronizing data between the source database and the upper-layer data structure of the target database through the data synchronization tool.
Step 602, monitoring an operation log in the upper-layer data structure of the target database, judging whether the writing of the operation log has finished according to whether the size of the operation log file changes within a preset time, and, if the writing has finished, downloading the operation log to local storage.
Step 603, analyzing the locally downloaded operation logs, filtering and screening them, converting the screened operation logs into standard JSON files, further converting the JSON files into CSV files, extracting the operation data from the CSV files, and storing the operation data in a database partition table (Hive).
Step 604, identifying, according to the operation data stored in the partition table, the data to be updated in the original data contained in the lower-layer data structure of the target database.
Step 605, updating the data in the lower-layer data structure of the target database to obtain the data of the updated and synchronized lower-layer data structure of the target database.
Step 606, retaining the operation data, and storing the operation data into the update record data table.
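The filtering and format conversion of step 603 can be sketched as follows. The raw log format is not specified in the description, so the sketch assumes one JSON object per line. The field names (`op`, `table`, `id`, `ts`), the sample entries, and the choice of `delete` as the target operation identifier are all hypothetical.

```python
import csv
import io
import json

# Hypothetical raw operation log: one JSON object per line.
RAW_LOG = """\
{"op": "delete", "table": "orders", "id": 7, "ts": "2021-07-08T00:00:00"}
{"op": "select", "table": "orders", "id": 8, "ts": "2021-07-08T00:01:00"}
{"op": "delete", "table": "orders", "id": 9, "ts": "2021-07-08T00:02:00"}
"""

FIELDS = ["op", "table", "id", "ts"]


def log_to_csv(raw_log, target_op="delete"):
    """Keep entries whose `op` equals target_op and render them as CSV text."""
    entries = [json.loads(line) for line in raw_log.splitlines() if line.strip()]
    selected = [e for e in entries if e.get("op") == target_op]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(selected)
    return out.getvalue()


def read_operation_data(csv_text):
    """Parse the CSV text back into a list of operation-data dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


operation_data = read_operation_data(log_to_csv(RAW_LOG))
```

Note that `csv.DictReader` returns every value as a string, so a real pipeline would need to cast fields such as `id` before loading them into a typed partition table.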
It should be understood that although the steps in the flowcharts of fig. 1 to 6 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in fig. 1 to 6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a data synchronization apparatus 700, including: an obtaining module 710, a parsing module 720 and a data synchronization module 730, wherein:
the obtaining module 710 is configured to obtain an operation log corresponding to the upper-layer data structure of the target database, where the operation log records data operation information synchronized with the source database;
the parsing module 720 is configured to analyze the operation log and determine the operation data in the operation log;
and the data synchronization module 730 is configured to update the data in the underlying data structure of the target database according to the operation data, where the updated data in the target database is consistent with the data in the source database.
In one embodiment, the apparatus 700 further comprises:
a synchronization module, configured to monitor data update operations in the source database and execute the same data update operations in the upper-layer data structure of the target database through a data synchronization tool;
and a writing module, configured to write the data update operations into an operation log of the upper-layer data structure of the target database.
In an embodiment, the obtaining module 710 is specifically configured to monitor a file writing process of each operation log in an upper layer data structure of a target database;
and when the size of the operation log is kept unchanged in a preset time range, determining that the writing of the operation log is finished, and acquiring the operation log.
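The completion check based on an unchanged file size can be sketched as a polling loop. The sketch assumes the operation log is visible as a local file path, whereas the embodiment monitors it in the upper-layer data structure of the target database. The function name `wait_until_stable` and its parameters are hypothetical.

```python
import os
import time


def wait_until_stable(path, quiet_seconds=5.0, poll_interval=1.0, timeout=60.0):
    """Return True once the file size has not changed for quiet_seconds.

    Returns False if `timeout` elapses before the size stays unchanged
    for that long.
    """
    deadline = time.monotonic() + timeout
    last_size = os.path.getsize(path)
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        if time.monotonic() - stable_since >= quiet_seconds:
            return True
        time.sleep(poll_interval)
        size = os.path.getsize(path)
        if size != last_size:
            # The writer is still appending: restart the quiet period.
            last_size = size
            stable_since = time.monotonic()
    return False
```

A caller would download or read the log only after the function returns `True`. The quiet period plays the role of the preset time range and should be longer than the longest expected pause of the log writer.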
In an embodiment, the parsing module 720 is specifically configured to parse the operation log, and filter the operation log according to the target operation identifier to obtain a target operation log;
and performing format conversion on the target operation log to obtain an operation log file in the target format, and determining operation data in the target format in the operation log file in the target format.
In one embodiment, the data synchronization module 730 is specifically configured to determine, in the partition table, the target partition divided by the time identifier corresponding to the timestamp information, and store the operation data in the target partition, where the target partition also stores the full data of the underlying data structure of the target database as of the time recorded by the time identifier;
determine, according to the operation data, the update data within the full data of the target partition;
and update the data in the underlying data structure of the target database based on the determined update data.
In one embodiment, the apparatus 700 further comprises:
an update record module, configured to store the operation data in an update record table in the underlying data structure of the target database, where the update record table is used for recording data update information of the target database.
For specific limitations of the data synchronization apparatus 700, reference may be made to the above limitations of the data synchronization method, which are not repeated here. The modules in the data synchronization apparatus 700 described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The target database of the computer device is used for storing the service data of the source databases. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a data synchronization method.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the disclosed solution applies. A particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of the present specification.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for synchronizing data, the method comprising:
acquiring an operation log corresponding to an upper layer data structure of a target database, wherein the operation log records data operation information synchronized with a source database;
analyzing the operation log, and determining operation data in the operation log;
and updating data in a target database bottom layer data structure according to the operation data, wherein the updated data in the target database is consistent with the data in the source database.
2. The method of claim 1, wherein before obtaining the operation log corresponding to the upper layer data structure of the target database, the method further comprises:
monitoring data updating operation in a source database, and executing the same data updating operation in an upper layer data structure of a target database through a data synchronization tool;
and writing the data updating operation into an operation log of an upper layer data structure of the target database.
3. The method of claim 1, wherein obtaining the operation log corresponding to the upper layer data structure of the target database comprises:
monitoring the file writing process of each operation log in the upper layer data structure of the target database;
and when the size of the operation log is kept unchanged in a preset time range, determining that the writing of the operation log is finished, and acquiring the operation log.
4. The method of claim 1, wherein analyzing the operation log and determining operation data in the operation log comprises:
analyzing the operation log, and filtering the operation log according to a target operation identifier to obtain a target operation log;
and performing format conversion on the target operation log to obtain an operation log file in a target format, and determining operation data in the target format in the operation log file in the target format.
5. The method according to claim 1 or 4, wherein the operation data carries timestamp information, the partition table of the target database underlying data structure includes at least one partition partitioned based on the time identifier, and the updating the data in the target database underlying data structure according to the operation data includes:
in the partition table, determining a target partition divided by a time identifier corresponding to the timestamp information, and storing the operation data to the target partition; the target partition also stores all data of the target database bottom layer data structure at the moment of the time identification record;
determining update data in all data of the target partition according to the operation data;
and updating the data in the underlying data structure of the target database based on the determined updating data.
6. The method of claim 1, further comprising:
and storing the operation data into an update record table in a bottom layer data structure of the target database, wherein the update record table is used for recording data update information of the target database.
7. A data synchronization apparatus, the apparatus comprising:
the acquisition module is used for acquiring an operation log corresponding to an upper layer data structure of a target database, wherein the operation log records data operation information synchronized with a source database;
the analysis module is used for analyzing the operation log and determining operation data in the operation log;
and the data synchronization module is used for updating data in a bottom data structure of a target database according to the operation data, wherein the updated data in the target database is consistent with the data in the source database.
8. The apparatus of claim 7, further comprising:
the monitoring module is used for monitoring data updating operation in the source database and executing the same data updating operation in an upper layer data structure of the target database through a data synchronization tool;
and the updating module is used for writing the data updating operation into an operation log of an upper layer data structure of the target database.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202110822595.9A 2021-07-21 2021-07-21 Data synchronization method and device, computer equipment and storage medium Pending CN113449043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110822595.9A CN113449043A (en) 2021-07-21 2021-07-21 Data synchronization method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110822595.9A CN113449043A (en) 2021-07-21 2021-07-21 Data synchronization method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113449043A true CN113449043A (en) 2021-09-28

Family

ID=77816929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110822595.9A Pending CN113449043A (en) 2021-07-21 2021-07-21 Data synchronization method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113449043A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020128996A1 (en) * 2001-03-09 2002-09-12 David Reed System and method for maintaining large-grained database concurrency with a log monitor incorporating dynamically redefinable business logic
CN104298760A (en) * 2014-10-23 2015-01-21 北京京东尚科信息技术有限公司 Data processing method and data processing device applied to data warehouse
CN109241174A (en) * 2018-06-26 2019-01-18 东软集团股份有限公司 Method of data synchronization, device, readable storage medium storing program for executing and electronic equipment
CN109933630A (en) * 2019-03-19 2019-06-25 武汉达梦数据库有限公司 Database data real-time synchronization method and equipment
CN110297866A (en) * 2019-05-20 2019-10-01 平安普惠企业管理有限公司 Method of data synchronization and data synchronization unit based on log analysis
CN110807067A (en) * 2019-09-29 2020-02-18 北京淇瑀信息科技有限公司 Data synchronization method, device and equipment for relational database and data warehouse
CN111209344A (en) * 2020-02-07 2020-05-29 浪潮软件股份有限公司 Data synchronization method and device
CN112286941A (en) * 2020-12-23 2021-01-29 武汉物易云通网络科技有限公司 Big data synchronization method and device based on Binlog + HBase + Hive
CN112434043A (en) * 2020-12-02 2021-03-02 新华三大数据技术有限公司 Data synchronization method, device, electronic equipment and medium
CN113094442A (en) * 2021-04-30 2021-07-09 广州虎牙科技有限公司 Full data synchronization method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN109241175B (en) Data synchronization method and device, storage medium and electronic equipment
CN112000737B (en) Data synchronization method, system, terminal and storage medium based on multi-cloud management
CN109460349B (en) Test case generation method and device based on log
CN107040578B (en) Data synchronization method, device and system
CN109522290B (en) HBase data block recovery and data record extraction method
CN110651265A (en) Data replication system
CN110569214B (en) Index construction method and device for log file and electronic equipment
CN105205053A (en) Method and system for analyzing database incremental logs
CN112559475B (en) Data real-time capturing and transmitting method and system
CN110727724B (en) Data extraction method and device, computer equipment and storage medium
CN104636242A (en) Method for automatically deleting repeated content in system logs on basis of Linux operating system
CN104714880B (en) Daily record data transmission method, system and log server
CN114490554A (en) Data synchronization method and device, electronic equipment and storage medium
CN110704442A (en) Real-time acquisition method and device for big data
US20220229821A1 (en) Data restoration using dynamic data structure altering
CN113449043A (en) Data synchronization method and device, computer equipment and storage medium
CN110245037B (en) Hive user operation behavior restoration method based on logs
CN111858767A (en) Synchronous data processing method, device, equipment and storage medium
CN112256649A (en) Medical file storage method and device
CN109783571B (en) Data processing method, device, computer equipment and storage medium for isolated environment
CN115858471A (en) Service data change recording method, device, computer equipment and medium
CN107633056A (en) Data managing method for power information acquisition terminal
US11455294B2 (en) Information lifecycle management notification framework
CN111897877A (en) High-performance and high-reliability data sharing system and method based on distributed thought
CN111176901A (en) HDFS deleted file recovery method, terminal device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination