CN115952238A - Data synchronization method and device - Google Patents

Info

Publication number
CN115952238A
Authority
CN
China
Prior art keywords
target
database
source
target operation
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310109936.7A
Other languages
Chinese (zh)
Inventor
陈国杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Financial Technology Co Ltd
Original Assignee
Bank of China Financial Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Financial Technology Co Ltd
Priority to CN202310109936.7A
Publication of CN115952238A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The method comprises: connecting a source database and a target database according to configuration parameters, the configuration parameters being used for obtaining source data of the source database and target data of the target database; reading log information of the source database to obtain a first target operation, the log information indicating change records of the source database; and replaying the first target operation in the target database.

Description

Data synchronization method and device
Technical Field
The present application relates to the field of databases, and in particular, to a method and an apparatus for data synchronization.
Background
During data migration, or in development projects that use databases, data often has to be transferred synchronously between different databases, that is, data in a source database needs to be transferred to a target database.
In the prior art, data is transferred through an interface file, or an ETL job is developed and scheduled to transfer the data (ETL, short for extract-transform-load, describes the process of extracting data from a source end, transforming it, and loading it to a destination end). However, data synchronization in the prior art suffers from poor timeliness.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for data synchronization, so as to achieve the purpose of improving timeliness of data synchronization.
The data synchronization method provided by the application is realized as follows:
connecting the source database and the target database according to configuration parameters, wherein the configuration parameters are used for acquiring source data of the source database and target data of the target database;
reading log information of a source database to obtain a first target operation, wherein the log information indicates a change record of the source database;
the first target operation is replayed in the target database.
Optionally, after reading the log information of the source database to obtain the first target operation, the method further includes:
filtering the first target operation according to the log information of the source database to obtain a second target operation;
and replaying the first target operation in the target database includes:
replaying the second target operation in the target database.
Optionally, the first target operation comprises a plurality of operations, and replaying the first target operation in the target database includes:
replaying the first target operation in the target database in the chronological order of the plurality of operations.
Optionally, the first target operation comprises a plurality of operations, and replaying the first target operation in the target database includes:
segmenting the first target operation to obtain a plurality of sub-operations;
and replaying the plurality of sub-operations to the target database in batches.
Optionally, the configuration parameters include:
source configuration parameters and target configuration parameters, wherein the source configuration parameters are used for acquiring source data of the source database, and the target configuration parameters are used for acquiring target data of the target database.
Optionally, the source configuration parameters include a driver class name, a uniform resource locator, a server Internet Protocol (IP) address, a listening port, and a table list of the source database, and the target configuration parameters include a driver class name, a uniform resource locator, a server IP address, a listening port, and a table list of the target database.
Optionally, replaying the first target operation in the target database includes:
if the replay of the first target operation is interrupted, continuing to replay the first target operation to the target database from the breakpoint by means of a checkpoint.
Optionally, the first target operation comprises:
at least one of a table structure change operation, an insert data operation, a delete data operation, and a modify data operation.
The present application further provides a data synchronization apparatus, including: a connection unit, a reading unit and a replay unit;
the connection unit is used for connecting the source database and the target database according to configuration parameters, and the configuration parameters are used for acquiring source data of the source database and target data of the target database;
the reading unit is used for reading the log information of the source database to obtain a first target operation, and the log information indicates the change record of the source database;
a replay unit for replaying the first target operation in the target database.
The present application further provides a computer device, comprising: a processor and a memory, wherein the processor is coupled with the memory, and the memory stores at least one computer program instruction which is loaded and executed by the processor so as to enable the computer device to implement the data synchronization method.
Therefore, the beneficial effects of the application are as follows: the source database and the target database are connected according to configuration parameters, the configuration parameters being used for acquiring source data of the source database and target data of the target database; log information of the source database is read to obtain a first target operation, the log information indicating change records of the source database; and the first target operation is replayed in the target database. Because the change records of the source database are obtained from its log information and replayed in the target database, the timeliness of data synchronization is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a first embodiment of the present application;
FIG. 2 is a flow chart of a second embodiment of the present application;
FIG. 3 is a flow chart of a third embodiment of the present application;
FIG. 4 is a flow chart of a fourth embodiment of the present application;
FIG. 5 is a schematic view of an apparatus of the present application;
FIG. 6 is a schematic diagram of a computer device of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The inventor has found that transferring data through an interface file, or developing and scheduling an ETL job to transfer the data, is only suitable for full refreshes and append writes of the data in a database, and has difficulty capturing update and delete changes, so it suffers from poor timeliness. In the present application, the change record of the source database is obtained by reading the log information of the source database, and the operations recorded in the change record are replayed in the target database, so that the timeliness of data synchronization is improved.
In the embodiment of the present application, the device for implementing data synchronization may include, but is not limited to, a computer device.
The computer device may include a processor and a memory, wherein the processor is coupled with the memory, and the memory stores at least one computer program instruction which is loaded and executed by the processor so as to enable the computer device to implement the data synchronization method.
Referring to fig. 1, the first embodiment of the present application includes the following specific steps:
S101: The computer connects to the source database and the target database according to the configuration parameters.
The configuration parameters are used for acquiring source data of the source database and target data of the target database.
The configuration parameters may include source configuration parameters and target configuration parameters, the source configuration parameters are used to obtain source data of the source database, and the target configuration parameters are used to obtain target data of the target database.
The source configuration parameters may include a driver class name (driver-name), a Uniform Resource Locator (URL), a user name (username), a password (password), a server Internet Protocol (IP) address, a listening port (port), a table list (tableList), and a time zone (serverTimeZone) of the source database. Other parameters may be added to the source configuration parameters according to actual requirements.
The driver class name, the uniform resource locator, the user name, and the password of the source database are data-source configurations, and the server IP address, the listening port, the table list, and the time zone of the source database are flink-cdc configurations.
The target configuration parameters may include a driver class name, a uniform resource locator, a user name, a password, a server IP address, a listening port, a table list, and a time zone of the target database. Other parameters may be added to the target configuration parameters according to actual requirements.
The driver class name, the uniform resource locator, the user name, and the password of the target database are data-source configurations, and the server IP address, the listening port, the table list, and the time zone of the target database are flink-cdc configurations.
The source database may include, but is not limited to: MySQL, PostgreSQL, Oracle, MongoDB, IBM DB2, Vitess, TiDB, and SQL Server.
The target database may include, but is not limited to: Kafka, Pulsar, Hudi, Iceberg, TiDB, MySQL, Doris, Hive, ClickHouse, and Hologres.
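As an illustration only, the source and target configuration parameters listed above could be grouped as in the following Python sketch. The class names `EndpointConfig` and `SyncConfig`, the field names, the validation rules, and all addresses, schema names and credentials are hypothetical placeholders and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EndpointConfig:
    """Settings for one side (source or target) of the synchronization."""
    # Data-source settings
    driver_class_name: str
    url: str
    username: str
    password: str
    # Change-capture settings
    server_ip: str
    port: int
    table_list: Tuple[str, ...]
    server_time_zone: str = "UTC"

    def validate(self) -> None:
        # Hypothetical sanity checks; a real deployment may need more.
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
        if not self.table_list:
            raise ValueError("table_list must name at least one table")


@dataclass(frozen=True)
class SyncConfig:
    """Source configuration parameters plus target configuration parameters."""
    source: EndpointConfig
    target: EndpointConfig

    def validate(self) -> None:
        self.source.validate()
        self.target.validate()


# Placeholder values (documentation-only IP addresses, made-up schema names).
config = SyncConfig(
    source=EndpointConfig(
        driver_class_name="com.mysql.cj.jdbc.Driver",
        url="jdbc:mysql://192.0.2.10:3306/appdb",
        username="sync_reader",
        password="change-me",
        server_ip="192.0.2.10",
        port=3306,
        table_list=("appdb.account",),
        server_time_zone="Asia/Shanghai",
    ),
    target=EndpointConfig(
        driver_class_name="oracle.jdbc.OracleDriver",
        url="jdbc:oracle:thin:@//192.0.2.20:1521/ORCLPDB1",
        username="sync_writer",
        password="change-me",
        server_ip="192.0.2.20",
        port=1521,
        table_list=("APPDB.ACCOUNT",),
        server_time_zone="Asia/Shanghai",
    ),
)
config.validate()
```

Keeping the two sides in separate objects mirrors the split between source configuration parameters and target configuration parameters described above.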
S102: The computer reads the log information of the source database to obtain a first target operation.
The log information indicates a change record of the source database. The log information may be information in a transaction log, a binary log, or a write-ahead log, and may also be information in other logs according to actual requirements.
The first target operation may include at least one of a table structure change operation, an insert data operation, a delete data operation, and a modify data operation.
In some implementations, after reading the log information of the source database to obtain the first target operation, the computer further filters the first target operation according to the log information of the source database to obtain a second target operation.
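Step S102 can be pictured with the following minimal Python sketch, which turns already-decoded log records into a list of operations. The record layout (the keys `position`, `ts`, `type`, `table`, `data`) and the names `OpType`, `Operation` and `read_operations` are assumptions made for illustration; the application does not specify a log format, and decoding a real transaction log, binary log or write-ahead log is left to a change-capture tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Mapping


class OpType(Enum):
    ALTER_TABLE = "alter_table"  # table structure change operation
    INSERT = "insert"            # insert data operation
    DELETE = "delete"            # delete data operation
    UPDATE = "update"            # modify data operation


@dataclass(frozen=True)
class Operation:
    position: int     # offset of the record in the source log (assumed)
    timestamp: float  # time at which the change was made on the source
    op_type: OpType
    table: str
    payload: dict


def read_operations(log_records: Iterable[Mapping]) -> List[Operation]:
    """Build the first target operation from decoded log records.

    Records whose type is not one of the four change kinds (for example a
    hypothetical heartbeat record) are skipped.
    """
    operations = []
    for record in log_records:
        try:
            op_type = OpType(record["type"])
        except ValueError:
            continue  # not a change record that needs replaying
        operations.append(
            Operation(
                position=record["position"],
                timestamp=record["ts"],
                op_type=op_type,
                table=record["table"],
                payload=dict(record.get("data", {})),
            )
        )
    return operations
```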
S103: the computer replays the first target operation in the target database.
In some implementations, the computer replays the second target operation in the target database.
In other implementations, the first target operation comprises a plurality of operations, and the computer replays the first target operation in the target database in chronological order of the plurality of operations.
In other implementations, the second target operation comprises a plurality of operations, and the computer replays the second target operation in the target database in chronological order of the plurality of operations.
In other implementations, the first target operation includes a plurality of operations, the computer splits the first target operation to obtain a plurality of sub-operations, and the plurality of sub-operations are replayed to the target database in batches.
In other implementations, the second target operation includes a plurality of operations, the computer splits the second target operation to obtain a plurality of second sub-operations, and the plurality of second sub-operations are replayed to the target database in batches.
Splitting the first target operation or the second target operation and then replaying the resulting sub-operations or second sub-operations to the target database in batches improves the efficiency and performance of the replay. In addition, if the replay of a sub-operation or second sub-operation fails, another sub-operation or second sub-operation can take over the processing, which improves stability.
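The splitting and batch replay described above might look like the following Python sketch. It assumes fixed-size batches and a list of interchangeable workers, where a worker is any callable that applies a whole batch to the target database and raises an exception on failure, without applying part of the batch. The names `split` and `replay_in_batches` and the round-robin takeover policy are illustrative choices, not details from the application.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split(operations: Sequence[T], batch_size: int) -> List[List[T]]:
    """Cut the operation list into consecutive batches, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        list(operations[start:start + batch_size])
        for start in range(0, len(operations), batch_size)
    ]


def replay_in_batches(
    batches: Sequence[Sequence[T]],
    workers: Sequence[Callable[[Sequence[T]], None]],
) -> int:
    """Replay each batch; if one worker fails, the next worker takes over.

    Returns the number of operations applied. Batches are handled one after
    another, so the original order of the operations is kept.
    """
    applied = 0
    for index, batch in enumerate(batches):
        for attempt in range(len(workers)):
            worker = workers[(index + attempt) % len(workers)]
            try:
                worker(batch)
            except Exception:
                continue  # hand the same batch to the next worker
            applied += len(batch)
            break
        else:
            raise RuntimeError(f"batch {index} failed on every worker")
    return applied
```

A parallel runtime would dispatch batches concurrently; the sequential loop here only demonstrates the split and the takeover on failure.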
In other implementations, if the replay of the first target operation is interrupted, the computer continues to replay the first target operation to the target database from the breakpoint by means of a checkpoint.
In other implementations, if the replay of the second target operation is interrupted, the computer continues to replay the second target operation to the target database from the breakpoint.
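Resuming from a breakpoint can be sketched in Python as follows. This is a simplified stand-in for a framework checkpoint: it persists the index of the next operation to a local JSON file. The class `FileCheckpoint`, the function `replay_with_checkpoint` and the file layout are hypothetical and do not describe how any particular framework stores its checkpoints.

```python
import json
import os
from typing import Callable, Sequence


class FileCheckpoint:
    """Persist the index of the next operation to replay."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> int:
        try:
            with open(self.path, encoding="utf-8") as handle:
                return json.load(handle)["next_index"]
        except FileNotFoundError:
            return 0  # nothing replayed yet

    def save(self, next_index: int) -> None:
        # Write to a temporary file, then rename, so that a crash never
        # leaves a half-written checkpoint behind.
        temp_path = self.path + ".tmp"
        with open(temp_path, "w", encoding="utf-8") as handle:
            json.dump({"next_index": next_index}, handle)
        os.replace(temp_path, self.path)


def replay_with_checkpoint(
    operations: Sequence[object],
    apply: Callable[[object], None],
    checkpoint: FileCheckpoint,
) -> None:
    """Replay operations, starting from the last saved breakpoint."""
    start = checkpoint.load()
    for index in range(start, len(operations)):
        apply(operations[index])    # may raise if the replay is interrupted
        checkpoint.save(index + 1)  # record progress only after success
```

Because progress is saved after each successful apply, a crash between the two steps causes that one operation to be applied again on restart, so in this sketch the apply step should be idempotent.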
In the first embodiment of the present application, the log information of the source database can be collected automatically by means of preset configuration parameters. Because the log information records the changes made to the source database, replaying the first target operation, that is, the change operations of the source database, in the target database improves the timeliness of data synchronization.
Since the source database may perform various modification operations, the second embodiment sets the first target operation to include a plurality of operations, and explains the implementation of the present application.
Referring to fig. 2, the second embodiment of the present application includes the following specific steps:
S201: The computer connects to the source database and the target database according to the configuration parameters.
S202: The computer reads the log information of the source database to obtain a first target operation.
The first target operation comprises a plurality of operations. The first target operation may comprise a plurality of table structure change operations, a plurality of insert data operations, a plurality of delete data operations, or a plurality of modify data operations, and the first target operation may also be any combination of a plurality of table structure change operations, insert data operations, delete data operations, and modify data operations.
S203: the computer replays the first target operation in the target database in chronological order of the plurality of operations.
In some implementations, the computer splits the first target operation to obtain a plurality of sub-operations, and replays the plurality of sub-operations to the target database in batch according to a time sequence of the plurality of sub-operations.
S204: If the replay of the first target operation is interrupted, the computer continues to replay the first target operation to the target database from the breakpoint by means of a checkpoint.
In the second embodiment of the present application, corresponding table updating operations are performed in the target database in sequence according to the change sequence of the source database updating operations, so that the timeliness of data synchronization in the database can be improved.
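The chronological replay of the second embodiment can be illustrated with the following Python sketch, which uses an in-memory SQLite database as a stand-in for the target database. The table `account`, the captured statements and the function `replay_chronologically` are made up for the example; a real system would translate log records into statements in the dialect of the target database.

```python
import sqlite3
from typing import Iterable, Tuple

# Captured changes as (timestamp, SQL, parameters), deliberately out of order.
captured = [
    (3.0, "UPDATE account SET balance = ? WHERE id = ?", (150, 1)),
    (1.0, "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)", ()),
    (2.0, "INSERT INTO account (id, balance) VALUES (?, ?)", (1, 100)),
    (4.0, "ALTER TABLE account ADD COLUMN currency TEXT", ()),
]


def replay_chronologically(
    conn: sqlite3.Connection,
    operations: Iterable[Tuple[float, str, tuple]],
) -> None:
    """Apply the operations to the target in timestamp order."""
    for _, sql, params in sorted(operations, key=lambda op: op[0]):
        conn.execute(sql, params)
    conn.commit()


target = sqlite3.connect(":memory:")
replay_chronologically(target, captured)
rows = target.execute("SELECT id, balance, currency FROM account").fetchall()
```

Replaying in any other order would fail here, since the update and the insert depend on the table already existing, which is why the order of the source changes is preserved.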
Since the target database may already have synchronized some of the data in the source database, the third embodiment describes an implementation of the present application for this case.
Referring to fig. 3, the third embodiment of the present application includes the following specific steps:
S301: The computer connects to the source database and the target database according to the configuration parameters.
S302: The computer reads the log information of the source database to obtain a first target operation.
The first target operation comprises a plurality of operations.
S303: The computer filters the first target operation according to the log information of the source database to obtain a second target operation.
The second target operation may be a single operation or may include a plurality of operations. In this embodiment, the second target operation includes a plurality of operations.
S304: The computer splits the second target operation to obtain a plurality of second sub-operations.
S305: The computer replays the plurality of second sub-operations to the target database in batches.
In some implementations, the computer replays the plurality of second sub-operations to the target database in batches in the chronological order of the plurality of second sub-operations.
In other implementations, the second target operation is a single operation, and the computer replays the second target operation in the target database.
In the third embodiment of the present application, filtering the first target operation according to the log information of the source database yields the change operations that are new in the source database relative to the target database. No full-table scan is needed for the filtering, so efficiency and performance are higher, latency is low, and the database load is not increased.
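One possible reading of the filtering step is sketched below in Python. The application does not state the filter criterion; the sketch assumes that every operation carries its position in the source log and that the position of the last operation already applied to the target is known, so that only later operations are kept. The names `Operation`, `filter_new_operations` and `last_applied_position` are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Operation(NamedTuple):
    position: int     # position of the change in the source log (assumed)
    description: str


def filter_new_operations(
    first_target_operation: Sequence[Operation],
    last_applied_position: int,
) -> List[Operation]:
    """Keep only the operations the target database has not yet received.

    The comparison uses log positions alone, so neither database has to be
    scanned to find the difference.
    """
    return [
        op for op in first_target_operation
        if op.position > last_applied_position
    ]


first = [
    Operation(101, "delete row 7"),
    Operation(102, "insert row 9"),
    Operation(103, "rename a column"),
    Operation(104, "update row 9"),
]
# Assume the target has already applied everything up to position 102.
second = filter_new_operations(first, last_applied_position=102)
```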
The fourth embodiment explains specific implementation of the present application by taking a source database as MySQL and a target database as Oracle as examples.
Referring to fig. 4, the fourth embodiment of the present application includes the following specific steps:
S401: The computer connects to the MySQL database and the Oracle database according to the configuration parameters.
The configuration parameters include: the driver class name, uniform resource locator, user name, password, server IP address, listening port, table list, and time zone of the MySQL database, and the driver class name, uniform resource locator, user name, password, server IP address, listening port, table list, and time zone of the Oracle database.
S402: The computer reads the log information of the MySQL database to obtain a first target operation.
The first target operation includes: deleting the first data, inserting the second data, changing the table structure field, modifying the third field, and adding the fourth field (ordered by the chronological order of the operations).
In some implementations, the computer reads the log information of the MySQL database in a distributed manner through the Flink framework to obtain the first target operation.
S403: The computer filters the first target operation according to the log information of the MySQL database to obtain a second target operation.
The second target operation includes: change the table structure field, modify the third field, and add the fourth field (ordered by chronological order of the operations).
S404: The computer splits the second target operation to obtain a plurality of second sub-operations.
The plurality of second sub-operations are: change the table structure field, modify the third field, and add the fourth field (sorted by chronological order of the operations).
S405: the computer changes the table structure field in the Oracle database.
S406: the computer modifies the third field in the Oracle database.
S407: the computer adds a fourth field in the Oracle database.
In some implementations, the computer changes the table structure field, modifies the third field, and adds the fourth field in the Oracle database in a distributed manner through the Flink framework.
In other implementations, the computer performs resource and job scheduling via Apache Hadoop YARN (YARN is an abbreviation for Yet Another Resource Negotiator).
S408: If the replay of the second target operation is interrupted, the computer continues to replay the second target operation to the Oracle database from the breakpoint.
In the fourth embodiment of the present application, breakpoint resumption during data synchronization is supported through the internal checkpoint mechanism.
Referring to fig. 5, the present application provides an apparatus 500 for data synchronization, comprising: a connection unit 501, a reading unit 502 and a replay unit 503.
The connection unit 501 is used for connecting the source database and the target database according to the configuration parameters, wherein the configuration parameters are used for acquiring the source data of the source database and the target data of the target database.
The reading unit 502 is used for reading log information of the source database to obtain a first target operation, wherein the log information indicates a change record of the source database.
The replay unit 503 is used for replaying the first target operation in the target database.
Optionally, the replay unit 503 is further used for replaying the second target operation in the target database.
Optionally, the replay unit 503 is further used for replaying the first target operation in the target database in the chronological order of the plurality of operations.
Optionally, the replay unit 503 is further used for replaying a plurality of sub-operations to the target database in batches.
Optionally, the replay unit 503 is further used for, if the replay of the first target operation is interrupted, continuing to replay the first target operation to the target database from the breakpoint by means of a checkpoint.
Optionally, the apparatus 500 for data synchronization further comprises: a filtering unit 504 and/or a slicing unit 505.
The filtering unit 504 is used for filtering the first target operation according to the log information of the source database to obtain a second target operation.
The slicing unit 505 is used for segmenting the first target operation to obtain a plurality of sub-operations.
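The way the units fit together, including the optional filtering and slicing units, could be sketched in Python as below. Each unit is modelled as a plain callable, and the in-memory stand-ins replace real database access. The class `DataSyncApparatus` and its field names are illustrative assumptions; the application does not prescribe this structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Operation = str  # stand-in for one captured change


@dataclass
class DataSyncApparatus:
    connect: Callable[[], None]                     # connection unit 501
    read: Callable[[], List[Operation]]             # reading unit 502
    replay: Callable[[Sequence[Operation]], None]   # replay unit 503
    # Optional units; None means the apparatus is built without them.
    filter_unit: Optional[Callable[[List[Operation]], List[Operation]]] = None
    slicing_unit: Optional[Callable[[List[Operation]], List[List[Operation]]]] = None

    def run(self) -> None:
        self.connect()
        operations = self.read()  # first target operation
        if self.filter_unit is not None:
            operations = self.filter_unit(operations)  # second target operation
        if self.slicing_unit is not None:
            batches = self.slicing_unit(operations)    # sub-operations
        else:
            batches = [operations]
        for batch in batches:
            self.replay(batch)


# In-memory stand-ins used only to exercise the wiring.
target_log: List[Operation] = []
apparatus = DataSyncApparatus(
    connect=lambda: None,
    read=lambda: ["op1", "op2", "op3", "op4"],
    replay=target_log.extend,
    filter_unit=lambda ops: ops[1:],  # pretend "op1" was already synchronized
    slicing_unit=lambda ops: [ops[i:i + 2] for i in range(0, len(ops), 2)],
)
apparatus.run()
```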
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
It should be noted that: in the data synchronization apparatus provided in the foregoing embodiment, when the data synchronization function is implemented, only the division of each functional module is illustrated, and in practical applications, the function distribution may be completed by different functional modules as needed, that is, the internal structure of the data synchronization apparatus is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the data synchronization apparatus and the data synchronization method provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
Referring to fig. 6, the present application further provides a computer device 600, comprising: a processor 601 and a memory 602.
The processor 601 is coupled to the memory 602, and the memory 602 stores therein at least one computer program instruction, which is loaded and executed by the processor 601 to enable the computer apparatus to implement the data synchronization method.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of data synchronization, the method comprising:
connecting a source database and a target database according to configuration parameters, wherein the configuration parameters are used for acquiring source data of the source database and target data of the target database;
reading log information of the source database to obtain a first target operation, wherein the log information indicates a change record of the source database;
replaying the first target operation in the target database.
2. The method of claim 1, wherein after the reading the log information of the source database to obtain the first target operation, the method further comprises:
filtering the first target operation according to the log information of the source database to obtain a second target operation;
and the replaying the first target operation in the target database comprises:
replaying the second target operation in the target database.
3. The method of claim 1, wherein the first target operation comprises a plurality of operations, and wherein replaying the first target operation in the target database comprises:
replaying said first target operation in said target database in a chronological order of said plurality of operations.
4. The method of claim 1, wherein the first target operation comprises a plurality of operations, and wherein replaying the first target operation in the target database comprises:
segmenting the first target operation to obtain a plurality of sub-operations;
and replaying the plurality of sub-operations to the target database in batches.
5. The method of claim 1, wherein the configuration parameters comprise:
source configuration parameters and target configuration parameters, wherein the source configuration parameters are used for acquiring source data of the source database, and the target configuration parameters are used for acquiring target data of the target database.
6. The method of claim 5, wherein the source configuration parameters comprise a driver class name, a uniform resource locator, a server Internet Protocol (IP) address, a listening port, and a table list of the source database, and wherein the target configuration parameters comprise a driver class name, a uniform resource locator, a server IP address, a listening port, and a table list of the target database.
7. The method of claim 1, wherein replaying the first target operation in the target database comprises:
if the replay of the first target operation is interrupted, continuing to replay the first target operation to the target database from a breakpoint by means of a checkpoint.
8. The method of any of claims 1-7, wherein the first target operation comprises:
at least one of a table structure change operation, an insert data operation, a delete data operation, and a modify data operation.
9. An apparatus for data synchronization, the apparatus comprising: a connection unit, a reading unit and a replay unit;
the connection unit is used for connecting a source database and a target database according to configuration parameters, and the configuration parameters are used for acquiring source data of the source database and target data of the target database;
the reading unit is configured to read log information of the source database to obtain a first target operation, where the log information indicates a change record of the source database;
the replay unit is used for replaying the first target operation in the target database.
10. A computer device, characterized in that the computer device comprises: a processor coupled with a memory, the memory having stored therein at least one computer program instruction that is loaded and executed by the processor to cause the computer device to implement the method of any of claims 1-8.
CN202310109936.7A 2023-02-13 2023-02-13 Data synchronization method and device Pending CN115952238A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310109936.7A CN115952238A (en) 2023-02-13 2023-02-13 Data synchronization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310109936.7A CN115952238A (en) 2023-02-13 2023-02-13 Data synchronization method and device

Publications (1)

Publication Number Publication Date
CN115952238A true CN115952238A (en) 2023-04-11

Family

ID=87297975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310109936.7A Pending CN115952238A (en) 2023-02-13 2023-02-13 Data synchronization method and device

Country Status (1)

Country Link
CN (1) CN115952238A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775771A (en) * 2023-08-23 2023-09-19 北京逐风科技有限公司 Data synchronization method, device, system and medium
CN116775771B (en) * 2023-08-23 2024-01-26 北京逐风科技有限公司 Data synchronization method, device, system and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination