CN116881371B - Data synchronization method, device, equipment and storage medium - Google Patents

Data synchronization method, device, equipment and storage medium

Info

Publication number
CN116881371B
Authority
CN
China
Prior art keywords
data
synchronization
position information
stock
log
Prior art date
Legal status
Active
Application number
CN202311147251.8A
Other languages
Chinese (zh)
Other versions
CN116881371A (en)
Inventor
陈肃
王绍
王浩
陈诚
陈雷
Current Assignee
Beijing Zhufeng Technology Co ltd
Original Assignee
Beijing Zhufeng Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zhufeng Technology Co ltd
Priority to CN202311147251.8A
Publication of CN116881371A
Application granted
Publication of CN116881371B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/21 Design, administration or maintenance of databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data synchronization method, a data synchronization apparatus, an electronic device and a storage medium. The method comprises: when it is determined that the stock data synchronization task from a source database to a target database has been completed, obtaining the synchronization end time of the stock data synchronization task; generating incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database and the stock data of the source database; and updating the target database based on the incremental synchronization data. Because, once the stock data synchronization task is determined to be complete, the target database is updated using the task's synchronization end time and the source database's redo log to complete incremental synchronization, full synchronization and incremental synchronization are joined seamlessly, duplicate data is avoided, and data synchronization efficiency is improved.

Description

Data synchronization method, device, equipment and storage medium
Technical Field
The present application relates to the field of network information processing, and in particular, to a data synchronization method, apparatus, device, and storage medium.
Background
With the continuous development of internet technology, data is synchronized between databases to enable data sharing. A typical inter-database synchronization scheme first performs stock data synchronization (also known as full data synchronization) and then incremental data synchronization. Stock data synchronization copies all existing (stock) data in the source database to the target database; incremental data synchronization, building on the stock data synchronization, captures the changes made to the stock data (the incremental synchronization data) and synchronizes them to the target database.
In practice, however, part of the stock data keeps changing while the stock data synchronization is running, so some of those changes are already contained in the data copied during stock synchronization. If the incremental synchronization data is then applied after the stock data synchronization completes, the same data may be updated repeatedly, stock synchronization and incremental synchronization cannot be joined seamlessly, and synchronization efficiency drops.
Disclosure of Invention
In view of this, the embodiments of the present application provide a data synchronization method, apparatus, device, and storage medium, which aim to effectively improve the synchronization efficiency of data.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a data synchronization method, where the method includes:
when it is determined that the stock data synchronization task from a source database to a target database has been completed, obtaining the synchronization end time of the stock data synchronization task;
generating incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database and the stock data of the source database;
updating the target database based on the incremental synchronization data.
In some embodiments, the generating incremental synchronization data for the source database based on the synchronization end time, the redo log for the source database, and the stock data for the source database includes:
determining first position information based on the synchronization end time and a preset mapping relationship, wherein the first position information is the position information of the redo log corresponding to the synchronization end time, and the preset mapping relationship comprises the correspondence between each time and the position information of the redo log;
determining starting parse position information of the redo log based on the first position information;
parsing the redo log from the starting parse position to generate a parsing result, wherein the parsing result comprises operation information on the stock data of the source database;
and operating on the stock data of the source database based on the operation information to generate the incremental synchronization data of the source database.
In some embodiments, the log types of the redo log include uncommitted logs and committed logs, and determining the starting parse position information of the redo log based on the first position information includes:
determining whether the redo log corresponding to the first position information is an uncommitted log; if so, obtaining second position information and determining it as the starting parse position information of the redo log, wherein the second position information is the position information of the committed log adjacent to the uncommitted log;
if not, determining the first position information as the starting parse position information of the redo log.
In some embodiments, the stock data synchronization task includes a plurality of sub-stock data synchronization tasks, and determining the first position information based on the synchronization end time and the preset mapping relationship includes:
determining a third position information set based on the synchronization end time of each sub-stock data synchronization task and the preset mapping relationship, wherein each piece of third position information in the set is the position information of the redo log corresponding to the synchronization end time of one sub-stock data synchronization task;
and determining the first position information based on the third position information set and a preset position information selection rule, wherein the first position information is the position information in the third position information set that satisfies the preset position information selection rule.
In some embodiments, the operation information includes operation position information of the stock data and operation type information corresponding to the operation position information, and operating on the stock data of the source database based on the operation information to generate the incremental synchronization data of the source database includes:
obtaining the operation type information corresponding to the operation position information, wherein the operation type information is one of the following: insert operation information, update operation information or delete operation information;
and performing, based on the operation type information, an insert, update or delete operation on the stock data corresponding to the operation position information, to generate the incremental synchronization data of the source database.
In some embodiments, the stock data synchronization task includes a plurality of sub-stock data synchronization tasks, and determining that the stock data synchronization task from the source database to the target database has been completed includes:
for each sub-stock data synchronization task in the stock data synchronization task, obtaining the sub-stock synchronization data corresponding to that sub-task;
and when it is determined that every piece of sub-stock synchronization data has been synchronized to the target database, determining that the stock data synchronization task from the source database to the target database has been completed.
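The completion check above can be illustrated with a small Python sketch. The class `StockSyncTracker` and its methods are hypothetical, and the sketch assumes each sub-task reports the time at which its sub-stock synchronization data finished loading into the target database; those per-sub-task end times are the inputs later used to locate redo log positions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Set


@dataclass
class StockSyncTracker:
    """Hypothetical tracker for the sub-stock data synchronization tasks of one stock task."""

    sub_task_ids: Set[str]
    # sub-task id -> time at which its sub-stock synchronization data reached the target
    end_times: Dict[str, datetime] = field(default_factory=dict)

    def mark_synced(self, task_id: str, end_time: datetime) -> None:
        """Record that a sub-task has synchronized all of its data to the target database."""
        if task_id not in self.sub_task_ids:
            raise KeyError(f"unknown sub-task {task_id!r}")
        self.end_times[task_id] = end_time

    def is_complete(self) -> bool:
        """The stock task counts as complete only when every sub-task has reported."""
        return set(self.end_times) == self.sub_task_ids
```

Keeping the end times per sub-task, rather than a single flag, preserves the information needed for the minimum-position selection rule described in the embodiments.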
In some embodiments, the stock data of the source database includes identification information of the stock data, and the method further includes:
splitting the stock data based on the identification information to generate at least one stock data block;
determining at least one stock data block corresponding to each sub-stock data synchronization task based on the at least one stock data block, the number of sub-stock data synchronization tasks and a preset allocation rule;
wherein obtaining the sub-stock synchronization data corresponding to a sub-stock data synchronization task includes: generating the sub-stock synchronization data corresponding to that task based on the at least one stock data block assigned to it.
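One way to realize the splitting and allocation steps is sketched below in Python. It assumes the identification information is an integer primary key, that a block is a contiguous key range of at most `block_size` rows, and that the preset allocation rule is round-robin; none of these specifics come from the text, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # inclusive (first_id, last_id) range of one stock data block


def cut_into_blocks(ids: List[int], block_size: int) -> List[Block]:
    """Split the identification values into contiguous blocks of at most block_size rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    ordered = sorted(ids)
    return [(ordered[i], ordered[min(i + block_size, len(ordered)) - 1])
            for i in range(0, len(ordered), block_size)]


def allocate_blocks(blocks: List[Block], num_tasks: int) -> Dict[int, List[Block]]:
    """Assumed allocation rule: deal the blocks out to the sub-tasks round-robin."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    assignment: Dict[int, List[Block]] = {task: [] for task in range(num_tasks)}
    for index, block in enumerate(blocks):
        assignment[index % num_tasks].append(block)
    return assignment
```

A production implementation would more likely compute key ranges with queries against the source database than load every key into memory; the in-memory list here only keeps the sketch self-contained.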
In a second aspect, an embodiment of the present application provides a data synchronization apparatus, including:
the acquisition module is configured to obtain the synchronization end time of the stock data synchronization task when it is determined that the stock data synchronization task from the source database to the target database has been completed;
the generation module is configured to generate incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database and the stock data of the source database;
and the updating module is configured to update the target database based on the incremental synchronization data.
In a third aspect, an embodiment of the present application provides a data synchronization device, including: a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is adapted to perform the steps of the method of the first aspect described above when the computer program is run.
In a fourth aspect, an embodiment of the present application provides a computer storage medium having a computer program stored thereon, the computer program implementing the steps of the method according to the first aspect when executed by a processor.
The technical solution provided by the embodiments of the application includes: when it is determined that the stock data synchronization task from a source database to a target database has been completed, obtaining the synchronization end time of the stock data synchronization task; generating incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database and the stock data of the source database; and updating the target database based on the incremental synchronization data. Because, once the stock data synchronization task is determined to be complete, the target database is updated using the task's synchronization end time and the source database's redo log to complete incremental synchronization, full synchronization and incremental synchronization are joined seamlessly, duplicate data is avoided, and data synchronization efficiency is improved.
Drawings
FIG. 1 is a flow chart of a data synchronization method according to an embodiment of the application;
FIG. 2 is a schematic diagram of a deployment architecture of a data synchronization system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a deployment architecture of yet another data synchronization system provided by an application example of the present application;
FIG. 4 is a schematic flow chart of a data synchronization scheme in an application example of the present application;
FIG. 5 is a schematic diagram illustrating the division of synchronization interval blocks in an application example of the present application;
FIG. 6 is a schematic flow chart of incremental data synchronization in an application example of the present application;
FIG. 7 is a schematic diagram illustrating a process of performing data reformation by parsing tasks in an application example of the present application;
FIG. 8 is a schematic structural diagram of a data synchronization apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data synchronization device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings and examples.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
An embodiment of the application provides a data synchronization method applied to a data synchronization device. As shown in FIG. 1, the method includes the following steps:
Step 110: when it is determined that the stock data synchronization task from the source database to the target database has been completed, obtain the synchronization end time of the stock data synchronization task.
Here, the source database stores the full (stock) data to be synchronized, and the target database stores the stock data and the incremental synchronization data synchronously loaded from the source database. Database types include, but are not limited to, at least one of: MySQL, PostgreSQL, Oracle and Dameng (DM). Synchronization between the source database and the target database may be between heterogeneous databases or between homogeneous databases.
Here, once the stock data synchronization task from the source database to the target database is determined to be complete, all stock data of the source database has been synchronized to the target database and stock data synchronization is finished. At this point, the synchronization end time of the stock data synchronization task is obtained.
Step 120: generate incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database and the stock data of the source database.
Here, the redo log records the modifications each transaction makes to the database and guarantees that modified data is never lost once a transaction has been committed successfully. For example, if the database goes down after a transaction has committed but before the modified data has been written to the data files, the modifications can be recovered by replaying the records in the redo log, so no data is lost. The data synchronization device of the application obtains the redo log through log collection.
Here, the incremental synchronization data is captured, and thereby generated, based on the synchronization end time, the redo log and the stock data of the source database. Different databases capture incremental data in different ways. For example, MySQL and PostgreSQL provide sequential log data parsing (only committed transaction data appears in the log stream, and its order is guaranteed), so incremental data can be captured as a log stream, that is, logs acquired in chronological order. In databases such as Oracle and Dameng, however, incremental data capture relies on the redo log, so incremental synchronization data must be captured from the redo log.
Step 130: update the target database based on the incremental synchronization data.
Here, once the incremental synchronization data has been generated, the target database can be updated with it. That is, after the stock data synchronization task from the source database to the target database has completed (the source database has synchronized its stock data to the target database), the target database is brought up to date using the incremental synchronization data.
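Steps 110 to 130 can be put together in a deliberately simplified Python sketch. It models the redo log as an in-memory list of hypothetical `RedoRecord` entries and the target table as a dict keyed by primary key, and it treats every record generated after the synchronization end time as incremental data; the position lookup, commit handling and parallel sub-tasks described in the embodiments are left out here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RedoRecord:
    position: int                # log position, increasing with write order
    time: float                  # log generation time
    op: str                      # "INSERT", "UPDATE" or "DELETE"
    key: int                     # primary key of the affected row
    value: Optional[str] = None  # new row value (unused for DELETE)


def synchronize(stock_done: bool, sync_end_time: float,
                redo_log: List[RedoRecord], target: Dict[int, str]) -> Dict[int, str]:
    # Step 110: only proceed once the stock (full) synchronization has finished.
    if not stock_done:
        raise RuntimeError("stock data synchronization not finished")
    # Step 120: records generated after the sync end time form the incremental data.
    incremental = [r for r in redo_log if r.time > sync_end_time]
    # Step 130: apply them to the target in log-position order.
    for record in sorted(incremental, key=lambda r: r.position):
        if record.op == "DELETE":
            target.pop(record.key, None)
        elif record.op in ("INSERT", "UPDATE"):
            target[record.key] = record.value
        else:
            raise ValueError(f"unsupported operation: {record.op}")
    return target
```

Records at or before the end time are skipped because, in this simplified model, their effects are assumed to be already present in the synchronized stock data.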
The technical solution provided by the embodiments of the application includes: when it is determined that the stock data synchronization task from a source database to a target database has been completed, obtaining the synchronization end time of the stock data synchronization task; generating incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database and the stock data of the source database; and updating the target database based on the incremental synchronization data. Because, once the stock data synchronization task is determined to be complete, the target database is updated using the task's synchronization end time and the source database's redo log to complete incremental synchronization, full synchronization and incremental synchronization are joined seamlessly, duplicate data is avoided, and data synchronization efficiency is improved.
In some embodiments, generating incremental synchronization data for the source database based on the synchronization end time, the redo log for the source database, and the stock data for the source database includes:
determining first position information based on the synchronization end time and a preset mapping relationship, wherein the first position information is the position information of the redo log corresponding to the synchronization end time, and the preset mapping relationship comprises the correspondence between each time and the position information of the redo log;
determining starting parse position information of the redo log based on the first position information;
parsing the redo log from the starting parse position to generate a parsing result, wherein the parsing result comprises operation information on the stock data of the source database;
and operating on the stock data of the source database based on the operation information to generate the incremental synchronization data of the source database.
Here, each redo log record corresponds to one log generation time and one piece of position information. For example, in a redo log table, each redo log record has its own position in the table; this position information may be, for example, the log position (sequence number) of the record. The preset mapping relationship, which contains the correspondence between each log generation time and the position information of the redo log, can be stored in the data synchronization device in advance. Looking up the synchronization end time in this mapping yields the first position information, that is, the position of the redo log record corresponding to the synchronization end time.
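A minimal sketch of the preset mapping relationship, assuming it is kept as a list of (log generation time, redo log position) pairs and that "the redo log corresponding to the synchronization end time" means the last record generated at or before that time. The class `TimePositionMap` is hypothetical; it relies on binary search from Python's standard `bisect` module.

```python
import bisect
from typing import List, Tuple


class TimePositionMap:
    """Hypothetical preset mapping: log generation time -> redo log position."""

    def __init__(self, entries: List[Tuple[float, int]]):
        entries = sorted(entries)
        self._times = [t for t, _ in entries]
        self._positions = [p for _, p in entries]

    def position_at(self, moment: float) -> int:
        """Return the position of the last redo record generated at or before `moment`."""
        i = bisect.bisect_right(self._times, moment) - 1
        if i < 0:
            raise LookupError("no redo record at or before the given time")
        return self._positions[i]
```

For instance, with entries `(10.0, 100)`, `(20.0, 200)` and `(30.0, 300)`, a synchronization end time of `25.0` maps to position `200`.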
Here, before the redo log is parsed to generate incremental synchronization data, its starting parse position must be determined, that is, the position in the redo log from which parsing begins. The starting parse position information of the redo log can be determined based on the first position information.
Here, once the starting parse position of the redo log has been determined, the redo log can be parsed. The source database side typically runs a parsing process that continuously monitors the redo log and parses it into a series of operation information, yielding the operation information on the relevant stock data of the source database, that is, the parsing result. Because a database can be used by multiple users at once, each user, after connecting successfully, can independently start a transaction, perform insert, delete and update operations and commit them, and these operations are written to the redo log as they execute. Consequently, the transactions parsed from the redo file (the redo log) over any period of time overlap: several transactions are in progress at the same time, each corresponding to one user connection, starting with Begin and ending with Commit when the user commits.
Here, redo log records are generally ordered by the execution time of the SQL statements. Parsing them produces the corresponding operation data, which is the operation information of the source database, with each operation belonging to a transaction ID. To preserve the ordering and timeliness of incremental data synchronization, the operation information parsed from the redo log can be sorted by the commit time of its transaction ID.
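The commit-ordering step can be sketched as follows. The record layout `(transaction_id, kind, payload)` and the kind labels `BEGIN`, `DML`, `COMMIT` and `ROLLBACK` are assumptions for illustration; the text mentions only Begin and Commit, and rollback handling is an added assumption. Operations are buffered per transaction and released only when that transaction's Commit is read, so the output is grouped by transaction in commit order and transactions without a Commit are held back.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def order_by_commit(records: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Reorder interleaved redo records into committed operations in commit order.

    `records` arrive in redo-log order as (transaction_id, kind, payload).
    Returns (transaction_id, payload) pairs for committed transactions only.
    """
    pending: DefaultDict[str, List[str]] = defaultdict(list)
    ordered: List[Tuple[str, str]] = []
    for txid, kind, payload in records:
        if kind == "BEGIN":
            pending[txid] = []
        elif kind == "DML":
            pending[txid].append(payload)
        elif kind == "COMMIT":
            # Commits are read in log order, so appending here yields commit order.
            ordered.extend((txid, op) for op in pending.pop(txid, []))
        elif kind == "ROLLBACK":
            pending.pop(txid, None)  # discard the rolled-back transaction
    return ordered
```

In this sketch, a transaction that begins first but commits later (T1 in the test) is emitted after one that commits earlier, which is the ordering property the paragraph above asks for.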
Here, the stock data of the source database is operated on according to the parsed operation information to generate the incremental synchronization data of the source database. The target database side has a loading process, which loads the incremental synchronization data into the target database and thereby updates it.
In some embodiments, the log types of the redo log include uncommitted logs and committed logs, and determining the starting parse position information of the redo log based on the first position information includes:
determining whether the redo log corresponding to the first position information is an uncommitted log; if so, obtaining second position information and determining it as the starting parse position information of the redo log, wherein the second position information is the position information of the committed log adjacent to the uncommitted log;
if not, determining the first position information as the starting parse position information of the redo log.
Here, during data synchronization, databases that do not rely on the redo log for synchronization and provide sequential log data parsing, such as MySQL and PostgreSQL, contain only committed transactions in the logs used for incremental data capture, recorded strictly in commit order. A redo log, by contrast, may contain both committed and uncommitted redo log records. "Committed" here refers to the commit status of the transaction, typically meaning a log record whose transaction has executed Commit.
Here, to prevent uncommitted data from appearing during redo-log-based incremental synchronization and disrupting it, and to guarantee the eventual consistency and ordering of data synchronization, once the position of the redo log record corresponding to the synchronization end time (the first position information) has been found, it is determined whether that record is uncommitted. If so, the redo log at the first position is an uncommitted log; the position information of the committed log adjacent to it, the second position information, is obtained and used as the starting parse position of the redo log, so that parsing starts from the second position. If not, the first position information is used as the starting parse position of the redo log.
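A sketch of the starting-position rule, assuming the redo log is available as a list of `(position, committed)` pairs sorted by position. The text says only that the committed log is "adjacent" to the uncommitted one; this sketch assumes the nearest committed record after the first position, which is a guess about direction. The function name is hypothetical.

```python
from typing import List, Tuple


def start_parse_position(log: List[Tuple[int, bool]], first_position: int) -> int:
    """Return the starting parse position for a redo log given as (position, committed) pairs.

    If the record at first_position is committed, parsing starts there. Otherwise the
    position of the nearest committed record after it is used (assumed direction).
    """
    positions = [pos for pos, _ in log]
    idx = positions.index(first_position)  # ValueError if the position is unknown
    if log[idx][1]:
        return first_position
    for pos, committed in log[idx + 1:]:
        if committed:
            return pos
    raise LookupError("no committed record after the given position yet")
```

If the intended neighbour is the preceding committed record instead, the loop would scan `log[:idx]` in reverse; the rest of the rule is unchanged.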
In some embodiments, the stock data synchronization task includes a plurality of sub-stock data synchronization tasks, and determining the first position information based on the synchronization end time and the preset mapping relationship includes:
determining a third position information set based on the synchronization end time of each sub-stock data synchronization task and the preset mapping relationship, wherein each piece of third position information in the set is the position information of the redo log corresponding to the synchronization end time of one sub-stock data synchronization task;
and determining the first position information based on the third position information set and a preset position information selection rule, wherein the first position information is the position information in the third position information set that satisfies the preset position information selection rule.
Here, in practical application scenarios, the stock data of a database table may reach tens or even hundreds of millions of rows. If stock data is synchronized by a single task, database snapshot technology can be used. A database snapshot is a read-only, static view of the database whose content is fixed at the point in time the snapshot is taken. When the stored data volume is large, reading takes a long time, maintaining the snapshot consumes more resources, and database performance suffers significantly. Reads may even fail when the database lacks the resources to maintain the snapshot; for example, Oracle may raise the exception "ORA-01555: snapshot too old". In addition, databases are inherently designed for concurrent reads, so dividing the stock data into intervals in some manner and assigning them to different subtasks for reading is more efficient than reading with a single task.
Here, to improve the efficiency of stock data synchronization, the stock data synchronization task includes a plurality of sub-stock data synchronization tasks, each of which ends at a different time. Based on the synchronization end time of each sub-task and the preset mapping relationship, the position of the redo log record corresponding to each sub-task's end time (a third position) can be determined; together these positions form the third position information set.
Here, the preset position information selection rule is to select the minimum position information in the third position information set. For example, when the position information is the log position of the redo log, the redo log position corresponding to the end time of the stock data synchronization task is the minimum among the redo log positions corresponding to the end times of the individual sub-stock data synchronization tasks. A log position is a sequence number assigned when a redo log record is generated; sequence numbers increase monotonically, so a smaller position denotes an older record and a larger position denotes a more recently written one. Before a read-write node executes a write operation, it first generates the corresponding log record, which records the execution steps of the write and the resulting data changes. When the log record is generated, the log sequence number (LSN) of the current operation is also determined; the larger the LSN, the later the recorded operation is executed. In this embodiment, when a read-write node is about to write to shared storage, the first log write position is the LSN corresponding to the most recent write operation recorded in the log.
Therefore, the first position information is determined from the minimum position: the log data before this position is already reflected in the synchronized stock data, so it does not need to be processed again during incremental synchronization. This improves data synchronization efficiency and avoids re-reading data.
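The minimum-position selection rule can be sketched in Python as below. The function name is hypothetical, and it assumes, as in the mapping sketch for the single-task case, that each sub-task's end time maps to the last redo record generated at or before it; the minimum rule itself is the one stated above.

```python
import bisect
from typing import Dict, List, Tuple


def first_position(sub_task_end_times: Dict[str, float],
                   time_to_position: List[Tuple[float, int]]) -> int:
    """Select the first position information for a stock task split into sub-tasks.

    Each sub-task's synchronization end time is mapped to the position of the last redo
    record generated at or before it (the third position information set); the selection
    rule then picks the minimum position.
    """
    entries = sorted(time_to_position)
    times = [t for t, _ in entries]
    third_positions: List[int] = []
    for task_id, end_time in sub_task_end_times.items():
        i = bisect.bisect_right(times, end_time) - 1
        if i < 0:
            raise LookupError(f"no redo record at or before the end time of {task_id!r}")
        third_positions.append(entries[i][1])
    if not third_positions:
        raise ValueError("at least one sub-task end time is required")
    return min(third_positions)
```

Starting from the minimum means no sub-task can miss a change logged after its own end time, since every sub-task's position is at or after the chosen starting point.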
In some embodiments, the operation information includes operation position information of the stock data and operation type information corresponding to the operation position information, and operating on the stock data of the source database based on the operation information to generate the incremental synchronization data of the source database includes:
obtaining the operation type information corresponding to the operation position information, wherein the operation type information is one of the following: insert operation information, update operation information or delete operation information;
and performing, based on the operation type information, an insert, update or delete operation on the stock data corresponding to the operation position information, to generate the incremental synchronization data of the source database.
Here, when the stock data of the source database is executed to perform an operation, the redox log records the operation position information corresponding to the operation and the operation type information corresponding to the operation position information. The operation type information is at least one of the following: insert operation information, update operation information, and delete operation information.
Here, the operation type information is expressed as DML or DDL. DML (Data Manipulation Language) covers operations on data objects, namely INSERT, UPDATE, and DELETE. DDL (Data Definition Language) mainly defines or changes the structure of a table, its data types, and the links and constraints between tables. In the embodiments of the present application, DDL includes operations such as creating a table, renaming, adding a column, modifying a column, and deleting a column, and DML includes adding, deleting, and modifying data.
Here, to mask the type differences among source databases and thus support synchronization between heterogeneous databases, the parsed DDL and DML may be converted into a unified data encoding format.
In specific embodiments, encoding schemes including, but not limited to, Avro may be employed. Avro is a data serialization system from the Hadoop ecosystem, designed to support applications with large-volume data exchange. Avro relies on a schema to define the data structure: the Avro format consists of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). By combining primitive and complex types, the Avro format can encode any structured data.
Assume that the source table holding the stock data of the source database in this embodiment is a MySQL user table (user_info) with four columns: name (varchar(50)), age (int), interest (array), and creation time (datetime). The Avro schema for the corresponding DML data has the following fields: entity is the source table name, used to identify which table the data belongs to; mType is the message type, with the value DDL or DML, mainly to simplify subsequent processing by the loading unit; op is the DML type, with the values I, U, and D for insert, update, and delete respectively; and the actual data is placed under the data field, a record type containing the definitions of name, age, interest, and createTime, where the time value is encoded as a string.
For DDL, the structure is defined as follows: op is the DDL type, and for adding or deleting table columns its value may be set to "alter-table-column". The data field is an array type in which each item is a composite structure containing column (the column name), op (I/U/D, corresponding to adding, modifying, and deleting the column, respectively), and def (the specific column definition, which may be null for type D).
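The DML and DDL envelopes described above can be sketched as plain Python dictionaries before any Avro serialization. This is a minimal sketch that only produces the plaintext form (no Avro library is used); the function names dml_envelope and ddl_envelope are hypothetical.

```python
import json

# Values taken from the text: mType is "DDL" or "DML"; op values are I/U/D.
ALLOWED_OPS = {"I", "U", "D"}


def dml_envelope(entity, op, data):
    """Wrap one parsed row change in the unified DML envelope (plaintext form)."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported DML op: {op!r}")
    return {"entity": entity, "mType": "DML", "op": op, "data": data}


def ddl_envelope(entity, ddl_type, column_changes):
    """Wrap a column-level structure change; each change is (column, op, definition)."""
    items = []
    for column, op, definition in column_changes:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported column op: {op!r}")
        # def may be None for a column deletion (op "D"), as the text allows.
        items.append({"column": column, "op": op, "def": definition})
    return {"entity": entity, "mType": "DDL", "op": ddl_type, "data": items}


row = {"name": "Alice", "age": 20, "interest": ["pop", "travel"],
       "createTime": "2023-06-25 15:34:32"}
print(json.dumps(dml_envelope("user_info", "I", row)))
print(json.dumps(ddl_envelope("user_info", "alter-table-column",
                              [("gender", "I", "varchar(10)"), ("age", "D", None)])))
```

Keeping the envelope source-agnostic is what lets a single loading step handle rows from different source databases.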
Here, based on the operation type information, an insert, update, or delete operation is performed on the stock data corresponding to the operation position information to generate the incremental synchronization data of the source database. For example, the DDL and DML may be applied to perform these operations, thereby achieving both structural changes and data synchronization.
In some embodiments, the stock data synchronization task includes a plurality of sub-stock data synchronization tasks, and determining that the stock data synchronization task from the source database to the target database is complete includes:
for each sub-stock data synchronization task in the stock data synchronization task, acquiring the sub-stock synchronization data corresponding to that sub-stock data synchronization task;
and, if every piece of sub-stock synchronization data has been synchronized to the target database, determining that the stock data synchronization task from the source database to the target database is complete.
Here, to improve the efficiency of stock data synchronization, synchronization may be performed in a multi-task mode: the stock data synchronization task includes a plurality of sub-stock data synchronization tasks, and synchronization is performed through these subtasks.
Here, each sub-stock data synchronization task has its own sub-stock synchronization data, and the goal of each subtask is to finish synchronizing that data. Once every piece of sub-stock synchronization data has been synchronized to the target database, the stock data synchronization task from the source database to the target database is determined to be complete.
In some embodiments, the stock data of the source database includes identification information of the stock data, and the method further includes:
cutting the stock data based on the identification information to generate at least one stock data block;
determining the at least one stock data block corresponding to each sub-stock data synchronization task based on the at least one stock data block, the number of sub-stock data synchronization tasks, and a preset allocation rule;
where acquiring the sub-stock synchronization data corresponding to a sub-stock data synchronization task includes: generating that sub-stock synchronization data from the at least one stock data block corresponding to the sub-stock data synchronization task.
Here, every piece of stock data in the source database has unique identification information, which must be of a numeric or character type. For example, a source table of the source database holds the stock data, and its identification information may be a primary key, a composite primary key, or a column with a unique index. The data synchronization device acquires the stock data by querying.
Here, the stock data is cut into at least one stock data block based on its identification information and a preset cutting rule. For example, suppose the identification information of the stock data is a Key whose minimum value is 1 and maximum value is 100; the user can cut the stock data with keys 1 to 100 as needed to generate at least one stock data block. When cutting, the cutting rule should distribute the data relatively evenly across the subtasks according to the range of Key values and the configured number of sub-stock data synchronization tasks, to avoid data skew; alternatively, it may be set as needed. In general, the number of sub-stock data synchronization tasks is less than or equal to the number of stock data blocks. If there are more sub-stock synchronization tasks than stock data blocks, some subtasks receive no block and remain idle.
Here, once the stock data blocks and the number of sub-stock data synchronization tasks have been determined, the stock data blocks allocated to each sub-stock data synchronization task are determined based on the preset allocation rule. Acquiring the sub-stock synchronization data corresponding to a sub-stock data synchronization task then consists of generating it from the at least one stock data block assigned to that task.
Here, the preset allocation rule may be: 1) number the sub-stock synchronization tasks sequentially starting from 1; 2) number the stock data blocks sequentially starting from 1; 3) assign the stock data blocks to the subtasks in number order, and after one round, start the next round, repeating until all blocks are assigned. By slicing the stock data in this way, multiple threads synchronize the stock data concurrently, which improves database synchronization efficiency.
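The even cutting rule described above can be sketched for integer keys as follows. This is a minimal sketch assuming a dense integer Key between a known minimum and maximum; the function name split_key_range is hypothetical, and tables with gaps in their keys or character-type keys would need a different splitting strategy.

```python
def split_key_range(min_key, max_key, num_chunks):
    """Cut the inclusive integer key range [min_key, max_key] into contiguous
    (low, high) blocks whose sizes differ by at most one key value."""
    total = max_key - min_key + 1
    if total <= 0 or num_chunks <= 0:
        raise ValueError("need a non-empty key range and at least one chunk")
    num_chunks = min(num_chunks, total)  # never create empty blocks
    base, extra = divmod(total, num_chunks)
    chunks, start = [], min_key
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)  # the first `extra` blocks get one more key
        chunks.append((start, start + size - 1))
        start += size
    return chunks


print(split_key_range(1, 100, 10))
# [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
```

Equal key ranges only yield equal row counts when keys are evenly populated, so this is a starting point for avoiding data skew rather than a guarantee.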
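The three-step allocation rule above is a round-robin assignment. A minimal sketch follows; the function name assign_round_robin and the block labels are hypothetical.

```python
def assign_round_robin(num_tasks, blocks):
    """Deal blocks to tasks numbered 1..num_tasks in order, wrapping around
    until every block is assigned; tasks left without a block stay idle."""
    assignment = {task: [] for task in range(1, num_tasks + 1)}
    for index, block in enumerate(blocks):  # index 0 is block number 1
        assignment[index % num_tasks + 1].append(block)
    return assignment


print(assign_round_robin(3, ["b1", "b2", "b3", "b4", "b5"]))
# {1: ['b1', 'b4'], 2: ['b2', 'b5'], 3: ['b3']}
```

With more tasks than blocks, the surplus tasks simply keep an empty list, which corresponds to the idle state described in the text.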
The following describes embodiments of the present application in detail with reference to an application example.
Support for sliced full-data reads and for a seamless handover between full and incremental synchronization is one of the product features of DP (Data Parallelism). In DP's technical design, no lock at table level or above is taken on the database to achieve sliced reads and a seamless full/incremental handover. The related art provides incremental data capture schemes for MySQL and PostgreSQL, databases whose logs can be parsed sequentially (only committed transaction data appears in the log stream, in order). These schemes share a precondition: the database log used for incremental data capture contains only committed transactions, and records appear strictly in transaction commit order. This precondition does not hold for databases such as Oracle and Dameng (DM), where incremental data capture relies on the redo log (and archive logs). The redo log may contain uncommitted transaction data, so the basic premise of those schemes is not met, and a new method is needed to hand over from stock to incremental synchronization in a multi-partition scenario while guaranteeing the eventual consistency of the data.
In addition, when the redo log is used for incremental capture, some of the stock data is also modified, and therefore appears in the incremental log, while stock data synchronization is still in progress. If the incremental synchronization data is simply applied after stock synchronization completes, the same data may be updated repeatedly, stock and incremental synchronization cannot be joined seamlessly, and synchronization efficiency drops.
Based on the above, this application example provides a sharded data synchronization scheme for databases whose incremental data is captured from the redo log. Full data is synchronized through sliced queries and incremental data through redo log parsing, and, provided the data has a unique identifier, the handover from full to incremental synchronization produces no duplicate data.
The data synchronization scheme of this application example can be applied to a data synchronization system, whose deployment architecture depends on how the redo log of the source database is acquired: remotely or locally.
As shown in fig. 2, fig. 2 is a schematic deployment architecture diagram of a data synchronization system in which the redo log of the source database is acquired remotely. The data synchronization system includes a source database host 201, a target database host 202, and a data synchronization device host 203. The source database host 201 runs the database process 204, the target database host 202 runs the database process 205, and the data synchronization device host 203 runs the data synchronization system process 206. The data synchronization device host 203 may further include an acquisition unit; for databases such as MySQL, the log acquisition unit may run on a remote server.
As shown in fig. 3, if the log must be accessed locally on the database host (e.g., Oracle in stand-alone mode), the data synchronization system includes a source database host 201, a target database host 202, and a data synchronization device host 203. The source database host 201 runs the database process 204, and redo log acquisition runs on it as a separate acquisition agent process 207. The target database host 202 runs the database process 205 together with a separate loading agent process 208. The data synchronization device host 203 runs the data synchronization process 206.
It should be noted that the deployment architectures of fig. 2 and fig. 3 represent two extreme cases. The acquisition agent process 207 and the loading agent process 208 in fig. 3 may be combined freely as needed, for example using only the acquisition agent process 207 or only the loading agent process 208.
In this application example, the data synchronization system of fig. 2 or fig. 3 is used, and the data synchronization device host further includes an acquisition unit, which obtains stock data by querying the source database and obtains log data through log collection.
In this application example, the source table data of the source database is the stock data (also referred to as full data). Referring to fig. 4, which is a schematic flowchart of the data synchronization scheme, the scheme is implemented in the following steps:
Step 401: the data synchronization system divides the source table into intervals according to its unique identifier to construct a plurality of synchronization interval blocks.
Based on the identification information, the stock data is cut, and at least one stock data block is generated.
Here, the source table holds the stock data of the source database. The data synchronization system cuts the stock data based on the identification information to generate at least one stock data block; that is, it divides the source table into intervals by unique identifier to construct a plurality of synchronization interval blocks (i.e., stock data blocks). In practice, a database table may hold tens or even hundreds of millions of rows of stock data. When reading the full data, the database must maintain a snapshot of the data until the query results have been completely returned to the client. For a large table the read takes a long time and maintaining the snapshot consumes more resources, which noticeably affects database performance; when resources are insufficient to maintain the snapshot, the read may even fail, e.g., Oracle raises "ORA-01555: snapshot too old". In addition, databases are designed for concurrent reads: when a large table is divided into intervals in some manner and handed to different subtasks (i.e., sub-stock data synchronization tasks) for reading, reading is more efficient than with a single task.
Here, to divide the data table into synchronization interval blocks (i.e., stock data blocks), the table must have a unique identification column (i.e., the identification information of the data). The unique identification may be a primary key, a composite primary key, or a column or column set with a unique index. The unique identifier must be of a numeric or character type so that intervals can be formed by comparing values. When dividing intervals, data should preferably be distributed relatively evenly across the subtasks (i.e., sub-stock data synchronization tasks) according to the value range and the configured number of subtasks, to avoid data skew.
Fig. 5 gives a specific example of dividing the synchronization interval blocks (i.e., stock data blocks). For clarity, assume that the source table (the stock synchronization data) has only two columns: the ID column, which is the unique identification column (denoted Key), and the C1 column of char type. When the synchronization task (i.e., stock data synchronization task) starts, the minimum Key value in the whole table is 1 and the maximum is 100.
For example, the user specifies through the interface that the table be cut into 10 synchronization interval blocks (i.e., stock data blocks) and starts 10 sub-stock synchronization tasks to read the stock data of the source table. With this setting, as shown in fig. 5, each synchronization interval block covers a range of 10 key values of stock data.
Step 402: the data synchronization system starts a number of stock synchronization tasks and distributes the synchronization interval blocks among them.
That is, the at least one stock data block corresponding to each sub-stock data synchronization task is determined based on the at least one stock data block, the number of sub-stock data synchronization tasks, and a preset allocation rule.
Acquiring the sub-stock synchronization data corresponding to a sub-stock data synchronization task includes generating that data from the at least one stock data block assigned to the task.
Here, the sub-stock synchronization tasks (i.e., sub-stock data synchronization tasks) are used only to synchronize stock data, and their number may be smaller than the number of synchronization interval blocks (i.e., stock data blocks); each task is assigned one or more blocks. If there are more sub-stock synchronization tasks than synchronization interval blocks, some tasks receive no block and remain idle.
Here, the data synchronization system assigns the synchronization interval blocks to the sub-stock synchronization tasks according to a preset allocation rule, which determines the at least one stock data block for each task: 1) number the sub-stock synchronization tasks sequentially from 1; 2) number the synchronization interval blocks sequentially from 1; 3) assign the blocks to the tasks in number order, and after one round, start the next round, repeating until all synchronization interval blocks are assigned.
Step 403: each sub-stock synchronization task records the low log position, reads and caches the data of its interval blocks, and on completion records the current high log position.
Here, each sub-stock synchronization task obtains its sub-stock data by querying: it constructs a query statement, reads the sub-stock data, and places it in an interval block cache. When reading starts, the task records the read start time (i.e., its synchronization start time) and the corresponding redo log position of the source database as the low position (low-offset); when reading finishes (i.e., at the task's end time), it records the current database log position as the high position (high-offset). Each time a sub-stock synchronization task finishes synchronizing one interval block (i.e., sub-stock data block), it generates a record of the form <chunk-i, chunk-i-range, high-offset-i>, where chunk-i is the i-th synchronization interval block, chunk-i-range is the key value range of chunk-i (i.e., the identification information of the sub-stock data), and high-offset-i is the high position of chunk-i. For example, if sub-stock synchronization task 1 is assigned synchronization interval blocks 1 and 2, then after block 2 has been synchronized, the record <chunk-2, chunk-2-range, high-offset-2> is generated.
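The bracketing of one interval block read between a low and a high log position can be sketched as follows. This is a minimal sketch: query_rows and current_position are hypothetical callables standing in for the range query and for reading the source's current redo log position, and the ChunkRecord type additionally keeps low_offset, which the record described in the text does not include.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class ChunkRecord:
    chunk_id: int
    key_range: Tuple[int, int]  # inclusive (low key, high key), i.e. chunk-i-range
    low_offset: int             # redo log position when the read started
    high_offset: int            # redo log position when the read finished (high-offset-i)


def read_chunk(chunk_id: int,
               key_range: Tuple[int, int],
               query_rows: Callable[[int, int], Iterable[Tuple[int, str]]],
               current_position: Callable[[], int]) -> Tuple[Dict[int, str], ChunkRecord]:
    """Read one interval block into a cache, bracketing the read with log positions."""
    low_offset = current_position()
    cache = {key: value for key, value in query_rows(*key_range)}
    high_offset = current_position()
    return cache, ChunkRecord(chunk_id, key_range, low_offset, high_offset)


# Usage with stand-ins: a counter plays the role of the advancing log position.
positions = itertools.count(100)
rows = lambda low, high: [(k, chr(96 + k)) for k in range(low, high + 1)]
cache, record = read_chunk(1, (1, 3), rows, lambda: next(positions))
print(cache, record)
```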
Step 404: each sub-stock synchronization task acquires the log data between its low and high positions, overwrites the corresponding data in the cache according to the unique data identifier, and then sends the data downstream for processing.
For example, because the redo log contains uncommitted transaction data, the high and low positions of a sub-stock synchronization task must use the committed-transaction position closest to the query time. If the redo log record at the start or end time of a sub-stock synchronization task is an uncommitted record, the corresponding redo log position is taken to be the position of the committed record adjacent to that uncommitted record.
In this application example, to avoid the excessive IO pressure that would result if each of the sub-stock synchronization tasks parsed the log itself, the data synchronization system starts only one independent log parsing task per source database, which sends the committed transaction data in the log to a log cache in transaction commit order. Each stock synchronization subtask, and the incremental synchronization task that follows, reads the data it needs from this log cache.
In this application example, the synchronization tasks and redo log parsing run asynchronously, so during stock synchronization part of the stock data also receives incremental updates, i.e., the stock data is overwritten. Fig. 6 shows a specific example of this processing. Assume a sub-stock synchronization task is assigned the synchronization interval block chunk-1[1-10], where 1-10 is the identification information (key value) range; the log position at the synchronization start time is chunk1-low-offset, and the log position at the synchronization end time is chunk1-high-offset. The parsed log between chunk1-low-offset and chunk1-high-offset yields operation information of the source database, consisting of operation type information and operation position information. The operation type information is one of insert, update, or delete operation information, and the operation position information is the key value in the source database. The entries + (101, a) and + (102, w) have key values 101 and 102, which fall outside the key range [1-10], so they are not processed. The read results of chunk-1 are (1, a), (2, b), (3, c), (4, d), (7, e), (8, f), (7, g).
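Choosing the committed position adjacent to a raw log position can be sketched with a binary search over commit positions. This is a minimal sketch that assumes "closest committed point" means the latest commit position at or before the given position; the function name snap_to_committed and the sample positions are hypothetical.

```python
import bisect


def snap_to_committed(position, committed_positions):
    """Return the latest commit position that is <= position.

    committed_positions: commit-record positions of the redo log, sorted ascending.
    """
    index = bisect.bisect_right(committed_positions, position)
    if index == 0:
        raise ValueError("no committed transaction at or before this position")
    return committed_positions[index - 1]


commits = [10, 40, 55, 80]
print(snap_to_committed(57, commits))  # 55 (latest commit at or before 57)
print(snap_to_committed(55, commits))  # 55 (already a commit position)
```

If the intended rule is instead the nearest commit after the given position, bisect_left with committed_positions[index] would be the corresponding variant.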
The sub-stock synchronization task overwrites data in the interval block cache; that is, based on the operation type information, it performs an insert, update, or delete operation on the stock data corresponding to the operation position information to generate incremental synchronization data of the source database:
1. If the log entry is an insert or update and its unique key falls within the interval block, the value for that key in the interval block cache is set to the value in the log entry. For example, + (3, c) inserts a record with key 3 and value c, and + (7, e) inserts a record with key 7 and value e; - (3, c), + (3, p) updates the value of the record with key 3 from c to p.
2. If the log entry is a delete and its unique key falls within the interval block, the corresponding key is deleted from the interval block cache; for example, - (7, e) deletes the record with key 7.
3. In other cases, log data is ignored.
Thus, after these operations, the merged output of chunk-1 is (1, a), (2, b), (3, q), (4, d), (5, c), (8, z), (9, g).
To guarantee the ordering and consistency of data synchronization, and because the REDO log data may contain uncommitted transactions, when the log parsing task of the data synchronization system encounters a transaction start flag, it first caches that transaction's uncommitted data; once the transaction's commit record is parsed, the parsed data is sent to the log cache in commit-time order. Through this step, the data in the log cache is ordered by transaction commit time. Fig. 7 gives a specific example: the arrow indicates time order, and entries closer to the arrowhead are committed later. The redo log contains transactions tx-1, tx-2, and tx-3; for each transaction, uncommitted data is cached first and released after the commit. After the log parsing task parses the redo log, the data is reordered by the commit order of the transactions. In fig. 7, transaction tx-2 commits first, so the reordered data is arranged as tx-2, tx-3, tx-1, i.e., in commit order. The reordered data is sent to the log cache in this order, ensuring that the data in the log cache is ordered by transaction commit time.
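The three overwrite rules above can be sketched as a merge of committed log events into one interval block cache. This is a minimal sketch assuming each event is a (position, op, key, value) tuple carrying a full row value, delivered in commit order, and that the read window includes both the low and the high position; the function name and the sample events are hypothetical and are not the data of fig. 6.

```python
def merge_log_into_chunk(cache, key_range, events, low_offset, high_offset):
    """Overlay committed log events onto one interval block cache.

    cache:  {key: value} read by the sub-stock task for this block
    events: (position, op, key, value) tuples in commit order; op is I, U or D
    """
    low_key, high_key = key_range
    merged = dict(cache)
    for position, op, key, value in events:
        if not (low_offset <= position <= high_offset):
            continue  # outside this block's read window
        if not (low_key <= key <= high_key):
            continue  # rule 3: key belongs to another block, ignore it
        if op in ("I", "U"):
            merged[key] = value  # rule 1: insert or overwrite
        elif op == "D":
            merged.pop(key, None)  # rule 2: delete if present
    return sorted(merged.items())


cache = {1: "a", 2: "b", 3: "c"}
events = [(5, "U", 3, "p"), (6, "I", 5, "c"), (7, "D", 2, None),
          (8, "I", 101, "a"), (20, "U", 1, "z")]
print(merge_log_into_chunk(cache, (1, 10), events, 1, 10))
# [(1, 'a'), (3, 'p'), (5, 'c')]
```

Because every event carries the full value for its key, replaying an event that the snapshot already reflects leaves the same result, which is why the window can safely be inclusive.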
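The buffering and commit-order reordering described above can be sketched as follows. This is a minimal sketch assuming the parser emits (kind, txid, payload) tuples in redo-log order; the rollback branch is an assumption, since the text only discusses commits, and the function name reorder_by_commit is hypothetical.

```python
from collections import defaultdict


def reorder_by_commit(log_records):
    """Buffer each transaction's changes until its commit record is parsed,
    then emit the whole transaction, so output follows commit order.

    log_records: (kind, txid, payload) in redo-log order, kind in
    {"change", "commit", "rollback"}.
    """
    pending = defaultdict(list)
    committed = []
    for kind, txid, payload in log_records:
        if kind == "change":
            pending[txid].append(payload)  # not yet visible downstream
        elif kind == "commit":
            committed.append((txid, pending.pop(txid, [])))
        elif kind == "rollback":
            pending.pop(txid, None)  # assumption: discard rolled-back work
    return committed


log = [("change", "tx-1", "a1"), ("change", "tx-2", "b1"), ("change", "tx-3", "c1"),
       ("commit", "tx-2", None), ("change", "tx-1", "a2"),
       ("commit", "tx-3", None), ("commit", "tx-1", None)]
print(reorder_by_commit(log))
# [('tx-2', ['b1']), ('tx-3', ['c1']), ('tx-1', ['a1', 'a2'])]
```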
Here, in the log-based acquisition mode, the data synchronization device host in the data synchronization system further includes a parsing unit, which parses the log to obtain plaintext results that may include: 1) Data Definition Language (DDL) statements; 2) Data Manipulation Language (DML) statements.
The DDL of concern in this application example includes operations such as creating a table, renaming, adding a column, modifying a column, and deleting a column; the DML of concern includes adding, deleting, and modifying data.
The data synchronization device host in the data synchronization system also includes a conversion unit; in the query-based acquisition mode, the plaintext data obtained by querying is passed to the conversion unit for conversion. The conversion unit converts DDL and DML into a unified data encoding format, which masks the type differences among source databases and thus enables synchronization between heterogeneous databases.
In specific embodiments, encoding schemes including, but not limited to, Avro may be employed. Avro relies on a schema to define data structures, composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed); by combining them, the Avro format can encode any structured data. Assume that the source table in this embodiment is a MySQL user table (user_info) with four columns: name (varchar(50)), age (int), interest (array), and creation time (datetime). The Avro schema for the corresponding DML data is defined as follows, where entity is the source table name, used to identify which table the data belongs to; mType is the message type, with the value DDL or DML, mainly to simplify subsequent processing by the loading unit; op is the DML type, with the values I, U, and D for insert, update, and delete respectively; and the actual data is placed under the data field, a record type containing the definitions of name, age, interest, and createTime, where the time value is encoded as a string.
{
"type":"record",
"name":"userInfoDML",
"fields": [
{
"name": "entity",
"type": "string"
},
{
"name": "mType",
"type": "string"
},
{
"name": "op",
"type": "string"
},
{
"name": "data",
"type": {
"name": "dmlField",
"type": "record",
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "age",
"type": "int"
},
{
"name": "interest",
"type": {
"type":"array",
"items": "string"
}
},
{
"name": "createTime",
"type": "string"
}
]
}
}
]
}
Data encoded with this schema is shown below. Note that Avro encodes the data in binary according to the schema; the following is only its plaintext representation.
{
"entity":"user_info",
"mType":"DML",
"op": "I",
"data": {
"name": "Alice",
"age": 20,
"interest": ["pop", "travel", "sports"],
"createTime": "2023-06-25 15:34:32"
}
}
For DDL, the structure is defined as follows: op is the DDL type, and for adding or deleting table columns its value may be set to "alter-table-column". The data field is an array, each item of which is a composite structure containing column (the column name), op (I/U/D, corresponding to adding, modifying, and deleting the column, respectively), and def (the specific column definition, which may be null for type D).
{
"type":"record",
"name":"userInfoDDL",
"fields": [
{
"name": "entity",
"type": "string"
},
{
"name": "mType",
"type": "string"
},
{
"name": "op",
"type": "string"
},
{
"name": "data",
"type": {
"type": "array",
"items": {
"name": "ddlItem",
"type": "record",
"fields": [
{
"name": "column",
"type": "string"
},
{
"name": "op",
"type": "string"
},
{
"name": "def",
"type": ["null", "string"]
}
]
}
}
}
]
}
Data encoded with this schema is shown below; it represents the addition of the address and gender columns at the source.
{
"entity":"user_info",
"mType":"DDL",
"op": "alter-table-by-tmp",
"data": [
{
"column":"address",
"op":"I",
"def":"varchar(255)"
},
{
"column":"gender",
"op":"I",
"def":"varchar(10)"
}
]
}
The host of the data synchronization device further comprises a conversion unit, which is used for applying DDL and DML to the target database to realize structural change and data synchronization. DDL and DML are applied to the target database, so that structural change and data synchronization are realized, and depending on a specific writing mode, writing to the target database can be realized in a remote or local mode. If it can be implemented remotely (e.g., JDBC writing), the acquisition unit can run on a remote server. A deployment run architecture corresponding to fig. 2; if the writing to the target database must be done locally on the database (typically occurs when high performance writing in the form of files is required and the target database does not support remote streaming), then the loading unit needs to run on the database hosts in a separate loading agent, corresponding to the deployment operating architecture of FIG. 3.
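Applying the unified envelopes to a target database can be sketched with Python's built-in sqlite3 module standing in for the real target. This is a minimal sketch, not the loading unit itself: SQLite replaces a JDBC or file-based target, the function name apply_envelope is hypothetical, the unique key column ("name" in the usage) is an assumption because the user_info example defines no key, list values are stored as JSON text, and only the add-column DDL case is handled.

```python
import json
import sqlite3


def apply_envelope(conn, envelope, key_column):
    """Apply one unified DML/DDL envelope to a SQL target (sketch).

    key_column must be a unique column of the target table. Table and column
    names come from trusted metadata, so they are interpolated directly.
    """
    table = envelope["entity"]
    if envelope["mType"] == "DDL":
        for item in envelope["data"]:
            if item["op"] != "I":
                raise NotImplementedError("only ADD COLUMN is sketched here")
            conn.execute(f'ALTER TABLE {table} ADD COLUMN {item["column"]} {item["def"]}')
        return
    # DML: store list values (e.g. interest) as JSON text in this sketch.
    row = {c: json.dumps(v) if isinstance(v, list) else v
           for c, v in envelope["data"].items()}
    op = envelope["op"]
    if op == "I":
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values()))
    elif op == "U":
        others = [c for c in row if c != key_column]
        sets = ", ".join(f"{c} = ?" for c in others)
        params = [row[c] for c in others] + [row[key_column]]
        conn.execute(f"UPDATE {table} SET {sets} WHERE {key_column} = ?", params)
    elif op == "D":
        conn.execute(f"DELETE FROM {table} WHERE {key_column} = ?", [row[key_column]])
    else:
        raise ValueError(f"unsupported DML op: {op!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_info (name TEXT PRIMARY KEY, age INTEGER, "
             "interest TEXT, createTime TEXT)")
alice = {"name": "Alice", "age": 20, "interest": ["pop", "travel", "sports"],
         "createTime": "2023-06-25 15:34:32"}
apply_envelope(conn, {"entity": "user_info", "mType": "DML", "op": "I", "data": alice}, "name")
apply_envelope(conn, {"entity": "user_info", "mType": "DDL", "op": "alter-table-by-tmp",
                      "data": [{"column": "gender", "op": "I", "def": "varchar(10)"}]}, "name")
print(conn.execute("SELECT name, age, gender FROM user_info").fetchall())
```

Values always travel as bound parameters; only identifiers from the envelope metadata are formatted into the SQL text.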
Step 405: the data synchronization system waits for all sub-stock synchronization tasks to complete. It then starts the incremental synchronization task, which begins parsing data from the minimum high watermark across all sub-stock tasks.
Once the data of every sub-stock synchronization task has been synchronized to the target database, the stock data synchronization task from the source database to the target database is deemed complete. The incremental synchronization task is then started, and the synchronization end time of each sub-stock synchronization task is obtained. In this application example, the incremental synchronization task runs in a single thread, whereas the stock synchronization described above runs in multiple threads.
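The coordination in this step can be sketched with Python's `concurrent.futures`: stock synchronization runs on multiple threads, and the single-threaded incremental task starts only after every sub-stock task finishes. `SubStockReport` and `run_sub_stock_task` are hypothetical placeholders. A real sub-stock task would copy its chunk to the target database before recording its end time.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SubStockReport:
    chunk_id: int     # i in chunk-i
    end_time: float   # synchronization end time of this sub-stock task


def run_sub_stock_task(chunk_id: int) -> SubStockReport:
    # Placeholder for copying one chunk of stock data to the target database.
    time.sleep(0.01 * chunk_id)
    return SubStockReport(chunk_id=chunk_id, end_time=time.time())


def run_stock_sync(chunk_ids, workers: int = 4) -> dict[int, float]:
    """Run all sub-stock tasks in parallel; return chunk_id -> end time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list(...) blocks until every task has returned, and re-raises
        # the first task failure, so the stock task is complete afterwards.
        reports = list(pool.map(run_sub_stock_task, chunk_ids))
    return {r.chunk_id: r.end_time for r in reports}


end_times = run_stock_sync([0, 1, 2])
# Only now, with every sub-stock task finished, would the caller start the
# single-threaded incremental task using these end times.
print(sorted(end_times))  # [0, 1, 2]
```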
A third position information set is determined from the synchronization end time of each sub-stock synchronization task and a preset mapping relation. Each piece of third position information in the set is the redo log position that corresponds to the synchronization end time of one sub-stock synchronization task. As described above, the high watermark for sub-stock synchronization task i can be denoted high-offset-i (the third position information). Here i indexes the synchronization interval block (chunk) handled by that task.
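A minimal sketch of the preset mapping relation follows. It assumes the mapping is kept as sorted (timestamp, redo position) samples and is queried by binary search for the last sample at or before a task's end time. `TimeToPositionMap`, the sample values, and the integer positions are hypothetical, because the patent does not say how the mapping is stored. Rounding down to an earlier sample means changes may be re-read rather than missed.

```python
import bisect


class TimeToPositionMap:
    """Hypothetical preset mapping: time -> redo log position."""

    def __init__(self, samples):
        # samples: iterable of (timestamp, position) pairs.
        ordered = sorted(samples)
        self._times = [t for t, _ in ordered]
        self._positions = [p for _, p in ordered]

    def position_at(self, t):
        """Return the position of the last sample taken at or before t."""
        idx = bisect.bisect_right(self._times, t) - 1
        if idx < 0:
            raise KeyError(f"no redo position recorded at or before {t}")
        return self._positions[idx]


mapping = TimeToPositionMap([(100.0, 5000), (110.0, 5400), (120.0, 6100)])
end_times = {0: 112.5, 1: 119.9, 2: 121.0}
# Third position information set: high-offset-i for each chunk i.
high_offsets = {i: mapping.position_at(t) for i, t in end_times.items()}
print(high_offsets)  # {0: 5400, 1: 5400, 2: 6100}
```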
The first position information is then chosen from the third position information set by a preset position selection rule: it is the entry in the set that satisfies the rule. Here the rule selects the smallest position in the high-offset-i set, which is the minimum high watermark across all sub-stock tasks. If the REDO log contains uncommitted transaction data, the high watermark of a sub-stock synchronization task must instead use the committed-transaction position closest to the query time. Specifically, the first position corresponds to the sub-stock task's synchronization end time. If the redo log entry at that position is an uncommitted log, the position of the committed log adjacent to it becomes the starting parse position of the redo log. If not, the first position information itself is used as the starting parse position of the redo log.
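The uncommitted-log fallback could look like the sketch below. It assumes each redo entry carries a position and a committed flag. It also assumes the "adjacent" committed log is the nearest committed entry before the uncommitted one; the text does not say which side is meant. `RedoEntry` and `starting_parse_position` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoEntry:
    position: int
    committed: bool  # False while the entry's transaction is uncommitted


def starting_parse_position(entries, first_position):
    """Return first_position, or the nearest committed entry before it.

    Falling back to the *preceding* committed entry is an assumption
    made for this sketch.
    """
    by_pos = {e.position: e for e in entries}
    entry = by_pos.get(first_position)
    if entry is None or entry.committed:
        # Committed (or not an exact entry): start parsing here.
        return first_position
    earlier = [e.position for e in entries
               if e.position < first_position and e.committed]
    if not earlier:
        raise LookupError("no committed entry precedes the uncommitted one")
    return max(earlier)


log = [RedoEntry(5000, True), RedoEntry(5200, True),
       RedoEntry(5400, False), RedoEntry(5600, True)]
print(starting_parse_position(log, 5400))  # 5200
print(starting_parse_position(log, 5600))  # 5600
```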
In other words, the data synchronization system traverses the <chunk-i, chunk-i-range, high-offset-i> tuples reported by all sub-stock synchronization tasks. It selects the smallest high-offset-i as the synchronization starting point of the incremental task (this is the preset position selection rule), which is the starting parse position of the redo log. The minimum is chosen because log data before that point has already been merged into the stock data of every chunk, so it need not be processed again. Each sub-stock synchronization task runs independently, however. Log data lying between the different high watermarks may therefore not yet be merged into the stock data of every chunk, and it is handled in step 406. Together, these measures keep uncommitted logs from disturbing log parsing and preserve the consistency of data synchronization.
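The selection rule reduces to a minimum over the reported tuples. The sketch below assumes integer log positions and half-open key ranges; `ChunkReport` is a hypothetical container for <chunk-i, chunk-i-range, high-offset-i>.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkReport:
    """One <chunk-i, chunk-i-range, high-offset-i> tuple from a sub-stock task."""
    chunk_id: int
    key_range: tuple[int, int]  # half-open [low, high) key range (assumed)
    high_offset: int


def incremental_start_position(reports):
    """Preset selection rule: the smallest high-offset-i across all chunks."""
    if not reports:
        raise ValueError("no sub-stock reports")
    return min(r.high_offset for r in reports)


reports = [ChunkReport(0, (0, 1000), 5400),
           ChunkReport(1, (1000, 2000), 5200),
           ChunkReport(2, (2000, 3000), 6100)]
print(incremental_start_position(reports))  # 5200
```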
the redo log is parsed from its starting parse position to generate a parsing result, which includes operation information on the stock data of the source database;
the stock data of the source database is operated on according to the operation information to generate incremental synchronization data of the source database;
the target database is updated based on the incremental synchronization data.
The redo log is parsed starting from the position corresponding to the minimum high-offset-i. This produces a parsing result containing the operation information on the stock data of the source database. Incremental data is then generated following the incremental data synchronization flow shown in FIG. 6, and the target database is updated with the incremental synchronization data.
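A minimal sketch of replaying parsed operations follows. It assumes each operation carries a redo position, a primary key as its operation position, an operation type (insert, update, or delete), and a full row image for inserts and updates. The `Op` type and the in-memory dictionary standing in for the target table are hypothetical. Treating inserts and updates as upserts is a design choice that keeps replay idempotent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    position: int                # redo log position of the change
    key: int                     # operation position: primary key of the row
    kind: str                    # "insert", "update" or "delete"
    row: Optional[dict] = None   # full row image for insert/update


def apply_ops(table, ops, start_position):
    """Apply parsed operations at or after start_position, in log order."""
    for op in sorted(ops, key=lambda o: o.position):
        if op.position < start_position:
            continue  # guard: before the starting parse position
        if op.kind in ("insert", "update"):
            # Upsert, so seeing the same change twice is harmless.
            table[op.key] = dict(op.row)
        elif op.kind == "delete":
            table.pop(op.key, None)
        else:
            raise ValueError(f"unknown operation type: {op.kind!r}")
    return table


target = {1: {"name": "a"}, 2: {"name": "b"}}
ops = [Op(5300, 3, "insert", {"name": "c"}),
       Op(5210, 1, "update", {"name": "a2"}),
       Op(5250, 2, "delete")]
print(apply_ops(target, ops, start_position=5200))
# {1: {'name': 'a2'}, 3: {'name': 'c'}}
```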
Step 406: the incremental synchronization task uses the position information recorded by the stock synchronization tasks to filter out duplicate data, and it sends newly added data downstream for processing.
After the incremental synchronization task parses a log record, it first checks whether the record falls within the chunk range of any sub-stock task. The check uses the operation position in the operation information (e.g., the key value). If the record falls in no chunk range, it is sent directly downstream for processing. If it belongs to chunk-i, the task further checks whether the record's log position is lower than high-offset-i: a lower record is ignored, and a higher one is sent downstream. Beyond the maximum high watermark of all sub-stock tasks, parsed log data no longer overlaps with any stock synchronization task. So once the incremental task's log position passes that point, an internal flag can be set and records no longer need to be checked and filtered one by one.
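The filtering rule of step 406 can be sketched as below. The sketch assumes integer keys and half-open chunk key ranges. It also assumes changes arrive in increasing log-position order, as in a single-threaded incremental task. `Chunk` and `DuplicateFilter` are hypothetical names. The text does not say how a position exactly equal to high-offset-i is handled; this sketch forwards it and relies on idempotent replay downstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    low: int          # inclusive lower key bound of chunk-i (assumed)
    high: int         # exclusive upper key bound of chunk-i (assumed)
    high_offset: int  # high-offset-i reported by the sub-stock task


class DuplicateFilter:
    """Decide whether a parsed change should go downstream (step 406)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._max_high = max(c.high_offset for c in self._chunks)
        self._past_all_chunks = False  # the internal flag described above

    def should_forward(self, key, position):
        if self._past_all_chunks:
            return True
        if position > self._max_high:
            # Beyond every high-offset-i: no overlap with stock data remains.
            self._past_all_chunks = True
            return True
        for c in self._chunks:
            if c.low <= key < c.high:
                # Below high-offset-i means the stock copy already has it.
                return position >= c.high_offset
        return True  # key in no chunk range: forward directly


f = DuplicateFilter([Chunk(0, 1000, 5400), Chunk(1000, 2000, 5200)])
print(f.should_forward(10, 5300))    # False: chunk 0, below 5400
print(f.should_forward(1500, 5300))  # True: chunk 1, above 5200
print(f.should_forward(5000, 5300))  # True: outside every chunk
print(f.should_forward(10, 5500))    # True: past the max high watermark
```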
To implement the method of the embodiments of the present application, an embodiment of the present application further provides a data synchronization device corresponding to the data synchronization method. Each step of the method embodiments applies equally to the device embodiments.
As shown in FIG. 8, the data synchronization apparatus 800 includes an acquiring module 810, a generating module 820, and an updating module 830:
- The acquiring module 810 is configured to obtain the synchronization end time of the stock data synchronization task after determining that the stock data synchronization task from the source database to the target database is complete.
- The generating module 820 is configured to generate incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database, and the stock data of the source database.
- The updating module 830 is configured to update the target database based on the incremental synchronization data.
In some embodiments, the data synchronization device further includes a determining module 840 and an analyzing module 850. The determining module 840 is configured to determine first position information based on the synchronization end time and a preset mapping relation. The first position information is the redo log position corresponding to the synchronization end time, and the preset mapping relation records the correspondence between each time and a redo log position. From the first position information, the determining module 840 then determines the starting parse position of the redo log. The analyzing module 850 is configured to parse the redo log from that starting position. This generates a parsing result that includes operation information on the stock data of the source database. The generating module 820 is further configured to operate on the stock data of the source database according to the operation information, generating incremental synchronization data of the source database.
In some embodiments, the log types of the redo log include uncommitted logs and committed logs. The determining module 840 is further configured to determine whether the redo log corresponding to the first position information is an uncommitted log. If so, it obtains second position information, namely the position of the committed log adjacent to the uncommitted log, and uses it as the starting parse position of the redo log. If not, it uses the first position information as the starting parse position of the redo log.
In some embodiments, the determining module 840 is further configured to determine a third position information set from the synchronization end time of each sub-stock data synchronization task and the preset mapping relation. Each piece of third position information in the set is the redo log position corresponding to the synchronization end time of one sub-stock data synchronization task. The determining module 840 then determines the first position information from the third position information set by a preset position selection rule; the first position information is the entry in the set that satisfies that rule.
In some embodiments, the operation information includes operation position information of the stock data and the operation type information corresponding to that operation position information. The acquiring module 810 is further configured to obtain the operation type information corresponding to the operation position information; the operation type information is one of insert, update, and delete operation information. The generating module 820 is further configured to perform an insert, update, or delete operation, according to the operation type information, on the stock data at that operation position. This generates incremental synchronization data of the source database.
In some embodiments, the stock data synchronization task includes a plurality of sub-stock data synchronization tasks. The acquiring module 810 is further configured to obtain the sub-stock synchronization data corresponding to each sub-stock data synchronization task. The determining module 840 is further configured to determine that all sub-stock synchronization data has been synchronized to the target database. Once it does, it determines that the stock data synchronization task from the source database to the target database is complete.
In some embodiments, the generating module 820 is further configured to split the stock data into at least one stock data block based on the identification information. The determining module 840 is further configured to determine the stock data blocks corresponding to each sub-stock data synchronization task, based on the at least one stock data block, the number of sub-stock data synchronization tasks, and a preset allocation rule.
Acquiring the sub-stock synchronization data corresponding to a sub-stock data synchronization task then includes generating that sub-stock synchronization data from the stock data blocks corresponding to the task.
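A sketch of cutting the stock data into blocks and assigning them to sub-stock tasks follows. It assumes the identification information is a unique integer primary key and that the preset allocation rule is round-robin. Both are assumptions, as are the function names `split_into_chunks` and `allocate_round_robin`.

```python
def split_into_chunks(keys, chunk_size):
    """Cut unique integer primary keys into half-open [low, high) ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ordered = sorted(keys)
    chunks = []
    for start in range(0, len(ordered), chunk_size):
        block = ordered[start:start + chunk_size]
        nxt = start + chunk_size
        # Bounded by the next block's first key; the last block ends just
        # past its largest key (integer keys assumed).
        high = ordered[nxt] if nxt < len(ordered) else block[-1] + 1
        chunks.append((block[0], high))
    return chunks


def allocate_round_robin(chunks, num_tasks):
    """Hypothetical preset allocation rule: deal chunks out round-robin."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    plan = {t: [] for t in range(num_tasks)}
    for i, chunk in enumerate(chunks):
        plan[i % num_tasks].append(chunk)
    return plan


chunks = split_into_chunks(range(1, 11), chunk_size=4)
print(chunks)  # [(1, 5), (5, 9), (9, 11)]
print(allocate_round_robin(chunks, num_tasks=2))
# {0: [(1, 5), (9, 11)], 1: [(5, 9)]}
```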
In practical applications, the acquiring module 810, the generating module 820, the updating module 830, the determining module 840, and the analyzing module 850 may be implemented by a processor in the data synchronization device. The processor performs their functions by running a computer program stored in memory.
It should be noted that the division into the program modules above is merely illustrative of how the data synchronization device in the above embodiment performs processing. In practical applications, the processing may be allocated to different program modules as needed. That is, the internal structure of the device may be divided into different program modules to complete all or part of the processing described above. In addition, the data synchronization device provided in the above embodiment and the data synchronization method embodiments are based on the same concept. For the specific implementation process, see the method embodiments; it is not repeated here.
Based on a hardware implementation of the program modules above, and to implement the method of the embodiments of the present application, an embodiment of the present application further provides a data synchronization device. FIG. 9 shows only an exemplary structure of the data synchronization device, not its complete structure. Part or all of the structure shown may be implemented as needed.
As shown in FIG. 9, a data synchronization device 900 provided in an embodiment of the present application includes at least one processor 901, a memory 902, a user interface 903, and at least one network interface 904. The components of the data synchronization device 900 are coupled together by a bus system 905, which enables communication among them. In addition to a data bus, the bus system 905 includes a power bus, a control bus, and a status signal bus. For clarity, however, all of these buses are labeled as bus system 905 in FIG. 9.
The user interface 903 may include, among other things, a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, or touch screen, etc.
The memory 902 in the embodiments of the present application stores various types of data to support the operation of the data synchronization device. Examples of such data include any computer program that operates on the data synchronization device.
The data synchronization method disclosed in the embodiments of the present application may be applied to, or implemented by, the processor 901. The processor 901 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the data synchronization method may be completed by integrated logic circuits of hardware in the processor 901 or by instructions in the form of software. The processor 901 may be one of the following:
- a general-purpose processor, which may be a microprocessor or any conventional processor;
- a digital signal processor (DSP, Digital Signal Processor);
- another programmable logic device, or a discrete gate or transistor logic device;
- a discrete hardware component, or the like.

The processor 901 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The steps of the method disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium in the memory 902. The processor 901 reads information from the memory 902 and, in combination with its hardware, performs the steps of the data synchronization method provided by the embodiments of the present application.
In an exemplary embodiment, the data synchronization device may be implemented by any of the following electronic components, alone or in combination, to perform the foregoing method:
- one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit);
- DSPs;
- programmable logic devices (PLD, Programmable Logic Device);
- complex programmable logic devices (CPLD, Complex Programmable Logic Device);
- field-programmable gate arrays (FPGA, Field Programmable Gate Array);
- general-purpose processors or controllers;
- microcontrollers (MCU, Micro Controller Unit);
- microprocessors (Microprocessor);
- other electronic components.
It can be understood that the memory 902 may be a volatile memory, a nonvolatile memory, or both.

The nonvolatile memory may be any of the following:
- read-only memory (ROM, Read Only Memory);
- programmable read-only memory (PROM, Programmable Read-Only Memory);
- erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory);
- electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory);
- ferroelectric random access memory (FRAM, Ferroelectric Random Access Memory);
- flash memory (Flash Memory);
- magnetic surface memory, which may be disk memory or tape memory;
- optical disc, or compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory).

The volatile memory may be random access memory (RAM, Random Access Memory), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as:
- static random access memory (SRAM, Static Random Access Memory);
- synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory);
- dynamic random access memory (DRAM, Dynamic Random Access Memory);
- synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory);
- double data rate synchronous dynamic random access memory (DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory);
- enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory);
- SyncLink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory);
- direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory).

The memory described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
In an exemplary embodiment, the present application further provides a storage medium, namely a computer storage medium, which may specifically be a computer-readable storage medium. An example is the memory 902 storing a computer program, which the processor 901 of the data synchronization device can execute to perform the steps of the method of the embodiments of the present application. The computer-readable storage medium may be a memory such as a ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM.
It should be noted that: "first," "second," etc. are used to distinguish similar objects and not necessarily to describe a particular order or sequence.
In addition, the embodiments of the present application may be combined arbitrarily, provided there is no conflict.
The foregoing describes merely specific embodiments of the present application, and the present application is not limited thereto. Any variation or alternative readily conceivable by a person skilled in the art within the scope disclosed herein falls within the scope of the present application.

Claims (9)

1. A method of data synchronization, the method comprising:
if it is determined that a stock data synchronization task from a source database to a target database is completed, obtaining a synchronization end time of the stock data synchronization task;
generating incremental synchronization data of the source database based on the synchronization end time, a redo log of the source database, and stock data of the source database;
updating the target database based on the incremental synchronization data;
wherein generating the incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database, and the stock data of the source database comprises:
determining first position information based on the synchronization end time and a preset mapping relation, wherein the first position information is position information of the redo log corresponding to the synchronization end time, and the preset mapping relation comprises a correspondence between each time and position information of the redo log;
determining starting parse position information of the redo log based on the first position information;
parsing the redo log based on the starting parse position information of the redo log to generate a parsing result, wherein the parsing result comprises operation information on the stock data of the source database; and
operating on the stock data of the source database based on the operation information to generate the incremental synchronization data of the source database.
2. The method of claim 1, wherein log types of the redo log comprise an uncommitted log and a committed log, and determining the starting parse position information of the redo log based on the first position information comprises:
determining whether the redo log corresponding to the first position information is an uncommitted log; if so, obtaining second position information and determining the second position information as the starting parse position information of the redo log, wherein the second position information is position information of a committed log adjacent to the uncommitted log;
if not, determining the first position information as the starting parse position information of the redo log.
3. The method of claim 1, wherein the stock data synchronization task comprises a plurality of sub-stock data synchronization tasks, and determining the first position information based on the synchronization end time and the preset mapping relation comprises:
determining a third position information set based on a synchronization end time of each sub-stock data synchronization task and the preset mapping relation, wherein each piece of third position information in the third position information set is position information of the redo log corresponding to the synchronization end time of a respective sub-stock data synchronization task;
and determining the first position information based on the third position information set and a preset position information selection rule, wherein the first position information is the position information in the third position information set that satisfies the preset position information selection rule.
4. The method of claim 1, wherein the operation information comprises operation position information of the stock data and operation type information corresponding to the operation position information, and operating on the stock data of the source database based on the operation information to generate the incremental synchronization data of the source database comprises:
obtaining the operation type information corresponding to the operation position information, wherein the operation type information is one of: insert operation information, update operation information, and delete operation information;
and performing, based on the operation type information, an insert operation, an update operation, or a delete operation on the stock data corresponding to the operation position information to generate the incremental synchronization data of the source database.
5. The method of claim 1, wherein the stock data synchronization task comprises a plurality of sub-stock data synchronization tasks, and determining that the stock data synchronization task from the source database to the target database is completed comprises:
for each sub-stock data synchronization task of the stock data synchronization task, obtaining sub-stock synchronization data corresponding to the sub-stock data synchronization task;
and if it is determined that all of the sub-stock synchronization data has been synchronized to the target database, determining that the stock data synchronization task from the source database to the target database is completed.
6. The method of claim 5, wherein the stock data of the source database comprises identification information of the stock data, and the method further comprises:
splitting the stock data based on the identification information to generate at least one stock data block;
determining at least one stock data block corresponding to each sub-stock data synchronization task based on the at least one stock data block, the number of sub-stock data synchronization tasks, and a preset allocation rule;
wherein obtaining the sub-stock synchronization data corresponding to the sub-stock data synchronization task comprises: generating the sub-stock synchronization data corresponding to the sub-stock data synchronization task based on the at least one stock data block corresponding to that sub-stock data synchronization task.
7. A data synchronization device, comprising:
an acquisition module, configured to obtain a synchronization end time of a stock data synchronization task from a source database to a target database once it is determined that the stock data synchronization task is completed;
a generation module, configured to generate incremental synchronization data of the source database based on the synchronization end time, a redo log of the source database, and stock data of the source database; wherein generating the incremental synchronization data of the source database based on the synchronization end time, the redo log of the source database, and the stock data of the source database comprises:
determining first position information based on the synchronization end time and a preset mapping relation, wherein the first position information is position information of the redo log corresponding to the synchronization end time, and the preset mapping relation comprises a correspondence between each time and position information of the redo log;
determining starting parse position information of the redo log based on the first position information;
parsing the redo log based on the starting parse position information of the redo log to generate a parsing result, wherein the parsing result comprises operation information on the stock data of the source database;
operating on the stock data of the source database based on the operation information to generate the incremental synchronization data of the source database;
and an updating module, configured to update the target database based on the incremental synchronization data.
8. A data synchronization device, comprising: a processor and a memory for storing a computer program capable of running on the processor, wherein,
the processor being adapted to perform the steps of the method of any of claims 1 to 6 when the computer program is run.
9. A computer storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method according to any of claims 1 to 6.
CN202311147251.8A 2023-09-07 2023-09-07 Data synchronization method, device, equipment and storage medium Active CN116881371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311147251.8A CN116881371B (en) 2023-09-07 2023-09-07 Data synchronization method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116881371A CN116881371A (en) 2023-10-13
CN116881371B true CN116881371B (en) 2023-11-14

Family

ID=88266686

Country Status (1)

Country Link
CN (1) CN116881371B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189852A (en) * 2018-08-01 2019-01-11 武汉达梦数据库有限公司 A kind of method that data are synchronous and the device synchronous for data
CN111694840A (en) * 2020-04-29 2020-09-22 平安科技(深圳)有限公司 Data synchronization method, device, server and storage medium
CN113468135A (en) * 2021-09-01 2021-10-01 阿里云计算有限公司 Data migration method, system, device and product
CN114579534A (en) * 2020-11-18 2022-06-03 北京金山云网络技术有限公司 Data migration method and device and electronic equipment
CN115374102A (en) * 2021-07-30 2022-11-22 北京大杏科技有限责任公司 Data processing method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP5061166B2 (en) * 2009-09-04 2012-10-31 Kii株式会社 Data synchronization system and data synchronization method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant