CN112307117A - Synchronization method and synchronization system based on log analysis - Google Patents

Synchronization method and synchronization system based on log analysis

Info

Publication number
CN112307117A
Authority
CN
China
Prior art keywords
interval
rollback
partial
transaction
partial rollback
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011056091.2A
Other languages
Chinese (zh)
Other versions
CN112307117B (en)
Inventor
孙峰
付铨
彭青松
刘启春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dameng Database Co Ltd
Original Assignee
Wuhan Dameng Database Co Ltd
Application filed by Wuhan Dameng Database Co Ltd
Priority to CN202011056091.2A
Publication of CN112307117A
Application granted
Publication of CN112307117B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1471 Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a synchronization method and a synchronization system based on log analysis. In the synchronization method, a log receiving thread determines the type of each operation. When the operation is a DML operation, the DML operation and its operation number are added to the corresponding transaction cache file, a variable y is updated to the operation number of the current DML operation, and the storage LSN is updated to the log sequence number of the current DML operation. When the operation is a partial rollback operation, a partial rollback interval [x, y] is constructed from the target operation number x and the target variable y, the interval is added to a partial rollback linked list, and the storage LSN is updated to the log sequence number of the current partial rollback operation. When the operation is a commit operation, the corresponding transaction is dispatched to an execution thread, which performs data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list.

Description

Synchronization method and synchronization system based on log analysis
Technical Field
The invention belongs to the technical field of data synchronization, and in particular relates to a synchronization method and a synchronization system based on log analysis.
Background
Real-time synchronization of database data is a technical approach to improving the availability of information systems and ensuring service continuity. With real-time data synchronization, the business data in the target database is kept consistent with the source database at all times, so that when the source database fails and service is interrupted, the application system can quickly switch to the target database and service continuity is maintained.
Log-parsing-based real-time replication of database data has little impact on the performance and data schema of the source database, supports heterogeneous operating systems and database platforms, and offers high replication throughput. It is therefore widely used for emergency disaster recovery, multi-site service centers, heterogeneous resource integration, data migration, and similar scenarios. A log capture process at the source end captures the online or archived logs of the source database, parses the INSERT, UPDATE, and DELETE operations, converts them into message packets in an internal format, and sends the packets to the destination end of the replication system over a TCP/IP (Transmission Control Protocol/Internet Protocol) network. After receiving the packets, the destination end unpacks them, restores the source-end transaction information into the corresponding SQL (Structured Query Language) statements, and replicates them in real time into the target database through a local database interface, thereby synchronizing the database data.
In such a data synchronization system, the source-end synchronization service captures database operations in the order in which the database log records them, and the destination-end synchronization service receives and manages transactions in the order in which the source end sends the operations. At the destination end, transactions are grouped by transaction ID, and a transaction is executed only after its commit message arrives, so every operation of the transaction must be cached until then. Because the number of operations in a transaction is unbounded, caching them all in memory would inevitably exhaust memory resources and could crash the operating system. Caching transaction operations on disk is therefore the common practice in current data synchronization software. However, a transaction may be partially rolled back after some of its operations have already been cached on disk, in which case the cached operations must be cleaned up. Common cleanup methods locate the operations backwards and truncate the file, or mark the rolled-back operations; if operations are compressed and packed in batches, cleanup additionally requires decompressing the packed data. All of these methods generate random IO, and a large-scale partial rollback can saturate IO resources and degrade the performance of other programs on the same server.
Overcoming these deficiencies of the prior art is therefore an urgent problem in this field.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the present invention provides a synchronization method and a synchronization system based on log parsing. Its aim is to form partial rollback operation intervals from operation numbers and to collect partial rollback actions in a partial rollback linked list without modifying the operations that have already been packed and cached, thereby saving the IO cost of deleting or marking rolled-back operations at the price of some wasted disk space.
To achieve the above object, according to one aspect of the present invention, a synchronization method based on log parsing is provided. The method is applied to a destination-end data synchronization system provided with a log receiving thread and an execution thread. A transaction cache file is created in disk space for each transaction; each transaction cache file has an associated variable y and contains a partial rollback linked list and a storage LSN. The synchronization method comprises the following steps:
the log receiving thread determines the type of an operation;
when the operation is a DML operation, obtaining the operation number of the DML operation and the ID of the transaction to which it belongs, and determining the corresponding transaction cache file according to the transaction ID;
adding the DML operation and its operation number to the corresponding transaction cache file, updating the variable y to the operation number of the current DML operation, and updating the storage LSN to the log sequence number of the current DML operation;
when the operation is a partial rollback operation, obtaining the ID of the transaction to which the partial rollback operation belongs and the rollback target operation number x, and determining the corresponding transaction cache file according to the transaction ID to obtain the target variable y;
constructing a partial rollback interval [x, y] from the target operation number x and the target variable y, adding the interval to the partial rollback linked list, and updating the storage LSN to the log sequence number of the current partial rollback operation;
and when the operation is a commit operation, dispatching the corresponding transaction to the execution thread, which performs data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list.
Preferably, constructing the partial rollback interval [x, y] from the target operation number x and the target variable y and adding it to the partial rollback linked list comprises:
constructing the partial rollback interval [x, y] from the target operation number x and the target variable y;
inserting the partial rollback interval [x, y] into the partial rollback linked list in ascending order of the start value x;
determining whether the newly added partial rollback interval is adjacent to an existing partial rollback interval;
if it is adjacent, merging the newly added interval with the existing interval to obtain a merged partial rollback interval;
updating the variable y to the start value x of the merged partial rollback interval minus 1;
and if it is not adjacent, updating the variable y to the start value x of the newly added partial rollback interval minus 1.
Preferably, two intervals are adjacent when the end value y of the earlier interval plus 1 equals the start value x of the later interval.
Preferably, the synchronization method further comprises:
when the operation is a rollback operation, deleting the transaction cache file corresponding to the rollback operation and releasing all operations of that transaction cached in memory.
Preferably, adding the DML operation and its operation number to the corresponding transaction cache file, updating the variable y to the operation number of the current DML operation, and updating the storage LSN to the log sequence number of the current DML operation comprises:
first storing the DML operation and its operation number in the corresponding memory cache;
determining whether the cache threshold has been reached;
if the cache threshold has been reached, compressing all DML operations in memory to obtain compressed data, and appending the compressed data together with the partial rollback linked list interval information to the corresponding transaction cache file;
and updating the variable y to the operation number of the current DML operation and the storage LSN to the log sequence number of the current DML operation.
Preferably, the execution thread performing data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list comprises:
after receiving a transaction to be executed, the execution thread takes an operation to be executed from the corresponding transaction cache file and obtains its operation number z;
partial rollback intervals [x, y] are taken from the partial rollback linked list in order;
and whether the operation is rolled back is determined from the relation between the operation number z and the partial rollback interval [x, y], so as to perform data synchronization.
Preferably, determining whether the operation is rolled back from the relation between the operation number z and the partial rollback interval [x, y] comprises:
determining whether the operation number z is smaller than the interval start value x;
if z is smaller than x, executing the operation and taking out the next operation to be executed;
if z is not smaller than x, determining whether z is greater than the interval end value y;
and if z is not greater than y, the operation lies inside a partial rollback interval, so it is discarded and the next operation to be executed is taken out.
Preferably, the synchronization method further comprises:
if the operation number z is greater than the interval end value y, taking out the next partial rollback interval [x, y] and determining whether the operation is rolled back from the relation between z and that interval, until all partial rollback intervals have been traversed.
Preferably, the synchronization method further comprises:
if the operation number z is greater than the end value y of the last partial rollback interval, executing the operation;
and taking out each subsequent operation to be executed and executing it directly, without further interval comparison.
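The execution-thread rules above amount to a single merge-style pass over two sorted sequences: operations in ascending operation-number order and partial rollback intervals in ascending start order. The following is a minimal Python sketch under that assumption; the function name `filter_operations` and the `(operation number, payload)` tuple layout are hypothetical and not taken from the patent.

```python
from typing import Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # a partial rollback interval [x, y], inclusive


def filter_operations(
    ops: Iterable[Tuple[int, str]], intervals: Sequence[Interval]
) -> List[Tuple[int, str]]:
    """Return the operations that survive the partial rollback intervals.

    Hypothetical helper: assumes ops arrive in ascending operation-number
    order and intervals are sorted by start value x and do not overlap.
    """
    executed: List[Tuple[int, str]] = []
    i = 0  # index of the current partial rollback interval
    for z, payload in ops:
        # z > y: the current interval can no longer match, move to the next one
        while i < len(intervals) and z > intervals[i][1]:
            i += 1
        if i == len(intervals):
            executed.append((z, payload))  # past the last interval: execute directly
        elif z < intervals[i][0]:
            executed.append((z, payload))  # z < x: not rolled back, execute
        # otherwise x <= z <= y: the operation was rolled back, discard it
    return executed


ops = [(n, f"op{n}") for n in range(1, 13)]
print([z for z, _ in filter_operations(ops, [(3, 6), (9, 10)])])
```

Because each operation and each interval is visited at most once, the pass costs time proportional to the number of operations plus the number of intervals, and the cached data never has to be rewritten.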
To achieve the above object, according to another aspect of the present invention, a synchronization system is provided, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the synchronization method of the present invention.
In general, compared with the prior art, the technical solution of the invention has the following beneficial effects. Each DML operation carries an increasing operation number. When a transaction is partially rolled back, a partial rollback operation interval is formed from the operation numbers and collected in the partial rollback linked list, while the operations already packed and cached are left untouched; this saves the IO cost of deleting or marking rolled-back operations at the price of some wasted disk space. During caching, the partial rollback linked list is also stored in the cache file, which speeds up recovery from synchronization failures and avoids re-receiving transaction operations that were already cached before the failure.
Drawings
FIG. 1 is a schematic flowchart of a synchronization method based on log parsing according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the execution process of the log receiving thread according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the execution process of the execution thread according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a data structure of a transaction cache file according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a data structure of another transaction cache file according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a data structure of another transaction cache file according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a synchronization system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience in describing the present invention and do not require that the invention be constructed or operated in a specific orientation; they should therefore not be construed as limiting the present invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
Caching data into a disk file is a simple sequential write to the file. In a log-parsing-based synchronization environment, however, a transaction may be partially rolled back (a partial rollback rolls back only some of the operations in a transaction rather than all of them). If a large number of earlier operations have already been cached in the disk file when a partial rollback occurs, the cached operations have to be rolled back one by one, which complicates transaction caching.
To solve these problems, the invention caches transactions with a strategy that trades space for performance. During caching, transaction operations are packed and compressed in multi-operation batches and written sequentially to the cache file: sequential file writes improve IO performance, and compressing several operations together improves the compression ratio of the operation data and saves disk space. When a transaction is partially rolled back, the rollback action is recorded in the partial rollback linked list and the already packed and cached operations are left untouched, which saves the IO cost of deleting or marking them at the price of some wasted disk space. When several partial rollback actions are received, adjacent ones are merged and the operations to be rolled back are expressed as ranges, which effectively shortens the partial rollback linked list. A checkpoint thread periodically records the partial rollback linked list in a fixed region at the head of each transaction file, which makes the cached transaction data persistent, avoids having to collect all the data again during fault recovery, and speeds up recovery.
This embodiment provides a synchronization method based on log parsing, applied to a destination-end data synchronization system provided with a log receiving thread and an execution thread. Specifically, after the destination-end data synchronization system starts, it creates a log receiving thread and an execution thread. The log receiving thread receives the operations sent by the source end; the execution thread applies committed transactions to the target database.
A transaction cache file is set up in disk space for each transaction. Each transaction cache file has an associated variable y and contains the partial rollback linked list, the storage LSN, the transaction information, and the offset of the end of the file.
When the destination-end data synchronization system restarts after a fault, it loads the transaction cache files from before the fault and reads the transaction information, storage LSN, file tail offset, and partial rollback linked list stored in them. This restores the internal state of the received transactions as of the last fault or shutdown, makes it easy to resume transmission from the source end at the breakpoint, and guarantees transaction consistency during synchronization.
Referring to FIG. 1, the synchronization method includes the following steps:
Step 101: The log receiving thread determines the type of the operation.
A synchronization system is deployed at both the source database and the destination database. The source-end data synchronization system reads logs from the source database, and the destination-end data synchronization system applies the operations sent by the source end to the destination database.
Referring to FIG. 2, the log receiving thread at the destination end parses the received log to obtain an operation and determines its type. If it is a DML operation, step 102 is executed; if it is a partial rollback operation, step 104 is executed; if it is a commit operation, step 106 is executed.
If it is a rollback operation, the transaction cache file corresponding to the rollback operation is deleted and all operations of that transaction cached in memory are released.
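The dispatch performed by the log receiving thread can be sketched as follows. This is a minimal Python sketch, assuming an in-memory dictionary stands in for the per-transaction cache files and a `queue.Queue` stands in for the hand-off to the execution thread; the class names `Op`, `TxnCache`, and `LogReceiver` and the operation-kind strings are hypothetical. Interval merging and disk flushing are deliberately omitted here.

```python
import queue
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Op:
    kind: str       # hypothetical tags: "DML", "PARTIAL_ROLLBACK", "COMMIT", "ROLLBACK"
    txn_id: str
    lsn: int
    op_no: int = 0  # DML: operation number; PARTIAL_ROLLBACK: rollback target x
    sql: str = ""


@dataclass
class TxnCache:
    y: int = 0            # operation number of the latest still-valid operation
    storage_lsn: int = 0
    ops: List[Tuple[int, str]] = field(default_factory=list)
    rollback_list: List[Tuple[int, int]] = field(default_factory=list)


class LogReceiver:
    def __init__(self, exec_queue: queue.Queue) -> None:
        self.txns: Dict[str, TxnCache] = {}  # stands in for the transaction cache files
        self.exec_queue = exec_queue         # stands in for the execution thread

    def handle(self, op: Op) -> None:
        if op.kind == "DML":
            txn = self.txns.setdefault(op.txn_id, TxnCache())
            txn.ops.append((op.op_no, op.sql))
            txn.y = op.op_no
            txn.storage_lsn = op.lsn
        elif op.kind == "PARTIAL_ROLLBACK":
            txn = self.txns.setdefault(op.txn_id, TxnCache())
            if op.op_no <= txn.y:  # only record a non-empty interval [x, y]
                txn.rollback_list.append((op.op_no, txn.y))
                txn.rollback_list.sort()   # keep ascending order of x
                txn.y = op.op_no - 1
            txn.storage_lsn = op.lsn
        elif op.kind == "COMMIT":
            txn = self.txns.pop(op.txn_id, None)
            if txn is not None:
                self.exec_queue.put((op.txn_id, txn))
        elif op.kind == "ROLLBACK":
            self.txns.pop(op.txn_id, None)  # i.e. delete the cache file, free memory
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
```

Note that the cached operations are never touched on a partial rollback; only the interval list and y change, which is the space-for-IO trade the description argues for.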
Step 102: When the operation is a DML operation, the operation number of the DML operation and the ID of the transaction to which it belongs are obtained, and the corresponding transaction cache file is determined according to the transaction ID.
When the source end sends an operation of a transaction, it must fill in the operation number so that the destination end can implement partial rollback based on operation numbers. Specifically, each operation in the database log stream has its own operation number within its transaction, starting from 1 and increasing. Some databases (e.g., ORACLE) do not record operation numbers in their logs, but simulated operation numbers can be assigned to each operation by other technical means during source-end log parsing.
Step 103: The DML operation and its operation number are added to the corresponding transaction cache file, the variable y is updated to the operation number of the current DML operation, and the storage LSN is updated to the log sequence number of the current DML operation.
In this embodiment, operations are grouped by the transaction ID they carry. Each operation is first added to memory; when the cache threshold is reached, the operations in memory are appended to the transaction cache file, and the operation number of the current operation is recorded in the transaction's dedicated variable y.
Specifically, the DML operation and its operation number are stored in the corresponding memory cache, and it is determined whether the cache threshold has been reached. If so, all DML operations in memory are compressed, and the compressed data together with the partial rollback linked list interval information is appended to the corresponding transaction cache file. The variable y is then updated to the operation number of the current DML operation, and the storage LSN to the log sequence number of the current DML operation.
The cache threshold may be defined as the number of cached operations reaching a set value N. When the number of operations cached in memory reaches N, the N operations are packed, compressed, and appended to the corresponding transaction cache file, the file tail offset is recorded, the LSN of the current operation is stored as the transaction's storage LSN, and the next operation is received. When the number of cached operations has not yet reached N, the LSN of the current operation is likewise taken as the transaction's storage LSN, and the next operation is received.
In this scheme, the per-transaction cache threshold N controls how much of each transaction is flushed to disk at a time. N can be tuned for different usage scenarios in practice; for example, when memory is plentiful, setting N larger reduces how often the destination-end data synchronization system performs IO, preventing IO from becoming a bottleneck for synchronization performance.
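The threshold-driven batching can be sketched as follows. This is a minimal Python sketch, assuming `json` serialization of each batch, `zlib` compression, and a 4-byte little-endian length prefix per block so that blocks can be read back; none of these formats is specified in the patent, and the class name `TxnBuffer` is hypothetical. A `bytearray` stands in for the transaction cache file, and sector alignment is left out here.

```python
import json
import zlib
from typing import List, Tuple

HEADER_SIZE = 4096  # head region reserved for the partial rollback linked list


class TxnBuffer:
    """Hypothetical per-transaction buffer that flushes every n operations."""

    def __init__(self, n: int) -> None:
        self.n = n                              # cache threshold N
        self.pending: List[Tuple[int, str]] = []
        self.file = bytearray(HEADER_SIZE)      # stands in for the cache file
        self.tail_offset = HEADER_SIZE          # data starts after the head region
        self.storage_lsn = 0
        self.y = 0

    def add_dml(self, op_no: int, sql: str, lsn: int) -> None:
        self.pending.append((op_no, sql))
        self.y = op_no
        self.storage_lsn = lsn  # updated whether or not a flush happens
        if len(self.pending) >= self.n:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # Pack the whole batch and compress it as one block (sequential append)
        block = zlib.compress(json.dumps(self.pending).encode("utf-8"))
        self.file += len(block).to_bytes(4, "little") + block
        self.tail_offset = len(self.file)  # record the new file tail offset
        self.pending.clear()


buf = TxnBuffer(n=3)
for i in range(1, 6):
    buf.add_dml(i, f"INSERT {i}", 100 + i)
print(len(buf.pending), buf.tail_offset > HEADER_SIZE)
```

With N = 3 and five operations, one compressed block of three operations is appended and two operations stay in memory until the next flush or checkpoint.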
In this embodiment, operation data is cached per transaction: each transaction is cached in its own transaction cache file, named after the transaction ID for easy management and lookup. When a transaction cache file is created, a 4K (4096-byte) region is reserved at the start of the file to store the partial rollback linked list information, so the initial cache offset of the file is 4096.
Because the operating system operates on files in units of sectors, each block of data cached to the transaction cache file is aligned to the sector size, which reduces the complexity of cache operations and improves IO performance.
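The 4K head region and the sector alignment can be illustrated with a fixed binary layout. This is a minimal Python sketch, assuming a 512-byte sector and a hypothetical header layout (storage LSN, file tail offset, and interval count as little-endian integers, followed by one pair of 64-bit integers per interval); the patent does not specify the on-disk encoding, and the helper names are hypothetical.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 4096  # reserved head region; data blocks start at offset 4096
SECTOR_SIZE = 512   # assumed sector size

_FIXED = struct.Struct("<QQI")   # storage LSN, file tail offset, interval count
_INTERVAL = struct.Struct("<QQ")  # one partial rollback interval [x, y]


def pack_header(storage_lsn: int, tail_offset: int,
                intervals: List[Tuple[int, int]]) -> bytes:
    body = _FIXED.pack(storage_lsn, tail_offset, len(intervals))
    body += b"".join(_INTERVAL.pack(x, y) for x, y in intervals)
    if len(body) > HEADER_SIZE:
        raise ValueError("partial rollback list does not fit in the head region")
    return body.ljust(HEADER_SIZE, b"\0")  # pad to exactly 4096 bytes


def unpack_header(header: bytes) -> Tuple[int, int, List[Tuple[int, int]]]:
    lsn, tail, count = _FIXED.unpack_from(header, 0)
    intervals = [
        _INTERVAL.unpack_from(header, _FIXED.size + i * _INTERVAL.size)
        for i in range(count)
    ]
    return lsn, tail, intervals


def align_to_sector(data: bytes) -> bytes:
    # Round the block length up to the next multiple of the sector size
    padded_len = -(-len(data) // SECTOR_SIZE) * SECTOR_SIZE
    return data.ljust(padded_len, b"\0")
```

Under this layout the head region holds at most (4096 - 20) / 16 = 254 intervals, which is one reason merging adjacent intervals matters. Zero padding after a block means the reader needs each block's real length (for example, a length prefix) to find where the compressed data ends.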
Step 104: When the operation is a partial rollback operation, the ID of the transaction to which it belongs and the rollback target operation number x are obtained, and the corresponding transaction cache file is determined according to the transaction ID to obtain the target variable y.
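A partial rollback means that the transaction rolls back from the current operation number position to the specified operation number, including the operation with that number. For example, if a transaction has executed operations 1 to 10 and is partially rolled back to operation 6, operations 6 to 10 are undone and the interval [6, 10] is recorded.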
Step 105: A partial rollback interval [x, y] is constructed from the target operation number x and the target variable y and added to the partial rollback linked list, and the storage LSN is updated to the log sequence number of the current partial rollback operation.
In this embodiment, since the current operation is a partial rollback operation, the transaction ID and the rollback target operation number x are extracted from it, and the corresponding transaction cache file is determined according to the transaction ID to obtain the target variable y.
A partial rollback interval [x, y] is then constructed from x and y and added to the partial rollback linked list. On insertion, the list is kept sorted by the interval start value x so that the intervals remain in ascending order.
In a preferred embodiment, after each partial rollback interval is added, it is checked whether the new interval is adjacent to the preceding or following interval; if so, they are merged into a larger interval that replaces the smaller intervals in the original linked list.
The specific implementation is as follows. A partial rollback interval [x, y] is constructed from the target operation number x and the target variable y.
The interval is inserted into the partial rollback linked list in ascending order of x, and it is determined whether the newly added interval is adjacent to an existing interval. If it is, the two are merged into a merged partial rollback interval, and the variable y is updated to the start value of the merged interval minus 1: the start value x of the merged interval is decremented by 1 to obtain a new value x', and x' is assigned to y. If it is not adjacent, y is updated in the same way from the newly added interval: its start value x is decremented by 1 to obtain x', and x' is assigned to y. Two intervals are adjacent when the end value y of the earlier interval plus 1 equals the start value x of the later interval.
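The interval bookkeeping of steps 104 and 105 can be sketched as follows. This is a minimal Python sketch under stated assumptions: a sorted Python list stands in for the partial rollback linked list, and besides the adjacency rule in the text (end + 1 == next start) it also absorbs intervals that overlap or are contained in the new one, which is an assumption beyond the description (it arises when a later rollback targets a point before an earlier interval). The class name `TransactionState` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interval = Tuple[int, int]


@dataclass
class TransactionState:
    y: int = 0            # operation number of the latest still-valid operation
    storage_lsn: int = 0
    rollback_list: List[Interval] = field(default_factory=list)  # sorted by x

    def on_dml(self, op_no: int, lsn: int) -> None:
        self.y = op_no
        self.storage_lsn = lsn

    def on_partial_rollback(self, x: int, lsn: int) -> Optional[Interval]:
        self.storage_lsn = lsn
        if x > self.y:
            return None  # nothing valid left to roll back (assumed guard)
        intervals = sorted(self.rollback_list + [(x, self.y)])
        merged: List[Interval] = []
        for start, end in intervals:
            # Adjacent (prev end + 1 == start) or overlapping: merge
            if merged and merged[-1][1] + 1 >= start:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        self.rollback_list = merged
        # The interval now covering x decides the new y (its start minus 1)
        covering = next(iv for iv in merged if iv[0] <= x <= iv[1])
        self.y = covering[0] - 1
        return covering


t = TransactionState()
for n in range(1, 11):
    t.on_dml(n, 100 + n)
print(t.on_partial_rollback(6, 200), t.y)
```

Rolling back operations 1 to 10 to operation 6 records [6, 10] and sets y to 5; a later rollback whose interval starts at 11 would then merge into [6, 11] instead of adding a second entry.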
Step 106: When the operation is a commit operation, the corresponding transaction is dispatched to the execution thread, which performs data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list.
Since the current operation is a commit operation, the corresponding transaction is located by the operation's transaction ID and dispatched to an execution thread, which executes it and writes it to the target database.
In addition, the destination data synchronization system is equipped with a checkpoint thread, which periodically saves the transaction information received by the system and sets a recovery point for use after a failure. In this embodiment, every S seconds the checkpoint thread traverses the currently cached transactions and, for each one, writes its partial rollback list information to the corresponding transaction file by performing the operations below. The checkpoint interval S controls the fault-recovery time: in a busy workload, shortening the interval allows faster recovery.
For each transaction, the LSN recorded at its last checkpoint is compared with its current disk-save LSN. If the current disk-save LSN is less than or equal to the LSN of the last checkpoint, the transaction has received no new operation since that checkpoint, so it is skipped without any disk write and the next transaction is taken. Otherwise, a disk write is performed: the operations cached in the transaction's memory are first packed and compressed, then appended to the transaction's cache file, and the file-end offset is recorded. The disk-save LSN, the file-end offset and the interval information of the transaction's partial rollback list are then stored in the 4K space reserved at the head of the transaction cache file, after which the next transaction is taken.
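The flush decision and the 4K header update can be sketched in Python as follows. The on-disk layout is an assumption made for illustration, since the text does not specify one: each flushed batch is a 4-byte little-endian length followed by a zlib-compressed JSON array of operations, and the header packs the disk-save LSN, the file-end offset, the interval count and the [x, y] pairs with `struct`. The class CachedTransaction, the functions checkpoint and checkpoint_loop, and all field names are hypothetical, and locking between the checkpoint thread and the receiving thread is omitted.

```python
import json
import os
import struct
import zlib

HEADER_SIZE = 4096    # the 4K space reserved at the head of each transaction cache file
HEADER_FMT = "<QQI"   # assumed layout: disk-save LSN, file-end offset, interval count
INTERVAL_FMT = "<QQ"  # assumed layout: one [x, y] pair


class CachedTransaction:
    """Hypothetical in-memory state of one cached transaction."""

    def __init__(self, path):
        self.path = path
        self.pending_ops = []         # operations cached in memory, not yet on disk
        self.save_lsn = 0             # disk-save LSN, advanced by the receiving thread
        self.checkpoint_lsn = 0       # save_lsn as of the last checkpoint
        self.rollback_intervals = []  # partial rollback list: [[x, y], ...]


def _open_cache_file(path):
    # On first use, create the file with a zeroed 4K header region.
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(b"\0" * HEADER_SIZE)
    return open(path, "r+b")


def checkpoint(transactions):
    for txn in transactions:
        # No new operation since the last checkpoint: skip without writing.
        if txn.save_lsn <= txn.checkpoint_lsn:
            continue
        needed = (struct.calcsize(HEADER_FMT)
                  + len(txn.rollback_intervals) * struct.calcsize(INTERVAL_FMT))
        if needed > HEADER_SIZE:
            raise ValueError("partial rollback list does not fit in the 4K header")
        with _open_cache_file(txn.path) as f:
            if txn.pending_ops:
                # Pack and compress the memory-cached operations, then append them.
                blob = zlib.compress(json.dumps(txn.pending_ops).encode("utf-8"))
                f.seek(0, os.SEEK_END)
                f.write(struct.pack("<I", len(blob)) + blob)
                txn.pending_ops.clear()
            tail = f.seek(0, os.SEEK_END)  # file-end offset after the append
            header = struct.pack(HEADER_FMT, txn.save_lsn, tail,
                                 len(txn.rollback_intervals))
            for x, y in txn.rollback_intervals:
                header += struct.pack(INTERVAL_FMT, x, y)
            # Overwrite the reserved header region in place.
            f.seek(0)
            f.write(header.ljust(HEADER_SIZE, b"\0"))
            f.flush()
            os.fsync(f.fileno())
        txn.checkpoint_lsn = txn.save_lsn


def checkpoint_loop(transactions, interval_s, stop_event):
    # stop_event is a threading.Event; run a checkpoint every interval_s seconds
    # (the S in the text) until it is set.
    while not stop_event.wait(interval_s):
        checkpoint(transactions)
```

In use, the receiving thread would append to pending_ops and advance save_lsn, while checkpoint_loop runs in its own thread. Only the header region is rewritten in place; operation data is always appended, which preserves the sequential-write property of the cache file.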
In this embodiment, the checkpoint thread exists because the destination must set a recovery point against failures that occur while it is receiving operations from the source. Before the recovery point is set, the data of the currently active transactions must be saved, so that after the fault is recovered the source can resume log parsing from the recovery point, providing resume-from-breakpoint.
In this embodiment, each DML operation carries an incrementing operation number, and partial rollback intervals are formed from these numbers. Cached operations that are rolled back are neither deleted nor marked, which limits the impact of partial rollbacks on the transaction cache; the transaction cache thus trades space for performance. During caching, the partial rollback list is also saved to the cache file, which speeds up recovery from synchronization failures and avoids re-processing transaction operations that were cached before the failure.
The implementation of step 106 is described below with reference to Fig. 3.
First, after receiving a transaction to be executed, the execution thread takes an operation to be executed from the corresponding transaction cache file and obtains its operation number z.
Partial rollback intervals [x, y] are then taken from the partial rollback list in order, and whether the operation is rolled back is determined from the relationship between z and the current interval [x, y], so as to perform data synchronization.
Specifically, if z is less than the interval start value x (z < x), the operation is executed and the next operation to be executed is taken.
If z is not less than x, the thread checks whether z is greater than the interval end value y. If it is not (x <= z <= y), the operation lies inside a partial rollback interval, so it is discarded and the next operation to be executed is taken.
If z is greater than the interval end value y (z > y), the next partial rollback interval [x, y] is taken out and the same comparison is repeated against it, until all partial rollback intervals have been traversed.
In practice, if z is greater than the end value y of the last partial rollback interval, the operation is executed, and every operation taken out after it is executed directly. Being past the last interval means the partial rollback list has been fully consumed, so none of the remaining operations needs to be rolled back.
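Because both the cached operations and the partial rollback intervals are in ascending order, the comparison rules above amount to a single merge-style pass over two sorted sequences. The following Python sketch shows that pass; the function name surviving_operations and the (op_no, payload) representation of a cached operation are hypothetical, and actually executing the SQL is left out.

```python
def surviving_operations(operations, intervals):
    """Yield the cached operations that fall outside every partial rollback interval.

    operations: (op_no, payload) pairs in ascending op_no order, as read back
                from the transaction cache file.
    intervals:  the partial rollback list [[x, y], ...], ascending and disjoint.
    """
    k = 0  # index of the partial rollback interval currently being compared
    for z, payload in operations:
        # z > y: the current interval lies entirely behind z, take the next one.
        while k < len(intervals) and z > intervals[k][1]:
            k += 1
        if k == len(intervals) or z < intervals[k][0]:
            # Past the last interval, or z < x: execute the operation.
            yield z, payload
        # Otherwise x <= z <= y: the operation was rolled back, discard it.


ops = [(n, f"INSERT INTO T1(ID) VALUES({n})") for n in range(1, 7)]
print([z for z, _ in surviving_operations(ops, [[2, 3], [5, 5]])])  # [1, 4, 6]
```

Each interval index only moves forward, so the whole transaction is filtered in time proportional to the number of operations plus the number of intervals, without any random access into the cache file.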
The main points of the above embodiment can be summarized as follows.
First, when processing a partial rollback of a transaction, the invention neither deletes the rolled-back operations from the cache file (which would cause random I/O during rollback) nor adds deletion marks to the cached operation records (so cached operations need not be stored in plaintext). The whole transaction cache therefore remains sequential-write only, which allows several operations to be packed, compressed and then written to the file together, effectively saving disk space and reducing I/O pressure.
Second, partial rollbacks of a transaction are managed with a linked list, and consecutive partial rollbacks are combined by merging adjacent partial rollback intervals. This effectively shortens the partial rollback list and makes it easy for the checkpoint thread to persist the partial rollbacks when it flushes to disk. When the transaction is executed, the operation-number intervals in the partial rollback list are used to locate rolled-back operations, and any operation that falls into a rollback interval is discarded, which implements partial rollback.
Example 2:
To aid understanding of Example 1 above, the scheme is illustrated with the following example.
Suppose both the source database and the destination database contain table T1(ID INT), and an application on the source runs a transaction that performs the following operations on T1:
INSERT INTO T1(ID) VALUES('1');
SAVEPOINT SP2;
INSERT INTO T1(ID) VALUES('2');
SAVEPOINT SP3;
INSERT INTO T1(ID) VALUES('3');
ROLLBACK TO SAVEPOINT SP3;
ROLLBACK TO SAVEPOINT SP2;
INSERT INTO T1(ID) VALUES('4');
SAVEPOINT SP5;
INSERT INTO T1(ID) VALUES('5');
ROLLBACK TO SAVEPOINT SP5;
INSERT INTO T1(ID) VALUES('6');
COMMIT;
These statements produce the following log records:
Operation number | Operation | LSN
1 | INSERT INTO T1(ID) VALUES('1') | 1
2 | INSERT INTO T1(ID) VALUES('2') | 2
3 | INSERT INTO T1(ID) VALUES('3') | 3
- | ROLLBACK TO OPERATION NUMBER 3 | 4
- | ROLLBACK TO OPERATION NUMBER 2 | 5
4 | INSERT INTO T1(ID) VALUES('4') | 6
5 | INSERT INTO T1(ID) VALUES('5') | 7
- | ROLLBACK TO OPERATION NUMBER 5 | 8
6 | INSERT INTO T1(ID) VALUES('6') | 9
- | COMMIT | 10
The transaction caching process is as follows:
The destination synchronization system starts. Assume the transaction cache threshold is 3 operations and the operating system sector size is 512 bytes. The system receives three INSERT operations, numbered 1, 2 and 3, then packs, compresses and stores them, producing the file format shown in Fig. 4.
After this step, the variable y holds operation number 3.
A partial rollback operation that rolls back to operation number 3 is received. Following the rule, a rollback interval is built from this number and the value of y and added to the partial rollback list, giving {[3, 3]}.
The variable y is set to the interval start value x minus 1, so y is now 2.
A partial rollback operation that rolls back to operation number 2 is received. A rollback interval is built from this number and the value of y and added to the partial rollback list, giving {[2, 2], [3, 3]}.
The check after insertion finds that the list contains adjacent rollback intervals, so they are merged into the single interval {[2, 3]}.
The variable y is set to the interval start value x minus 1, so y is now 1.
Two INSERT operations, numbered 4 and 5, are received; y is now 5.
A partial rollback operation that rolls back to operation number 5 is received. A rollback interval is built from this number and the value of y and added to the partial rollback list, giving {[2, 3], [5, 5]}, and y is set to 5 - 1 = 4.
An INSERT operation numbered 6 is received, and the three operations cached in memory (numbers 4, 5 and 6) are packed, compressed and stored, producing the file format shown in Fig. 5.
If the checkpoint thread flushes this transaction at this point, the current file-end offset, the disk-save LSN and the transaction's partial rollback list are stored in the first 4K of the transaction file, producing the file format shown in Fig. 6.
A COMMIT operation is received, and the transaction is assigned to an execution thread.
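The receiving-side steps of this example (DML records set y and the disk-save LSN, partial rollbacks build and merge intervals, the commit hands the transaction off) can be replayed with the Python sketch below. It is illustrative only: the record shape (lsn, kind, txn_id, arg), the transaction ID "tx1", and the names TxnState, _add_interval and receive are hypothetical; the cache threshold, compression and file I/O are left out; and the list is kept sorted by sort-and-coalesce rather than a linked list, which also merges overlapping intervals (an assumption beyond the text).

```python
class TxnState:
    """Hypothetical per-transaction state kept by the log receiving thread."""

    def __init__(self):
        self.ops = []        # cached (op_no, sql); never deleted by a partial rollback
        self.y = 0           # the variable y
        self.save_lsn = 0    # the disk-save LSN
        self.intervals = []  # partial rollback list [[x, y], ...], sorted by x


def _add_interval(state, x):
    if x > state.y:
        return  # assumption: nothing to roll back (case not covered by the text)
    # Add [x, y], re-sort by start, then coalesce adjacent or overlapping intervals.
    state.intervals.append([x, state.y])
    state.intervals.sort()
    merged = []
    for s, e in state.intervals:
        if merged and merged[-1][1] + 1 >= s:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    state.intervals = merged
    # y becomes the start of the interval that now contains x, minus 1.
    state.y = next(s for s, e in merged if s <= x <= e) - 1


def receive(log):
    """Process (lsn, kind, txn_id, arg) records and return committed transactions."""
    active, committed = {}, []
    for lsn, kind, txn_id, arg in log:
        if kind == "DML":
            state = active.setdefault(txn_id, TxnState())
            op_no, sql = arg
            state.ops.append((op_no, sql))
            state.y, state.save_lsn = op_no, lsn
        elif kind == "PARTIAL_ROLLBACK":
            state = active[txn_id]
            _add_interval(state, arg)  # arg: rollback target operation number x
            state.save_lsn = lsn
        elif kind == "ROLLBACK":
            active.pop(txn_id, None)  # full rollback: drop everything cached (claim 4)
        elif kind == "COMMIT":
            committed.append(active.pop(txn_id))  # would go to an execution thread
    return committed


SQL = "INSERT INTO T1(ID) VALUES('{}')"
LOG = [
    (1, "DML", "tx1", (1, SQL.format(1))),
    (2, "DML", "tx1", (2, SQL.format(2))),
    (3, "DML", "tx1", (3, SQL.format(3))),
    (4, "PARTIAL_ROLLBACK", "tx1", 3),
    (5, "PARTIAL_ROLLBACK", "tx1", 2),
    (6, "DML", "tx1", (4, SQL.format(4))),
    (7, "DML", "tx1", (5, SQL.format(5))),
    (8, "PARTIAL_ROLLBACK", "tx1", 5),
    (9, "DML", "tx1", (6, SQL.format(6))),
    (10, "COMMIT", "tx1", None),
]
txn = receive(LOG)[0]
print(txn.intervals, txn.y, txn.save_lsn, len(txn.ops))  # [[2, 3], [5, 5]] 6 9 6
```

All six INSERTs stay cached even though three of them were rolled back; only the interval list records which ones to skip, matching the state handed to the execution thread in this example.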
The execution thread takes the first INSERT (ID 1), whose operation number is 1.
The first partial rollback interval [2, 3] is taken out.
By the rule, operation number 1 is less than the interval start value 2, so the operation is executed:
INSERT INTO T1(ID) VALUES(1);
The second INSERT (ID 2) is taken, with operation number 2. By the rule, 2 falls within the partial rollback interval [2, 3], so the operation is discarded without being executed.
The third INSERT (ID 3) is taken, with operation number 3. By the rule, 3 falls within the partial rollback interval [2, 3], so the operation is discarded without being executed.
The fourth INSERT (ID 4) is taken, with operation number 4. By the rule, 4 is greater than the end value of the partial rollback interval [2, 3].
The next partial rollback interval [5, 5] is taken out.
The fourth operation is evaluated again: operation number 4 is less than the interval start value 5, so the operation is executed:
INSERT INTO T1(ID) VALUES(4);
The fifth INSERT (ID 5) is taken, with operation number 5. By the rule, 5 falls within the partial rollback interval [5, 5], so the operation is discarded without being executed.
The sixth INSERT (ID 6) is taken, with operation number 6. By the rule, 6 is greater than the end value of the partial rollback interval [5, 5].
All partial rollback intervals have now been consumed, so this operation and every later one are executed:
INSERT INTO T1(ID) VALUES(6);
COMMIT is executed, completing the synchronization.
In this process, the operations in the cache file are traversed in order, and the rollback operation-number intervals recorded in the partial rollback list identify the operations to discard, which implements partial rollback of the transaction.
Example 3:
Fig. 7 is a schematic structural diagram of a synchronization system according to an embodiment of the present invention. The synchronization system of this embodiment includes one or more processors 41 and a memory 42; Fig. 7 takes one processor 41 as an example.
The processor 41 and the memory 42 may be connected by a bus or by other means; Fig. 7 shows a bus connection as an example.
The memory 42, as a non-volatile computer-readable storage medium, can store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions corresponding to the synchronization method of the above embodiments. The processor 41 executes the non-volatile software programs, instructions and modules stored in the memory 42 to perform various functional applications and data processing, thereby implementing the methods of the foregoing embodiments.
The memory 42 may include, among other things, high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 42 may optionally include memory located remotely from processor 41, which may be connected to processor 41 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that the information exchange and execution processes between the modules and units of the apparatus and system are based on the same concept as the method embodiments of the present invention; for details, refer to the description of the method embodiments, which is not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A synchronization method based on log analysis, characterized in that the synchronization method is applied to a destination data synchronization system, the destination data synchronization system is provided with a log receiving thread and an execution thread, and a transaction cache file is provided for each transaction in disk space, wherein each transaction cache file is associated with a variable y and contains a partial rollback linked list and a disk-save LSN;
the synchronization method comprises the following steps:
the log receiving thread determines the type of an operation;
when the operation is a DML operation, obtaining the operation number of the DML operation and the ID of the transaction to which it belongs, and determining the corresponding transaction cache file according to the transaction ID;
adding the DML operation and its operation number to the corresponding transaction cache file, updating the variable y to equal the operation number of the current DML operation, and updating the disk-save LSN to equal the log sequence number of the current DML operation;
when the operation is a partial rollback operation, obtaining the ID of the transaction to which it belongs and the rollback target operation number x, and determining the corresponding transaction cache file according to the transaction ID to obtain the target variable y;
constructing a partial rollback interval [x, y] from the target operation number x and the target variable y, adding the partial rollback interval [x, y] to the partial rollback linked list, and updating the disk-save LSN to equal the log sequence number of the current partial rollback operation;
and when the operation is a commit operation, dispatching the corresponding transaction to the execution thread, which performs data synchronization according to the operation number of each operation to be executed and the corresponding partial rollback linked list.
2. The synchronization method according to claim 1, wherein constructing a partial rollback interval [x, y] from the target operation number x and the target variable y and adding the partial rollback interval [x, y] to the partial rollback list comprises:
constructing a partial rollback interval [x, y] from the target operation number x and the target variable y;
adding the partial rollback interval [x, y] to the partial rollback linked list in ascending order of the target operation number x;
determining whether the newly added partial rollback interval [x, y] is adjacent to an existing partial rollback interval [x, y];
if they are adjacent intervals, merging the newly added partial rollback interval [x, y] with the existing partial rollback interval [x, y] to obtain a merged partial rollback interval;
updating the variable y to the start value x of the merged partial rollback interval minus 1;
and if they are not adjacent intervals, updating the variable y to the start value x of the newly added partial rollback interval minus 1.
3. The synchronization method according to claim 2, characterized in that adjacent intervals means that the y value of the preceding interval plus 1 equals the x value of the following interval.
4. The synchronization method of claim 1, further comprising:
and when the operation is a rollback operation, deleting the transaction cache file corresponding to the rollback operation and releasing all operations cached in memory.
5. The synchronization method according to claim 1, wherein adding the DML operation and the operation number to the corresponding transaction cache file, updating the variable y to equal the operation number of the current DML operation, and updating the disk-save LSN to equal the log sequence number of the current DML operation comprises:
first, storing the DML operation and its operation number in the corresponding memory;
determining whether the cache threshold has been reached;
if the cache threshold has been reached, compressing all DML operations in memory to obtain compressed data, and appending the compressed data and the partial rollback list interval information to the corresponding transaction cache file;
and updating the variable y to equal the operation number of the current DML operation and the disk-save LSN to equal the log sequence number of the current DML operation.
6. The synchronization method according to claim 1, wherein the execution thread performing data synchronization according to the operation number of the operation to be executed and the corresponding partial rollback list comprises:
after receiving a transaction to be executed, the execution thread takes an operation to be executed from the corresponding transaction cache file and obtains the operation number z of the operation to be executed;
taking partial rollback intervals [x, y] from the partial rollback list in order;
and determining whether to perform a partial rollback according to the relationship between the operation number z and the partial rollback interval [x, y], so as to perform data synchronization.
7. The synchronization method according to claim 6, wherein determining whether to perform a partial rollback according to the relationship between the operation number z and the partial rollback interval [x, y] comprises:
determining whether the operation number z is less than the interval start value x;
if the operation number z is less than the interval start value x, executing the operation to be executed and taking the next operation to be executed;
if the operation number z is not less than the interval start value x, determining whether the operation number z is greater than the interval end value y;
and if the operation number z is not greater than the interval end value y, the operation to be executed belongs to a partial rollback interval, so discarding the operation to be executed and taking the next operation to be executed.
8. The synchronization method of claim 7, further comprising:
and if the operation number z is greater than the interval end value y, taking the next partial rollback interval [x, y] and determining whether to perform a partial rollback according to the relationship between the operation number z and the next partial rollback interval [x, y], until all partial rollback intervals have been traversed.
9. The synchronization method of claim 8, further comprising:
if the operation number z is greater than the interval end value y of the last partial rollback interval, executing the operation to be executed;
and directly executing each subsequent operation to be executed that is taken out.
10. A synchronization system, characterized in that the synchronization system comprises at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the synchronization method according to any one of claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011056091.2A 2020-09-30 2020-09-30 Synchronization method and synchronization system based on log analysis

Publications (2)

Publication Number Publication Date
CN112307117A 2021-02-02
CN112307117B 2023-12-12

Family

ID=74488248

Country Status (1)

Country Link
CN (1) CN112307117B

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060253502A1 (en) * 2005-05-06 2006-11-09 Microsoft Corporation Maintenance of link level consistency between database and file system
US20180144015A1 (en) * 2016-11-18 2018-05-24 Microsoft Technology Licensing, Llc Redoing transaction log records in parallel
US20180246947A1 (en) * 2017-02-28 2018-08-30 Sap Se Persistence and Initialization of Synchronization State for Serialized Data Log Replay in Database Systems
US10452648B1 (en) * 2015-12-07 2019-10-22 Gravic, Inc. Method of ensuring transactional integrity of a system that includes a plurality of subsystems, one of which takes an action upon a loss of transactional integrity
KR20200056357A (en) * 2020-03-17 2020-05-22 주식회사 실크로드소프트 Technique for implementing change data capture in database management system
CN111694798A (en) * 2020-04-23 2020-09-22 武汉达梦数据库有限公司 Data synchronization method and data synchronization system based on log analysis
CN111694893A (en) * 2020-04-23 2020-09-22 武汉达梦数据库有限公司 Partial rollback analysis method based on log analysis and data synchronization system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant after: Wuhan Dameng Database Co., Ltd.

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant before: WUHAN DAMENG DATABASE Co.,Ltd.

CB03 Change of inventor or designer information

Inventor after: Sun Feng

Inventor after: Peng Qingsong

Inventor after: Liu Qichun

Inventor before: Sun Feng

Inventor before: Fu Quan

Inventor before: Peng Qingsong

Inventor before: Liu Qichun

GR01 Patent grant