CN112035464B - Data synchronization filtering method and synchronization device based on log analysis - Google Patents

Data synchronization filtering method and synchronization device based on log analysis Download PDF

Info

Publication number
CN112035464B
CN112035464B CN202010714030.4A CN202010714030A CN112035464B CN 112035464 B CN112035464 B CN 112035464B CN 202010714030 A CN202010714030 A CN 202010714030A CN 112035464 B CN112035464 B CN 112035464B
Authority
CN
China
Prior art keywords
result set
column
filtering
target
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010714030.4A
Other languages
Chinese (zh)
Other versions
CN112035464A (en
Inventor
孙峰
彭青松
刘启春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dream Database Co ltd
Original Assignee
Wuhan Dream Database Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Dream Database Co ltd filed Critical Wuhan Dream Database Co ltd
Priority to CN202010714030.4A priority Critical patent/CN112035464B/en
Publication of CN112035464A publication Critical patent/CN112035464A/en
Application granted granted Critical
Publication of CN112035464B publication Critical patent/CN112035464B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/466Transaction processing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data synchronization filtering method and a synchronization device based on log analysis, wherein the filtering method comprises the following steps: the target end establishes a database connection connected to the source end, and establishes a result set of the associated column of the reference table in the memory of the target end; receiving operation logs of a reference table and a synchronous table, analyzing the operation logs of the reference table, and dynamically maintaining a result set of a correlation column according to operation types and filtering conditions; analyzing the operation log of the synchronous table to obtain the column value of the associated column of the synchronous table; judging whether the column value of the associated column of the synchronous table exists in the result set; and if so, adding the current operation log into an operation queue to be executed of the synchronous table. The result set is cached in the memory, and is dynamically maintained, so that when the data in the synchronous table is synchronized, the filtering of the data can be realized without the selection query operation of the reference table, the efficiency is higher, and the real-time synchronization performance of the data of the synchronous table is improved.

Description

Data synchronization filtering method and synchronization device based on log analysis
Technical Field
The invention belongs to the technical field of synchronization, and particularly relates to a data synchronization filtering method and a synchronization device based on log analysis.
Background
In real-time synchronization of heterogeneous database data, a corresponding data filtering strategy is usually implemented on a synchronous data table based on service requirements, so as to achieve the purpose of synchronizing only part of the required data. For such a data synchronization rule: the source database table a and the table B are associated through corresponding fields, and the synchronization table a needs to refer to the data in the table B in the source database to realize synchronous data filtering, that is, the data synchronized in the table a needs to satisfy such a filtering condition, and the data not satisfying such a condition is not synchronized. For the data filtering condition of the association through the table field, when the synchronization tool synchronizes each row of data in the table a, the selection query operation is required to be performed in the table B according to the association field (field), and for the row records meeting the filtering condition, the data synchronization is performed, otherwise, the filtering is performed. Obviously, frequent select queries affect the performance of the source database and affect the use of table B, and on the other hand, when the amount of data in table B is large, select queries are more time consuming and affect the performance of table a synchronization.
In view of this, overcoming the shortcomings of the prior art products is a problem to be solved in the art.
Disclosure of Invention
Aiming at the above defects or improvement demands of the prior art, the invention provides a filtering method and a synchronization device for data synchronization based on log analysis, which aims to cache a result set in a memory, cache the column values of the associated columns of a reference table to obtain the result set, store the result set in the memory of a target end, dynamically maintain the result set by capturing the operation log of the reference table in the synchronization process, and directly search the result set according to the associated field (the column value of the associated column) when synchronizing the data in the synchronization table, thereby realizing the filtering of the data without the select query operation of a source table, having higher efficiency and improving the real-time synchronization performance of the data of the synchronization table.
To achieve the above object, according to one aspect of the present invention, there is provided a filtering method for data synchronization based on log parsing, the filtering method comprising:
the target end establishes a database connection connected to the source end, and establishes a result set of the association column in the memory of the target end;
receiving an operation log of a reference table, analyzing the operation log of the reference table, and dynamically updating a result set of an associated column according to the operation type and the filtering condition;
Receiving an operation log of a synchronous table, and analyzing the operation log of the synchronous table to obtain a column value of an associated column of the synchronous table;
judging whether the column value of the associated column of the synchronous table exists in the result set;
if so, adding the current operation log into an operation queue to be executed of the synchronous table.
Preferably, the creating the database connection by the target end includes:
the target end reads the data source configuration file, extracts data source connection information, and creates connection to the source end database according to the data source connection information, wherein the data source connection information comprises a database address, a database connection user name, a password and a connection port.
Preferably, the establishing the result set of the association column in the target memory includes:
reading a result set cache configuration file, and extracting table names, associated column names and filtering conditions of a reference table;
constructing a query statement according to the table name, the associated column name and the filtering condition of the reference table;
inquiring a target column value meeting the filtering condition through the inquiry statement based on connection to a source database;
and constructing a hash cache structure in the target-side memory, taking the target column value as a hash search key value, and constructing a result set based on the hash structure.
Preferably, the filtering method further comprises:
receiving an operation log of the reference table, and dynamically updating the result set according to the operation log of the reference table;
wherein, the process of dynamic update includes:
judging the operation type of the operation log of the reference table;
if the operation log is the DML operation, judging whether the operation log meets the filtering condition or not;
and if yes, extracting a target column value of an associated column of the reference table, and strategically updating the result set according to the operation type of the DML operation.
Preferably, the method further comprises:
when the DML operation is an INSERT operation, judging whether the INSERT operation meets the filtering conditions set in the configuration file;
if not, not updating the result set;
if yes, judging whether a target column value corresponding to the INSERT operation exists in the result set;
if so, not updating the result set;
and if the result set does not exist, carrying out hash caching on the target column value corresponding to the INSERT operation so as to update the result set.
Preferably, the method further comprises:
when the DML operation is an UPDATE operation, judging whether a target column value corresponding to the UPDATE operation exists in the result set;
If yes, judging whether the UPDATE operation meets the filtering conditions set in the configuration file;
if yes, releasing the old value of the UPDATE operation in the result set, and carrying out hash caching on the new value of the UPDATE operation to UPDATE the result set;
if not, releasing the old value of the UPDATE operation in the result set to UPDATE the result set;
if not, judging whether the UPDATE operation meets the filtering conditions set in the configuration file;
if yes, carrying out hash caching on a target column value corresponding to the UPDATE operation so as to UPDATE the result set;
if not, the result set is not updated.
Preferably, the method further comprises:
when the DML operation is a DELETE operation, the column value of the DELETE operation in the result set is released.
Preferably, the method further comprises:
and if the result set is a TRUNCATE operation or a DROP operation, releasing the hash structure of the result set and resetting to an uninitialized state.
Preferably, determining whether a column value of an associated column of the synchronization table exists in the result set includes:
and taking the column value of the associated column of the synchronous table as a hash search key value, performing hash search in a hash cache structure, and judging whether the column value of the associated column of the synchronous table exists in the result set.
To achieve the above object, according to another aspect of the present invention, there is provided a synchronization apparatus including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the filtering method of the present invention.
In general, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects: the invention provides a filtering method and a synchronizing device for data synchronization based on log analysis, wherein the filtering method comprises the following steps: the target end establishes a database connection connected to the source end, and establishes a result set of the association column in the memory of the target end; receiving an operation log of a reference table for analysis, and dynamically maintaining a result set of a correlation column according to the operation type and the filtering condition; receiving an operation log of a synchronous table, and analyzing the operation log of the synchronous table to obtain a column value of an associated column of the synchronous table; judging whether the column value of the associated column of the synchronous table exists in the result set; if so, adding the current operation log into an operation queue to be executed of the synchronous table.
In the invention, the operation log of the reference table is captured in real time to maintain and buffer the result set of the column values of the associated columns of the reference table, the result set is stored in the memory of the target end and is maintained in the latest state in the synchronous maintenance process, when the data in the synchronous table are synchronized, the direct search is carried out in the result set only according to the associated fields (the column values of the associated columns), the filtering of the data can be realized without the select query operation of the source table, the efficiency is higher, and the real-time synchronous performance of the data of the synchronous table is improved.
Drawings
Fig. 1 is a flow chart of a filtering method for data synchronization based on log parsing according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a dynamic update result set according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a synchronization device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "transverse", "upper", "lower", "top", "bottom", etc. refer to an orientation or positional relationship based on that shown in the drawings, merely for convenience of describing the present invention and do not require that the present invention must be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Example 1:
referring to fig. 1, the present embodiment provides a filtering method for data synchronization based on log parsing, where the filtering method includes the following steps:
step 101: the target end creates a connection to the source end database and establishes a result set of the association column in the target end memory.
In this embodiment, the process of creating the connection to the source database by the destination is: the target end reads the data source configuration file, extracts data source connection information, and creates connection to the source end database according to the data source connection information, wherein the data source connection information comprises a database address, a database connection user name, a password and a connection port.
The process of establishing the result set of the association column in the memory of the target end is as follows: reading a result set cache configuration file, and extracting table names, associated column names and filtering conditions of a reference table; constructing a query statement according to the table name, the associated column name and the filtering condition of the reference table; inquiring a target column value meeting the filtering condition through the inquiry statement based on connection to a source database; and constructing a hash cache structure in the target-side memory, taking the target column value as a hash search key value, and constructing a result set based on the hash structure.
Step 102: and analyzing the operation log of the reference table, and dynamically updating the result set of the associated column according to the operation type and the filtering condition.
That is, in the present embodiment, the acquisition of the result set is largely divided into an initial setup process and a dynamic update process.
The initial establishment process is as follows: the destination end establishes database connection with the source end, and inquires a target column value meeting the filtering condition through an inquiry statement; and constructing a hash cache structure in the target-side memory, taking the target column value as a hash search key value, and constructing a result set based on the hash structure.
The dynamic updating process is as follows: and dynamically updating the result set of the associated column according to the operation type and the filtering condition of the operation log of the reference table. See the corresponding text description of figure 2 for details.
Step 103: and receiving an operation log of a synchronous table, and analyzing the operation log of the synchronous table to obtain a column value of an associated column of the synchronous table.
Step 104: and judging whether column values of associated columns of the synchronous table exist in the result set.
Step 105: if so, adding the current operation log into an operation queue to be executed of the synchronous table.
In this embodiment, at the heterogeneous data synchronization source, the data synchronization service captures and analyzes the transaction operation logs of the synchronization table and the reference table in the source database based on the log capturing and analyzing technology, and sends the transaction operation logs to the target data synchronization service after internal message conversion.
And at the heterogeneous data synchronization target end, after the data synchronization service finishes the initialization preparation operation in the step 101, waiting for receiving the transaction operation log sent by the source end. When receiving the transaction operation log of the synchronization table, the synchronization executing steps are as follows: analyzing the transaction information sent by the source end, and extracting the column value of the associated column of the synchronous table from the transaction information; taking the column value of the associated column as a hash search key value, performing hash search in the hash cache structure constructed in the step 101, judging whether the column value of the associated column of the synchronous table exists in the result set, and discarding the transaction operation log if the column value of the associated column does not exist; if yes, the current operation log is added into an operation queue to be executed of the synchronous table, and after the commit operation (commit) of the transaction is received, the log execution module is awakened to execute synchronous execution.
For example, the synchronization table in the source database system is defined as a table a, where the table a needs to perform partial data filtering in the synchronization process, and the filtering rule is as follows:
for convenience of description, a column data in table a needs to refer to the associated column data of other tables B (reference tables) in the source database, and for convenience of description, the associated column in table a is defined as a CA column, and the associated column in table B is defined as a CB column.
When the data in table a are synchronized, the filtering condition needs to be satisfied: select from a sphere a.ca in (select b.cb from B where condition), wherein the condition of the filtering condition can be selected according to the actual service requirement to obtain a CB column value set meeting a specific condition.
When the CA column value in the table A does not meet the rule, filtering is performed, and data synchronization is not needed. That is, when the CA column value in table a does not exist in the CB column value set satisfying the specific filtering condition in table B, data synchronization is not required.
In an actual application scenario, the target end receives the operation log of the reference table sent by the source end, dynamically updates the result set according to the operation log of the reference table, and in combination with fig. 2, in step 102, the process of dynamically updating the result set includes the following steps:
step 201: and receiving the operation log of the reference table, and judging the operation type of the operation log of the reference table.
In this embodiment, if the DML operation is performed, step 202 is performed; if it is DDL operation, then step 204 is performed.
Step 202: and if the operation log is the DML operation, judging whether the operation log meets the filtering condition or not.
In this embodiment, an operation log of the reference table is received, a target column value of the associated column is extracted according to the associated column name, and whether the target column value meets the filtering condition is determined. If yes, step 203 is executed, if not, the operation log is ignored, step 201 is returned, and a next operation log is received to update the result set.
Step 203: and if yes, extracting a target column value of an associated column of the reference table, and strategically updating the result set according to the operation type of the DML operation.
In this embodiment, when the DML operation is an INSERT operation, it is determined whether the INSERT operation satisfies a filtering condition set in a configuration file, and if not, the result set is not updated; if yes, judging whether a target column value corresponding to the INSERT operation exists in the result set; if so, not updating the result set; and if the result set does not exist, carrying out hash caching on the target column value corresponding to the INSERT operation so as to update the result set.
When the DML operation is an UPDATE operation, determining to release the old value of the UPDATE operation in the result set according to the filtering condition; and/or hash the new value of the UPDATE operation to UPDATE the result set.
In the actual application scene, aiming at the UPDATE, whether the UPDATE exists in the result set is firstly judged, if so, whether the UPDATE meets the filtering condition is judged, because the target column which can be originally cached in the result set is possibly not met with the filtering condition due to the limitation of the filtering condition, and the target column is deleted from the result set at the moment because a certain column value is updated; conversely, when a row that is not originally present in the cached results set becomes satisfied with a filtering condition due to an update of a column value, it is added to the results set.
The specific implementation process is as follows: when the DML operation is an UPDATE operation, judging whether a target column value before UPDATE of the UPDATE operation exists in the result set; if yes, judging whether the UPDATE operation meets the filtering conditions set in the configuration file; if yes, releasing the old value (target column value before updating) of the UPDATE operation in the result set, and carrying out hash caching on the new value (target column value after updating) of the UPDATE operation to UPDATE the result set; if not, releasing the old value of the UPDATE operation in the result set to UPDATE the result set; if not, judging whether the UPDATE operation meets the filtering conditions set in the configuration file; if yes, carrying out hash caching on the target column value updated by the UPDATE operation to UPDATE the result set; if not, the result set is not updated.
When the DML operation is a DELETE operation, the column value of the DELETE operation in the result set is released.
Step 204: and if the result set is a TRUNCATE operation or a DROP operation, releasing the hash structure of the result set and resetting to an uninitialized state.
In this embodiment, a hash structure is built in the memory, the column values of the associated columns of the cache reference table are cached to obtain a result set, and dynamic real-time update of the result set is realized based on a log capturing and analyzing technology. When the data in the synchronous table is synchronized, the data can be filtered only by directly searching in the hash structure according to the associated field (the column value of the associated column) without the select query operation of the source table, the hash searching efficiency is higher, and the real-time data synchronization performance of the synchronous table is improved. On the other hand, the associated data required by the hash cache can be realized through the filtering condition, the size of the data cache is reduced, and the whole data table is not required to be queried by selection.
Example 2:
in combination with embodiment 1, an embodiment of the present invention provides a synchronization device, which can implement the filtering method of the foregoing embodiment 1, and the synchronization device includes:
the system comprises a source end data synchronization tool, a log acquisition and analysis module and a log data network transmission module, wherein the source end data synchronization tool comprises a log acquisition and analysis module and a log data network transmission module; and the transaction log is responsible for capturing and analyzing the source-side synchronization table and is sent to the target-side data synchronization tool.
The target end data synchronization tool comprises a log receiving cache module and a log synchronization execution module; and the system is responsible for receiving the transaction log sent by the source terminal and synchronously executing or filtering and discarding according to the corresponding synchronous filtering rule.
The data source configuration file mainly comprises configuration information such as an address, a connection user name, a connection password and the like of the data source and is mainly used for connecting the data source.
The result set cache configuration file mainly comprises a data source name, a table mode name, a table name, an associated field column name, filtering conditions and the like and is mainly used for acquiring a data source result set.
Example 3:
in order to make the method of the present invention easy to understand, firstly, the structures and initial values of the synchronization table a and the association table B in the source database are set as follows:
cteate table A(c_key varchar(100),c2 int);
create table B(c1 int,c_key varchar(100),c3 varchar(100));
insert into B values(1,‘key1’,‘test1’);
insert into B values(2,‘key2’,‘test2’);
insert into B values(2,’key3’,’test3’);
Insert into B values(2,’key4’,’test4’);
insert into B values(3,‘key5’,‘test5’);
commit;
the field c_key in the synchronization table a and the reference table B is an associated field, and when the synchronization table a is synchronized, the following conditions need to be satisfied:
Select*from A where c_key in(select c_key from B where c1=2)
according to the above description, the following synchronization effect example is given:
when synchronization table a performs the following insertion:
insert into A values(‘key1’,111);
commit;
for this row record, the filtering is performed because the c_key field value in the a table does not satisfy the synchronization rule described above.
When synchronization table a performs the following insertion:
insert into A values(‘key2’,222);
commit;
for this line record, since the c_key field value in the a table satisfies the above synchronization rule, synchronization execution is performed.
The implementation steps of the embodiment of the invention are as follows:
s1, at a heterogeneous data synchronization source terminal, a data synchronization service captures and analyzes transaction operation logs of a table A and a table B of two categories in a source terminal database based on a log capturing and analyzing technology, and sends the transaction operation logs to a target terminal data synchronization service after internal message conversion.
S2, at the heterogeneous data synchronization target end, the data synchronization service receives the transaction operation log of the source end and performs data synchronization execution, and firstly starts initialization preparation operation, specifically comprising the following initialization flow:
s2-1, reading a data source configuration file, and extracting data source connection information, wherein the data source connection information comprises information such as a database address, a database connection user name, a password, a connection port and the like, and creating a database connection to a source end.
S2-2, reading a result set cache configuration file, extracting parameter information comprising a table name B, an associated column name c_key, a filtering condition c_key=2 and the like, and constructing a select query statement: select c_ key from B where c _key=2, and the associated column c_key column result set of the referenced table B is queried out using the connection to the source database connection in S2-1, in this embodiment the result set is ('key 2', 'key3', 'key 4').
S2-3, according to the query result set in the S2-2, taking the c_key column value as a hash search key value, constructing a hash cache structure in the memory of the target end, and caching the query result set of the c_key column.
And S3, at the heterogeneous data synchronization target end, after the data synchronization service finishes the initialization preparation operation of the S2, waiting for receiving the transaction operation log sent by the source end.
For the operations of insert into A values ('key 1', 111) and insert into A values ('key 2', 222) of table a, synchronization is performed as follows:
analyzing the transaction information sent by the source end, and extracting a c_key column value 'key1', 'key2' of a table A from the transaction information;
step (2) performing hash search in the hash buffer structure constructed in the above step S2-3 by using the c_key column value as a hash search key value, wherein the 'key1' is not in the above hash buffer, so as to filter the synchronization operation, and the 'key2' is in the above hash buffer, so as to continue the following step (3);
and (3) adding the transaction operation of the table A into an operation queue to be executed of the transaction, and waking up a log execution module to synchronously execute after receiving the commit operation (commit) of the transaction.
Operation for the embodiment of table B: insert into B values (6, 'key6', 'test 6'); insert into B values (2, 'key7', 'test 7').
When receiving the transaction operation log of the source end table B, the data synchronization service updates the hash cache structure, and the processing steps are as follows:
analyzing the transaction information sent by the source end, and extracting each column value of the table B from the transaction information; if resolve to DDL operation of Table B, go to step (4);
step (2) judges whether the column value of the present table B is satisfied according to the filtering condition of select c_ key from B where c1 =2 in the result set cache configuration file. For insert into B values (6, 'key6', 'test 6') operations, the cache update operation is ignored because the c1 column value is 6, the above filtering condition is not satisfied; for insert into B values (2, 'key7', 'test 7') operations, continuing with step (3) because the c1 column value is 2, satisfying the upper filtering condition;
and (3) carrying out hash searching in the hash cache structure constructed in the step (S2-3) according to the c_key column value of the table B provided in the step (1), and if the hash cache structure is not initialized, firstly carrying out initialization (S2-2 and S2-3).
For the INSERT operation, the 'key7' hits, so that the hit is newly added to the hash cache, and the current cache result set is ('key 2', 'key3', 'key4', 'key 7'), thereby completing the cache update operation.
Step (4) for DDL operations on table B: for a trunk operation, releasing the hash structure of the table and resetting to an uninitialized state; for DROP operation, the hash structure of the table is released and reset to an uninitialized state.
Example 4:
in an actual application scenario, a large number of batch execution operations may exist in each transaction executed concurrently on the source database, the database system will execute the operation according to the concurrency control mechanism, mutually exclusive execute the conflicting transaction operations, and record the operation logs of each transaction in the log file sequentially, and the batch operation of the source should be restored as far as possible during data synchronization to improve the synchronization performance. If the target-end data copying software performs serial execution strictly according to the transaction submitting sequence in the source-end log stream, the same operations in the transactions can be merged and then are performed in batches, but the serial execution efficiency is very low, so that a multithreaded parallel execution strategy is often adopted when the target-end synchronously executes the transactions. In the parallel execution environment, the same operations are combined in the execution of a single transaction, and then the synchronous performance is improved in a batch execution mode, however, the parallel execution needs to consider whether the problem of data correlation exists between the executing transactions, and the transactions cannot irregularly combine the same operations in the execution. Therefore, on the premise of ensuring the consistency of data replication, the method combines the operations in the transaction, improves the parallel execution efficiency of the data replication of the target end, and becomes an important technical problem to be solved in the industry.
In this embodiment, the log synchronous execution module is configured with a log receiving thread, a group of transaction executing threads and an executing thread linked list, where the log receiving thread is responsible for receiving and managing the transaction sent from the source end data synchronization system; the transaction execution thread is responsible for executing and warehousing the transaction, and a plurality of transaction execution threads can execute in parallel; the execution thread linked list is used for registering the commit order of the to-be-executed transaction in the execution transaction thread at the source end, and is arranged in sequence according to the size of the commit log sequence number of the transaction.
In combination with the foregoing embodiment 1, in step 104, the synchronous execution module obtains the transaction identification number of the current operation log, adds the current operation to be executed to the corresponding transaction to be executed according to the transaction identification number, and distributes the transaction to be executed to the corresponding transaction execution thread for data synchronization after receiving the commit operation of the corresponding transaction.
In this embodiment, the types of the operation log include DML operation and commit operation, and the specific process of performing data synchronization is as follows:
(1) And after receiving the commit operation, the log receiving thread sets a commit number for the commit operation according to the sequence, and distributes the transaction to be executed, to which the commit operation belongs, to the corresponding transaction executing thread.
When distributing the transaction to be executed to the transaction execution thread, the transaction with small commit log sequence number is required to be distributed according to the sequence of the log sequence number of the commit operation of the transaction, and represents that the transaction is committed first at the source end, and then when executing at the target end, the transaction is required to be distributed to the transaction execution thread first, so that the transaction execution thread can be ensured to begin executing the transaction committed first.
(2) After receiving a DML operation, a log receiving thread acquires a target commit number of a commit operation which occurs before the DML operation and is closest to the DML operation, and marks the DML operation by adopting the target commit number.
In the practical application scenario, when the database log is written into the log, the serial mode is adopted, that is, the log generated by the transaction operation executed in parallel in the database is written into the log file alternately, and if only one active transaction performs batch UPDATE operation on a certain table, the UPDATE log of the table is recorded continuously in the database log; if two active transactions do UPDATE operations with respect to the same table, respectively, then the database log will alternately record the UPDATE log of the two transactions on that table. Thus, the commit number of the previous commit operation may be appended to each DML operation to determine whether there are other conflicting transactions between two neighboring operations in a single transaction.
In this embodiment, after receiving an operation of a source end, a log receiving thread parses the operation to obtain a type of the operation, and when receiving a DML operation, the log receiving thread occurs before the DML operation and is closest to a target commit number of a commit operation of the DML operation, and marks the DML operation with the target commit number.
In an actual application scene, when receiving a DML operation, the log receiving thread also analyzes the DML operation to obtain object information related to the DML operation, an operation type of the DML operation and a transaction identification number to which the DML operation belongs, wherein the object information comprises table information, view information or index information, and the operation type of the DML operation comprises a deleting operation, an inserting operation and an updating operation.
Then classifying the DML operation into corresponding transactions according to the transaction identification number to which the DML operation belongs; and adding object information related to the DML operation and the operation type of the DML operation into corresponding transactions, and distributing the transaction to be executed, to which the commit operation belongs, to corresponding transaction execution threads after receiving the commit operation.
In an actual application scene, the object information related to the DML operation is used for judging whether operations related to the same object as the DML operation exist in other transactions to be executed or not when synchronization is carried out, so as to determine whether the operations can be combined or not; the operation type of the DML operation is used for judging the compatibility of the operation and the DML operation when the operation related to the same object with the DML operation exists in other transactions to be executed during data synchronization so as to determine whether the operation can be combined.
(3) And the transaction execution thread fetches the current operation to be executed from the transaction to be executed.
Each transaction execution thread also needs to initialize a waiting operation queue after starting to collect the same type of operation so as to combine and realize batch execution.
In this embodiment, a plurality of transaction execution threads may execute in parallel, and each transaction execution thread takes out an operation to be executed from the transaction to be executed that is responsible for the transaction execution thread, determines a type of the operation to be executed currently, and if the operation to be executed currently is a DML operation, determines whether the operation type of the operation to be executed currently is the same as the operation type of the operation existing in the operation queue to be executed. If the operation types are the same, executing the following step (4), and further determining whether operation merging can be performed; if the operation types are different, executing and emptying the existing operation in the operation queue to be executed, and executing the following step (4).
(4) And determining compatibility of the current operation to be executed and other transactions to be executed according to the target commit number carried by the current operation to be executed, so as to determine whether operation merging can be performed.
In this embodiment, according to the target commit number carried by the current operation to be executed, compatibility between the current operation to be executed and other transactions to be executed is determined, so as to determine whether operation merging can be performed.
In this embodiment, the manner of determining compatibility is: and in other transactions to be executed, determining conflict transactions which conflict with the current operations to be executed according to target commit numbers carried by the current operations to be executed, wherein the conflict transactions refer to that commit operations are also included between two adjacent operations of the current transactions in a log stream, and the transactions to be executed to which the commit operations belong are conflict transactions. After the conflict transaction is determined, judging whether an associated object associated with the current operation to be executed exists in the conflict transaction, and if not, making the current operation to be executed compatible with the conflict transaction; if yes, further judging whether the operation of the conflict transaction on the associated object is compatible with the current operation to be executed, and if yes, adding the current operation to be executed to the tail end of an operation queue to be executed; if the operation queue is incompatible, after the conflict transaction is submitted, the current operation to be executed is added to the tail end of the operation queue to be executed.
In this embodiment, the operation sequence recorded in the log stream of the database may directly reflect the sequence of the operations of each transaction executed in the database, and the commit operation in the log stream is used as a boundary to reflect the parallelism of the operations of each transaction executed in the data. The method has the advantages that the operations are merged by judging whether the commit operation of other transactions exists between two operations in a single transaction or not, and the merging operations are carried out as far as possible by judging strategies such as a table, an operation compatibility and the like involved in the transactions corresponding to the operations and the commit operation when the commit operation is mingled, so that the synchronous performance can be effectively improved to the maximum extent.
The following specifically describes the implementation procedure of step (4):
step 4-1: and judging whether the target commit number carried by the current operation to be executed is the same as the target commit number carried by the last operation in the operation queue to be executed.
If the submitted numbers of the two are the same, the two can be combined and executed, and the step 4-2 is executed; if the submitted numbers of the two are different, it is indicated that the two may not be executed in combination, and step 4-3 is executed.
Step 4-2: and if the submitted numbers are the same, adding the current operation to be executed to the tail part of the operation queue to be executed.
In a preferred embodiment, after adding the current operation to be executed to the operation queue to be executed, judging whether the number of the existing operations in the operation queue to be executed has reached a set value, if so, warehousing the existing operations in the operation queue to be executed in batches to empty the operation queue to be executed, thereby preventing the operation queue to be executed from caching too many operations and influencing the occupation of the memory.
Step 4-3: if the commit numbers are different, the conflicting transactions in the two target commit numbers are sequentially extracted.
In this embodiment, if the commit numbers of the two operations are different, it is indicated that the two operations may not be executed together, and it is necessary to continuously determine whether the conflict transaction between the current operation to be executed and the two operations is consistent. Therefore, it is necessary to determine the conflict transaction between two operations first, specifically, sequentially extract the conflict transactions in the two target commit numbers, where the two target commit numbers may differ by only 1, there is only one conflict transaction, the two target commit numbers may differ by 2, 3 or more, there are 2, 3 or more conflict transactions correspondingly, all the conflict transactions need to be acquired, and then sequentially determine whether the operation to be executed currently is compatible with the conflict transaction.
In this embodiment, a log sequence number of a commit operation and a commit number of the commit operation are registered in an execution thread linked list, a commit number of a conflict transaction located between two target commit numbers is obtained, and based on the obtained commit number of the conflict transaction, the log sequence number of the commit operation of the conflict transaction is obtained from the execution thread linked list, so as to determine the conflict transaction.
In this embodiment, a transaction is a conflicting transaction involved between any two operations in the log stream, and if the object involved in the conflicting transaction does not conflict with the object involved in the currently pending operation, the two operations may be merged. Since commit numbers are sequentially incremented, the corresponding transaction can be found by calculating all commit numbers for the intervening commit number in both operations.
Step 4-4: and judging whether an associated object associated with the operation to be currently executed exists in the conflict transaction.
After the conflict transaction is acquired, acquiring operation objects of all operations contained in the conflict transaction, and judging whether the object related to the current operation to be executed is associated with the acquired operation object or not so as to judge whether the associated object associated with the current operation to be executed exists in the conflict transaction or not.
If the associated object does not exist, executing the step 4-5; if there is an associated object, step 4-6 is performed.
Step 4-5: and if the associated object does not exist, adding the current operation to be executed to the tail part of the operation queue to be executed.
Step 4-6: if the associated object exists, judging whether the operation of the conflict transaction on the associated object is compatible with the operation to be executed currently or not so as to determine whether operation merging can be carried out or not.
In this embodiment, if there is an associated object, it is necessary to determine whether the conflict transaction is compatible with the current operation to be executed by combining the current operation to be executed and the type of operation performed on the associated object by the conflict transaction. Specific rules are as follows, steps 4-7 and 4-9.
Step 4-7: and if the conflict transaction and the current operation to be executed perform insertion operation or deletion operation on the associated object, the conflict transaction and the current operation to be executed are compatible.
Step 4-8: and adding the current operation to be executed to the tail of the operation queue to be executed.
Step 4-9: and if the conflict transaction and the current operation to be executed perform update operation on the associated object, or the operation type of the operation performed on the associated object by the conflict transaction is different from the operation type of the current operation to be executed, the conflict transaction and the current operation to be executed are incompatible.
Wherein, in this step, the operation types include: an insert operation, a delete operation, and an update operation, for example, one of which performs an insert operation on the associated object and the other performs a delete operation or an update operation on the associated object, then the two are incompatible; one of which performs a delete operation on the associated object and the other performs an insert operation or an update operation on the associated object, the two are incompatible.
Step 4-10: and emptying the existing operations in the operation queue to be executed in a batch execution mode.
Step 4-11: and after the conflict transaction is submitted, adding the current operation to be executed to the tail part of the operation queue to be executed.
In this embodiment, when the conflict transaction is incompatible with the current operation to be executed, the current operation to be executed and the operation existing in the operation queue to be executed cannot be combined and executed together, and the operation existing in the operation queue to be executed needs to be emptied in a batch execution mode.
And then, after waiting for the conflict transaction to be submitted, adding the current operation to be executed to the tail of the operation queue to be executed.
And after the current operation to be executed is added to the operation queue to be executed, the next operation to be executed is taken out from the transaction to be executed, and the operations are combined and executed according to the mode.
In the present invention, the principle of operation merging is mainly: judging whether the submitting operation is mingled between two identical operations needing to be combined and at the position of the log stream, if not, combining; if there is a conflict between the table involved in the current operation and the table involved in the transaction submitted between the two operations, then the operation compatibility rule is used to determine whether the two operations can be executed in combination.
Example 5:
referring to fig. 3, fig. 3 is a schematic structural diagram of a synchronization device according to an embodiment of the invention. The synchronization device of the present embodiment includes one or more processors 41 and a memory 42. In fig. 3, a processor 41 is taken as an example.
The processor 41 and the memory 42 may be connected by a bus or otherwise, which is illustrated in fig. 3 as a bus connection.
The memory 42 is used as a non-volatile computer readable storage medium based on the filtering method for storing non-volatile software programs, non-volatile computer executable programs and modules, the methods of the above embodiments and corresponding program instructions. The processor 41 implements the methods of the foregoing embodiments by executing nonvolatile software programs, instructions, and modules stored in the memory 42 to perform various functional applications and data processing.
The memory 42 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 42 may optionally include memory located remotely from processor 41, which may be connected to processor 41 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that, because the content of information interaction and execution process between modules and units in the above-mentioned device and system is based on the same concept as the processing method embodiment of the present invention, specific content may be referred to the description in the method embodiment of the present invention, and will not be repeated here.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the embodiments may be implemented by a program that instructs associated hardware, the program may be stored on a computer readable storage medium, the storage medium may include: read Only Memory (ROM), random access Memory (Random AccessMemory, RAM), magnetic disk, optical disk, or the like.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. The filtering method for data synchronization based on log analysis is characterized by comprising the following steps:
the target end establishes a connection to the source end database and establishes a result set of the associated column of the reference table in the memory of the target end; the process of establishing the result set of the association column in the memory of the target end is as follows: reading a result set cache configuration file, and extracting table names, associated column names and filtering conditions of a reference table; constructing a query statement according to the table name, the associated column name and the filtering condition of the reference table; inquiring a target column value meeting the filtering condition through the inquiry statement based on connection to a source database; constructing a hash cache structure in a target-side memory, taking the target column value as a hash search key value, and constructing a result set based on the hash structure;
analyzing the operation log of the reference table, and dynamically updating the result set of the associated column according to the operation type and the filtering condition;
Receiving an operation log of a synchronous table, and analyzing the operation log of the synchronous table to obtain a column value of an associated column of the synchronous table;
judging whether the column value of the associated column of the synchronous table exists in the result set;
if so, adding the current operation log into an operation queue to be executed of the synchronous table.
2. The filtering method of claim 1, wherein the target creating a connection to the source database comprises:
the target end reads the data source configuration file, extracts data source connection information, and creates connection to the source end database according to the data source connection information, wherein the data source connection information comprises a database address, a database connection user name, a password and a connection port.
3. The filtering method according to claim 1, wherein the creating the result set of the reference table association column in the target memory includes:
reading a result set cache configuration file, and extracting table names, associated column names and filtering conditions of a reference table;
constructing a query statement according to the table name, the associated column name and the filtering condition of the reference table;
inquiring a target column value meeting the filtering condition through the inquiry statement based on connection to a source database;
And constructing a hash cache structure in the target-side memory, taking the target column value as a hash search key value, and constructing a result set based on the hash structure.
4. A filtering method according to claim 3, wherein parsing the operation log of the reference table and dynamically updating the result set of the associated column according to the operation type and the filtering condition comprises:
receiving an operation log of the reference table, and dynamically updating the result set according to the operation log of the reference table;
wherein, the process of dynamic update includes:
judging the operation type of the operation log of the reference table;
if the operation log is the DML operation, judging whether the operation log meets the filtering condition or not;
and if yes, extracting a target column value of an associated column of the reference table, and strategically updating the result set according to the operation type of the DML operation.
5. The filtering method of claim 4, wherein the strategically updating the result set according to the operation type of the DML operation comprises:
when the DML operation is an INSERT operation, judging whether the INSERT operation meets the filtering conditions set in the configuration file;
if not, not updating the result set;
If yes, judging whether a target column value corresponding to the INSERT operation exists in the result set;
if so, not updating the result set;
and if the result set does not exist, carrying out hash caching on the target column value corresponding to the INSERT operation so as to update the result set.
6. The method of filtering according to claim 4, further comprising:
when the DML operation is an UPDATE operation, judging whether a target column value before UPDATE of the UPDATE operation exists in the result set;
if yes, judging whether the UPDATE operation meets the filtering conditions set in the configuration file;
if yes, releasing the old value of the UPDATE operation in the result set, and carrying out hash caching on the new value of the UPDATE operation to UPDATE the result set;
if not, releasing the old value of the UPDATE operation in the result set to UPDATE the result set;
if not, judging whether the UPDATE operation meets the filtering conditions set in the configuration file;
if yes, carrying out hash caching on the target column value updated by the UPDATE operation to UPDATE the result set;
if not, the result set is not updated.
7. The method of filtering according to claim 4, further comprising:
When the DML operation is a DELETE operation, the column value of the DELETE operation in the result set is released.
8. The method of filtering according to claim 4, further comprising:
and if the result set is a TRUNCATE operation or a DROP operation, releasing the hash structure of the result set and resetting to an uninitialized state.
9. The filtering method of claim 3, wherein determining whether column values of associated columns of the synchronization table are present in the result set comprises:
and taking the column value of the associated column of the synchronous table as a hash search key value, performing hash search in a hash cache structure, and judging whether the column value of the associated column of the synchronous table exists in the result set.
10. A synchronization device, characterized in that the synchronization device comprises at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the filtering method of any one of claims 1-9.
CN202010714030.4A 2020-07-22 2020-07-22 Data synchronization filtering method and synchronization device based on log analysis Active CN112035464B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010714030.4A CN112035464B (en) 2020-07-22 2020-07-22 Data synchronization filtering method and synchronization device based on log analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010714030.4A CN112035464B (en) 2020-07-22 2020-07-22 Data synchronization filtering method and synchronization device based on log analysis

Publications (2)

Publication Number Publication Date
CN112035464A CN112035464A (en) 2020-12-04
CN112035464B true CN112035464B (en) 2024-03-15

Family

ID=73582950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010714030.4A Active CN112035464B (en) 2020-07-22 2020-07-22 Data synchronization filtering method and synchronization device based on log analysis

Country Status (1)

Country Link
CN (1) CN112035464B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114598747B (en) * 2022-03-15 2023-09-29 国网安徽省电力有限公司信息通信分公司 Data acquisition method and system of virtualized cloud platform

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6175930B1 (en) * 1998-02-17 2001-01-16 International Business Machines Corporation Demand based sync bus operation
CN103064976A (en) * 2013-01-14 2013-04-24 浙江水利水电专科学校 Data for exchanging data among homogenous and heterogenous DBMSs (database management systems) on basis of active database technology
CN107291926A (en) * 2017-06-29 2017-10-24 搜易贷(北京)金融信息服务有限公司 A kind of binlog analysis methods
CN109271444A (en) * 2018-08-10 2019-01-25 武汉达梦数据库有限公司 A kind of table level bi-directional synchronization method and system based on trigger
CN109788053A (en) * 2019-01-04 2019-05-21 深圳壹账通智能科技有限公司 Method of data synchronization and system
CN111400408A (en) * 2020-04-13 2020-07-10 上海东普信息科技有限公司 Data synchronization method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7203712B2 (en) * 2004-02-26 2007-04-10 International Business Machines Corporation Algorithm to find LOB value in a relational table after key columns have been modified
US8037040B2 (en) * 2008-08-08 2011-10-11 Oracle International Corporation Generating continuous query notifications
US8788525B2 (en) * 2012-09-07 2014-07-22 Splunk Inc. Data model for machine data for semantic search
US10067708B2 (en) * 2015-12-22 2018-09-04 Arm Limited Memory synchronization filter

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6175930B1 (en) * 1998-02-17 2001-01-16 International Business Machines Corporation Demand based sync bus operation
CN103064976A (en) * 2013-01-14 2013-04-24 浙江水利水电专科学校 Data for exchanging data among homogenous and heterogenous DBMSs (database management systems) on basis of active database technology
CN107291926A (en) * 2017-06-29 2017-10-24 搜易贷(北京)金融信息服务有限公司 A kind of binlog analysis methods
CN109271444A (en) * 2018-08-10 2019-01-25 武汉达梦数据库有限公司 A kind of table level bi-directional synchronization method and system based on trigger
CN109788053A (en) * 2019-01-04 2019-05-21 深圳壹账通智能科技有限公司 Method of data synchronization and system
CN111400408A (en) * 2020-04-13 2020-07-10 上海东普信息科技有限公司 Data synchronization method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112035464A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN112035463B (en) Bidirectional synchronization method and synchronization device of heterogeneous database based on log analysis
US7200624B2 (en) Systems and methods for versioning based triggers
US9697253B2 (en) Consistent client-side cache
JP4416663B2 (en) Transparent network edge data cache
CN109189852B (en) Data synchronization method and device for data synchronization
US20150142733A1 (en) System and method for efficient management of big data in a database using streaming tables
CN109656958B (en) Data query method and system
US20160253382A1 (en) System and method for improving a query response rate by managing a column-based store in a row-based database
CN110362632A (en) A kind of method of data synchronization, device, equipment and computer readable storage medium
US11275759B2 (en) Data storage method and apparatus, server, and storage medium
US10664459B2 (en) Database managing method, database managing system, and database tree structure
CN111241094B (en) Database deleted column synchronization method and device based on log analysis
CN112559626B (en) Synchronous method and synchronous system of DDL operation based on log analysis
US9811560B2 (en) Version control based on a dual-range validity model
JP5558514B2 (en) Method and apparatus for optimally processing N-sort queries in multi-range scanning
CN111694893B (en) Partial rollback analysis method and data synchronization system based on log analysis
CN112035464B (en) Data synchronization filtering method and synchronization device based on log analysis
CN106020847A (en) Method and device for configuring SQL for persistent layer development framework
CN112559473B (en) Priority-based two-way synchronization method and system
CN111858504B (en) Operation merging execution method based on log analysis synchronization and data synchronization system
CN114003660B (en) Method and device for efficiently synchronizing real-time data to click House based on flash
CN112307118B (en) Method for guaranteeing data consistency based on log analysis synchronization and synchronization system
CN110413642B (en) Application-unaware fragmentation database parsing and optimizing method
CN112307117B (en) Synchronization method and synchronization system based on log analysis
DE202023104916U1 (en) Distributed execution of transactional queries

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant after: Wuhan dream database Co.,Ltd.

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant before: WUHAN DAMENG DATABASE Co.,Ltd.

CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Sun Feng

Inventor after: Peng Qingsong

Inventor after: Liu Qichun

Inventor before: Sun Feng

Inventor before: Fu Quan

Inventor before: Peng Qingsong

Inventor before: Liu Qichun

GR01 Patent grant
GR01 Patent grant