CN111984729A

CN111984729A - Heterogeneous database data synchronization method, device, medium and electronic equipment

Info

Publication number: CN111984729A
Application number: CN202010820904.4A
Authority: CN
Inventors: 王凯龙
Original assignee: Beijing Kingbase Information Technologies Co Ltd
Current assignee: Beijing Kingbase Information Technologies Co Ltd
Priority date: 2020-08-14
Filing date: 2020-08-14
Publication date: 2020-11-24

Abstract

The present disclosure relates to the field of database technologies, and in particular to a heterogeneous database data synchronization method, a heterogeneous database data synchronization apparatus, a computer-readable storage medium, and an electronic device. The method includes: reading data and indication information from an incremental data file transmitted from a source database, the indication information including table information of the table into which the data is to be written; determining, based on a preset configuration file and the indication information, whether the data relates to an association relation table in a target database; and, when the data is determined not to relate to the association relation table, distributing the data to a designated write thread among a plurality of write threads based on the preset configuration file and the indication information, so that the designated write threads corresponding to the respective pieces of read data write the data into the target database in parallel. Embodiments of the disclosure can improve the data loading performance of the target database when data is synchronized in a heterogeneous database system environment.

Description

Heterogeneous database data synchronization method, device, medium and electronic equipment

Technical Field

The embodiment of the disclosure relates to the technical field of databases, and in particular, to a heterogeneous database data synchronization method, a heterogeneous database data synchronization device, and a computer-readable storage medium and an electronic device for implementing the heterogeneous database data synchronization method.

Background

With the continuous development of computer network technology, database synchronization technology, as a method for maintaining data consistency among various database nodes, becomes a key technology for ensuring system performance and improving system reliability.

For heterogeneous database systems, data may be synchronized using a data synchronization tool, and the synchronization generally includes three stages: in the first stage, the existing (stock) data is loaded for initialization to obtain a base point for data synchronization; in the second stage, incremental data is synchronized with reference to the synchronization base point established by the initial load; and in the third stage, the data of the source database and the data of the target database are periodically compared and checked during synchronization to confirm that no data has been lost. In the second stage of incremental data synchronization, the performance metrics of primary concern are maximum delay and maximum throughput. Maximum delay refers to how long it takes for data newly added to the source database to be transmitted to the target database. Maximum throughput refers to how long the synchronization tool takes, when it is started after a large number of log files have accumulated at the source database end, to synchronize the accumulated data to the target database.

To improve data synchronization performance, these two performance metrics may be optimized, for example by increasing the speed at which incremental data is parsed at the source database end, increasing the data transmission speed, and increasing the data loading speed of the target database. However, how to improve the data loading performance of the target database when incremental data is synchronized in a heterogeneous database system environment remains an urgent problem to be solved.

Disclosure of Invention

In order to solve the technical problem or at least partially solve the technical problem, the present disclosure provides a heterogeneous database data synchronization method, a heterogeneous database data synchronization apparatus, and a computer-readable storage medium and an electronic device implementing the heterogeneous database data synchronization method.

In a first aspect, an embodiment of the present disclosure provides a data synchronization method for heterogeneous databases, including:

reading data and indication information from an incremental data file transmitted from a source database; the indication information comprises table information indicating that the data is to be written;

determining whether the data relates to an association relation table in a target database or not based on a preset configuration file and the indication information;

and when the data is determined not to relate to the association relation table, distributing the data to a designated write thread among a plurality of write threads based on the preset configuration file and the indication information, so that the designated write threads corresponding to the respective pieces of read data write the data into the target database in parallel.

In some embodiments of the present disclosure, a correspondence between table information of different tables in a target database and corresponding write thread identifiers is preset in the preset configuration file;

the allocating the data to a designated write thread of a plurality of write threads based on the preset configuration file and the indication information includes:

determining a write thread identifier corresponding to the data based on the correspondence and on the table information, in the indication information, of the table into which the data is to be written;

and distributing the data to the specified write thread indicated by the write thread identifier corresponding to the data based on the write thread identifier corresponding to the data.

In some embodiments of the present disclosure, the preset configuration file further includes a preset data allocation policy indicating that data is allocated among the plurality of write threads based on a preset polling algorithm;

the allocating the data to a designated write thread of a plurality of write threads based on the preset configuration file and the indication information includes:

determining, based on the correspondence and on the table information in the indication information, whether the table into which the data is to be written is a first preset data table; the first preset data table is at least one table involved in the correspondence;

when the table is determined not to be the first preset data table, distributing the data to a designated write thread among the plurality of write threads based on the preset data allocation policy;

and when the table is determined to be the first preset data table, distributing the data to a designated write thread among the plurality of write threads based on the correspondence and the indication information.

In some embodiments of the present disclosure, the preset configuration file further includes table information of an association table; the determining whether the data relates to an association relation table in a target database based on a preset configuration file and the indication information includes:

acquiring table information of an association relation table in the preset configuration file;

determining whether the table information of the association relation table is the same as the table information, in the indication information, of the table into which the data is to be written;

when they are determined to be the same, determining that the data relates to the association relation table in the target database;

when they are determined to be different, determining that the data does not relate to the association relation table in the target database.

In some embodiments of the present disclosure, the preset configuration file further includes a consistency protection policy, where the consistency protection policy includes a data allocation rule and a serial execution order when operating on the association table; the method further comprises the following steps:

when the data is determined to relate to the association relation table, determining whether all the write threads have finished writing data;

when all the write threads are determined to have finished writing data, distributing the data to a designated write thread among the plurality of write threads based on the data allocation rule in the consistency protection policy;

and, based on the serial execution order in the consistency protection policy, serially executing the designated write thread corresponding to the data so as to write the data into the association relation table in the target database.

In some embodiments of the present disclosure, the data allocation rule comprises an allocation rule based on a preset polling algorithm; and/or the table information comprises a table name and/or a unique identification of the table.

In some embodiments of the present disclosure, before reading the data and the indication information from the incremental data file transmitted from the source database, the method further includes:

initializing and starting a reading thread and a plurality of writing threads on the side of the target database;

respectively configuring corresponding cache queues for the plurality of write threads, and connecting the plurality of write threads with the target database; the data distributed to each designated write thread enters the corresponding cache queue.

In some embodiments of the present disclosure, further comprising:

the incremental data files transmitted to the target database are sequentially stored by taking a transaction as a unit;

the reading of a plurality of data from the incremental data file transmitted from the source database comprises:

the read thread sequentially and serially reads the sequentially stored incremental data files to read data.

In some embodiments of the present disclosure, further comprising:

the read thread inserts stop instruction data into the cache queue of each write thread and quits the read thread;

and each write thread quits the write thread when receiving the stop instruction data, and the data synchronization is finished when all the threads stop running.

In a second aspect, an embodiment of the present disclosure provides a heterogeneous database data synchronization apparatus, including:

the data reading module is used for reading data and indication information from the incremental data file transmitted by the source database; the indication information comprises table information indicating that the data is to be written;

the relation determining module is used for determining whether the data relates to an association relation table in a target database or not based on a preset configuration file and the indication information;

and the data processing module is used for distributing the data to a designated write thread among a plurality of write threads based on the preset configuration file and the indication information when the data does not relate to the association relation table, so that the designated write threads corresponding to the respective pieces of read data write the data into the target database in parallel.

In a third aspect, an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for synchronizing data in heterogeneous databases according to any embodiment of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to execute the steps of the heterogeneous database data synchronization method according to any one of the embodiments of the first aspect through execution of the executable instructions.

Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:

In the embodiment of the disclosure, data and indication information are read from an incremental data file transmitted from a source database, the indication information including table information of the table into which the data is to be written; it is then determined, based on a preset configuration file and the indication information, whether the data relates to an association relation table in a target database; and when the data is determined not to relate to the association relation table, the data is distributed to a designated write thread among a plurality of write threads based on the preset configuration file and the indication information, so that the designated write threads corresponding to the respective pieces of read data write the data into the target database in parallel. In this way, the scheme of the embodiment can dynamically allocate incremental data to be written into a plurality of unrelated target database tables to a plurality of designated write threads, that is, a plurality of warehousing channels, and can then write the data into the target database in parallel. This improves the write efficiency of the target database, saves data processing time, and thus improves the data loading performance of the target database when incremental data is synchronized in a heterogeneous database system environment.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.

FIG. 1 is a flow chart of a heterogeneous database data synchronization method according to an embodiment of the present disclosure;

FIG. 2 is a flow chart of another method for synchronizing data in heterogeneous databases according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of an example heterogeneous database system architecture in an embodiment of the present disclosure;

FIG. 4 is a diagram illustrating a data synchronization scenario for a heterogeneous database according to an embodiment of the disclosure;

FIG. 5 is a diagram illustrating an apparatus for synchronizing data of heterogeneous databases according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of an electronic device for implementing a data synchronization method for heterogeneous databases in an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

When incremental data is synchronized in a heterogeneous database system environment, for example when log files are used for incremental data synchronization between heterogeneous databases, an upper-layer application program does not, within a short time, simultaneously perform Data Manipulation Language (DML) operations (such as inserting, deleting and modifying data in a table) on a plurality of tables in the target database. Based on this characteristic, in the scheme of the embodiment, incremental data to be written into a plurality of unrelated target database tables can be dynamically distributed to different warehousing channels and written into the target database in parallel. This keeps the impact on the normal operation of the target database as small as possible and improves the data loading performance of the target database when incremental data is synchronized in a heterogeneous database system environment.

FIG. 1 is a flowchart of a heterogeneous database data synchronization method according to an embodiment of the present disclosure. Referring to FIG. 1, the data synchronization method may include the following steps:

Step S102: reading data and indication information from the incremental data file transmitted from the source database. The indication information includes table information of the table into which the data is to be written.

Step S104: determining, based on a preset configuration file and the indication information, whether the data relates to an association relation table in a target database.

Step S106: when the data is determined not to relate to the association relation table, distributing the data to a designated write thread among a plurality of write threads based on the preset configuration file and the indication information, so that the designated write threads corresponding to the respective pieces of read data write the data into the target database in parallel.

The heterogeneous database data synchronization method of this embodiment can dynamically allocate incremental data to be written into a plurality of unrelated target database tables to a plurality of designated write threads, that is, a plurality of warehousing channels. Here, unrelated target database tables means that the pieces of read data to be written into them do not relate to an association relation table, have no business association with one another, and can each be written without affecting business relationships. The data can be written into the target database in parallel through the plurality of warehousing channels, which keeps the impact on the normal operation of the target database as small as possible, improves the write efficiency of the target database, saves data processing time, and thus improves the data loading performance of the target database when incremental data is synchronized in a heterogeneous database system environment.

In some embodiments of the present disclosure, the source database and the target database are at least part of a heterogeneous database system, and the incremental data file may be transmitted by the source-side data synchronization software to the target-side synchronization software on the target database side via a network, such as a local area network or a wide area network. The table information may include, but is not limited to, a table name and/or a unique identification of the table. For example, the table information in this embodiment may be a table name such as TABLE1, indicating that the read data X1 is to be written into table TABLE1 in the target database; this embodiment is not limited in this respect, and the choice is determined by specific business needs.

In some embodiments of the present disclosure, before reading the data and the indication information from the incremental data file transmitted from the source database in step S102, the following steps may be further included:

Step a): initializing and starting a read thread and a plurality of write threads on the target database side. The read thread may read data and indication information from the incremental data file.

Step b): respectively configuring corresponding cache queues for the plurality of write threads, and connecting the plurality of write threads with the target database. The data distributed to each designated write thread enters the corresponding cache queue. In this embodiment, based on the plurality of write threads and the corresponding cache queues, data is loaded on the target database side through multiple parallel warehousing channels, so that the data loading performance of the target database can be improved.

Optionally, on the basis of the above embodiments, in some embodiments of the present disclosure, the following step c) may be further included: and sequentially storing the incremental data files transmitted to the target database by taking a transaction as a unit. Correspondingly, reading a plurality of data from the incremental data file in step S102 may specifically include: the read thread sequentially and serially reads the sequentially stored incremental data files to read data.
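Steps a) to c) can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than the disclosed implementation: the names `STOP`, `writer_loop` and `run_sync` are hypothetical, the calling thread plays the role of the read thread, appending to a list stands in for writing to the target database, and the index-based allocation is only a placeholder for the configuration-driven allocation described later. The stop marker follows the stop-instruction embodiment described in the summary above.

```python
import queue
import threading

STOP = object()  # hypothetical stop-instruction marker placed in each cache queue


def writer_loop(cache_queue, written, lock):
    """Write thread: take data from its own cache queue until the stop marker arrives."""
    while True:
        item = cache_queue.get()
        if item is STOP:
            break
        with lock:
            written.append(item)  # stand-in for writing to the target database


def run_sync(records, num_writers=3):
    """Read-thread logic: read records serially and hand each one to a cache queue."""
    queues = [queue.Queue() for _ in range(num_writers)]  # one cache queue per write thread
    written, lock = [], threading.Lock()
    writers = [
        threading.Thread(target=writer_loop, args=(q, written, lock)) for q in queues
    ]
    for w in writers:
        w.start()
    for i, record in enumerate(records):      # sequential, serial read
        queues[i % num_writers].put(record)   # placeholder allocation only
    for q in queues:
        q.put(STOP)                           # tell every write thread to quit
    for w in writers:
        w.join()
    return written
```

Because each write thread owns its queue, records placed in the same queue are applied in the order in which they were read, while different queues drain in parallel.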

Optionally, in some embodiments of the present disclosure, a correspondence between table information of different tables in the target database and corresponding write thread identifiers may be preset in the preset configuration file. For example, the plurality of write threads are distinguished by different write thread identifiers, and the preset configuration file may pre-configure a correspondence between tables TABLE1, TABLE2, TABLE3, TABLE4, TABLE5 and TABLE6 and write thread identifiers 1, 2, 3, 4, 5 and 6 respectively, such as the correspondence shown in Table 1. This is only a simple example, and this embodiment is not limited thereto.

Table 1

Table information (table name)	Write thread identifier
TABLE1	1
TABLE2	2
TABLE3	3
TABLE4	4
TABLE5	5
TABLE6	6

Correspondingly, in step S106, allocating data to a designated write thread of the plurality of write threads based on the preset configuration file and the indication information, specifically includes the following steps:

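determining the write thread identifier corresponding to the data based on the correspondence and on the table information, in the indication information, of the table into which the data is to be written; and distributing the data to the designated write thread indicated by that write thread identifier.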

As a specific example, if the indication information indicates that data X1 is to be written into table TABLE1 in the target database, a table lookup in the above example correspondence determines that the write thread identifier corresponding to TABLE1 is 1, so data X1 may be distributed into the cache queue of the designated write thread indicated by write thread identifier 1. As another example, if the indication information indicates that data X2, X3 and X4 are all to be written into table TABLE2 in the target database, the correspondence determines that the subsequent operations of writing data X2, X3 and X4 are all performed by the designated write thread indicated by write thread identifier 2; that is, data X2, X3 and X4 may be distributed into the cache queue of the designated write thread indicated by write thread identifier 2.
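The table lookup in this example can be sketched in Python as follows. The dictionary `TABLE_TO_WRITER` is a hypothetical in-memory form of the example correspondence, and `dispatch` and `cache_queues` are illustrative names; the disclosure does not specify the format of the preset configuration file.

```python
import queue

# Hypothetical in-memory form of the example correspondence (table name -> write thread identifier)
TABLE_TO_WRITER = {
    "TABLE1": 1, "TABLE2": 2, "TABLE3": 3,
    "TABLE4": 4, "TABLE5": 5, "TABLE6": 6,
}

# One cache queue per write thread identifier
cache_queues = {writer_id: queue.Queue() for writer_id in TABLE_TO_WRITER.values()}


def dispatch(data, table_name):
    """Look up the write thread for the target table and enqueue the data for it."""
    writer_id = TABLE_TO_WRITER[table_name]
    cache_queues[writer_id].put(data)
    return writer_id
```

Since all data for one table goes to the same queue, writes to that table keep the order in which the read thread encountered them.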

The read thread can read a plurality of pieces of data in the incremental data file and repeat the above distribution process. Meanwhile, the designated write thread corresponding to each piece of data obtains the data from its own cache queue and packages it into SQL statements that are executed in the target database, so that the data is written into the target database in parallel. In this way, incremental data to be written into a plurality of unrelated target database tables can be dynamically distributed to a plurality of designated write threads, that is, a plurality of warehousing channels, and the data can be written into the target database in parallel through these channels. This keeps the impact on the normal operation of the target database as small as possible, improves the write efficiency of the target database, saves data processing time, and thus improves the data loading performance of the target database when incremental data is synchronized in a heterogeneous database system environment.
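As a rough illustration of packaging a piece of data into an SQL statement, the following Python sketch builds a parameterized INSERT statement and executes it. Everything here is an assumption made for illustration: the disclosure does not specify the record format or the target database, so a dictionary of column values and an in-memory SQLite database (the standard `sqlite3` module) are used as stand-ins, and `build_insert` is a hypothetical helper.

```python
import sqlite3


def build_insert(table_name, row):
    """Package one record (column -> value) into a parameterized INSERT statement.

    The table and column names are assumed to come from trusted configuration,
    because identifiers cannot be passed as SQL parameters.
    """
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    sql = f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders})"
    return sql, tuple(row.values())


# Stand-in target database with a hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (id INTEGER, name TEXT)")

sql, params = build_insert("TABLE1", {"id": 1, "name": "x1"})
conn.execute(sql, params)
conn.commit()
```

In a multi-threaded setting, each write thread would typically hold its own database connection, which matches the step of connecting the plurality of write threads with the target database.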

In this embodiment, the designated write thread corresponding to an operation, such as a write operation, on each table in the target database may be preset through the preset configuration file. The designated write threads for the tables can be set according to the requirements of the upper-layer application, and the preset configuration file can be configured or updated in advance according to the application scenario. This enriches the functions available when synchronizing heterogeneous databases, allows adaptation to different scenarios, and improves applicability.

Optionally, on the basis of any of the foregoing embodiments, in other embodiments of the present disclosure, the preset configuration file may further include a preset data allocation policy, where the preset data allocation policy indicates that data is allocated among the multiple write threads based on a preset polling algorithm. For example, the preset polling algorithm may be a Round-robin algorithm, but is not limited thereto. Correspondingly, allocating data to a designated write thread of the plurality of write threads based on the preset configuration file and the indication information may specifically include the following steps:

Step i): determining, based on the correspondence and on the table information in the indication information, whether the table into which the data is to be written is a first preset data table; the first preset data table is at least one table involved in the correspondence.

Illustratively, if TABLE1, TABLE2, TABLE3, TABLE4, TABLE5 and TABLE6 are involved in the correspondence shown in the table above, these tables may be the first preset data tables. If the table information in the indication information indicates that data X5 is to be written into TABLE7, it may be determined that TABLE7 is not one of the first preset data tables specified above.

Step ii): when the table is determined not to be the first preset data table, distributing the data to a designated write thread among the plurality of write threads based on the preset data allocation policy.

For example, for operations on other tables such as TABLE7, which is not one of the first preset data tables specified above, the preset configuration file may be configured with a corresponding preset data allocation policy, such as a dynamic allocation policy based on the Round-robin algorithm. Thus, upon determining that table TABLE7 is not one of the first preset data tables, data X5 is assigned to a designated write thread among the plurality of write threads based on the dynamic allocation policy, for example the Round-robin algorithm.

Step iii): when the table is determined to be the first preset data table, distributing the data to a designated write thread among the plurality of write threads based on the correspondence and the indication information.

For example, if TABLE7 is determined to be one of the specified first preset data tables, that is, the correspondence is also preconfigured with write thread identifier 7 for TABLE7, data X5 may be distributed, based on the correspondence and the indication information, to the cache queue of the designated write thread indicated by write thread identifier 7 among the plurality of write threads. For the specific process, reference may be made to the allocation process of data X1, X2, X3 and X4 described above, which is not repeated here.
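Steps i) to iii) can be sketched in Python as a lookup with a round-robin fallback. The class `Allocator` and its parameters are hypothetical names introduced for illustration, and `itertools.cycle` is just one simple way to realize a polling policy; the disclosure does not prescribe a particular implementation.

```python
import itertools


class Allocator:
    """Choose a write thread: configured mapping first, round-robin otherwise."""

    def __init__(self, table_to_writer, writer_ids):
        self.table_to_writer = dict(table_to_writer)  # the preset correspondence
        self._round_robin = itertools.cycle(writer_ids)

    def writer_for(self, table_name):
        if table_name in self.table_to_writer:
            # Step iii): a first preset data table, use the configured write thread
            return self.table_to_writer[table_name]
        # Step ii): any other table, fall back to the preset polling policy
        return next(self._round_robin)


allocator = Allocator({"TABLE1": 1, "TABLE2": 2}, writer_ids=[1, 2, 3])
```

With this configuration, repeated calls for an unconfigured table such as TABLE7 cycle through write threads 1, 2, 3 and then start again at 1, whereas TABLE1 always maps to write thread 1.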

Optionally, on the basis of any of the above embodiments, in some embodiments of the present disclosure, the preset configuration file may further include table information of an association relation table. The association relation table refers to at least two tables having a business dependency or association relationship. For example, data X6 relating to the association relation tables TABLE8 and TABLE9 means that data X6 needs to be written into tables TABLE8 and TABLE9 in the target database in sequence according to the business dependency, although this embodiment is not limited thereto. In this embodiment, the preset configuration file may set the table information of the association relation table, such as the table names TABLE8 and TABLE9. Correspondingly, in step S104, determining whether the data relates to an association relation table in the target database based on the preset configuration file and the indication information may specifically include the following steps:

Acquiring the table information of the association relation table in the preset configuration file.

Determining whether the table information of the association relation table is the same as the table information, in the indication information, of the table into which the data is to be written; when they are determined to be the same, determining that the data relates to the association relation table in the target database; when they are determined to be different, determining that the data does not relate to the association relation table in the target database.

As a specific example, the table information of the association relation table in the preset configuration file, such as TABLE8 and TABLE9, may be obtained. If the table information indicated by the indication information for data X6 includes the associated tables TABLE8 and TABLE9, it may be determined that the two pieces of table information are the same, that is, data X6 is to be written into the association relation tables TABLE8 and TABLE9.
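The comparison described above can be sketched in Python as a simple membership test. The set `ASSOCIATION_TABLES` is a hypothetical in-memory form of the association table entry in the preset configuration file, and `relates_to_association_table` is an illustrative name; the sketch assumes that the indication information yields a list of target table names.

```python
# Hypothetical configuration entry: table names of the association relation tables
ASSOCIATION_TABLES = {"TABLE8", "TABLE9"}


def relates_to_association_table(target_tables):
    """Return True if any table the data is to be written into is an association table."""
    return any(table in ASSOCIATION_TABLES for table in target_tables)
```

For data X6 with target tables TABLE8 and TABLE9 the function returns True, so the data would take the consistency-protected path; for data X1 with target table TABLE1 it returns False, so the data would be distributed for parallel writing.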

Optionally, on the basis of the foregoing embodiments, in some embodiments of the present disclosure, the preset configuration file may further include a consistency protection policy, where the consistency protection policy may include, but is not limited to, a data allocation rule and a serial execution order to be used when operating on the association relation table. For example, when operations on TABLE8 and TABLE9 are involved, the consistency protection policy may be enabled, and the preset configuration file may preset the data allocation rule and the serial execution order for the operations on TABLE8 and TABLE9. In some embodiments of the present disclosure, the data allocation rule may include, but is not limited to, an allocation rule based on a preset polling algorithm, for example the Round-robin algorithm. The serial execution order is, for example, to operate on TABLE8 first and then on TABLE9, but it is not limited to this and may be configured according to business relationships and the like. Accordingly, in some embodiments of the present disclosure, as shown in FIG. 2, the method may further comprise the following steps:

Step S102: reading data and indication information from the incremental data file transmitted from the source database. The indication information includes table information of the table into which the data is to be written.

It is understood that, with regard to step S102 and step S104, reference may be made to the description in the foregoing embodiments, and details are not repeated here.

Step S201: when it is determined that the data relates to the association relation table, determining whether all the write threads have finished writing data.

For example, when it is determined that the data X6 relates to the association relation tables TABLE8 and TABLE9, it is determined whether all current write threads have finished writing data. For instance, the cache queues corresponding to the multiple write threads are polled continuously: when the cache queues of all write threads are empty, it is determined that all write threads have finished writing data; when the cache queue of at least one write thread is not empty, it is determined that not all write threads have finished writing data. That is, before operating on these specific association relation tables, such as TABLE8 and TABLE9, it is necessary to wait until the preceding transaction operations are completed, for example until all the write threads have finished writing data, and only then start executing the operations on TABLE8 and TABLE9.

Step S202: when it is determined that all the write threads have finished writing data, allocating the data to a designated write thread among the plurality of write threads based on the data allocation rule in the consistency protection policy.

Illustratively, upon determining that all the write threads have finished writing data, the data X6 is allocated to the cache queue of the designated write thread based on, for example, the Round-robin algorithm.

Step S203: based on the serial execution order in the consistency protection policy, serially executing the designated write thread corresponding to the data, so as to write the data into the association relation table in the target database.

Illustratively, after the data X6 is allocated, the designated write thread retrieves the data from its cache queue, writes to TABLE8, and then writes to TABLE9. Meanwhile, the cache queue of the designated write thread may be queried continuously; when the cache queue is empty, that is, when the data X6 has been written into the target database, the multi-channel parallel data distribution mode that was in use before the data X6 was encountered may be resumed.
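Steps S201 to S203 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the list `written` stands in for the target database, all names are hypothetical, and `queue.Queue.join()` is substituted for the continuous polling of empty cache queues described above, because it also waits for a record that has been taken from a queue but not yet written.

```python
import itertools
import queue
import threading

N_WRITERS = 3
queues = [queue.Queue() for _ in range(N_WRITERS)]  # one cache queue per write thread
written = []                                        # stands in for the target database
written_lock = threading.Lock()
round_robin = itertools.cycle(range(N_WRITERS))
SERIAL_ORDER = ["TABLE8", "TABLE9"]                 # assumed serial execution order


def order_key(table):
    # Tables named in the serial execution order come first, in that order.
    return SERIAL_ORDER.index(table) if table in SERIAL_ORDER else len(SERIAL_ORDER)


def write_thread(q):
    while True:
        record = q.get()
        if record is None:          # hypothetical stop marker used only in this sketch
            q.task_done()
            return
        for table in sorted(record["tables"], key=order_key):
            with written_lock:
                written.append((table, record["data"]))
        q.task_done()


def dispatch_critical(record):
    for q in queues:                # S201: wait until every write thread has finished
        q.join()
    target = queues[next(round_robin)]  # S202: pick the designated write thread
    target.put(record)
    target.join()                   # S203: block until the record has been written


threads = [threading.Thread(target=write_thread, args=(q,)) for q in queues]
for t in threads:
    t.start()
queues[0].put({"tables": ["TABLE1"], "data": "X1"})   # ordinary parallel data
queues[1].put({"tables": ["TABLE2"], "data": "X2"})
dispatch_critical({"tables": ["TABLE9", "TABLE8"], "data": "X6"})
for q in queues:
    q.put(None)
for t in threads:
    t.join()
```

In this sketch X1 and X2 may be written in either order, but X6 is written only after both have completed, and always to TABLE8 before TABLE9.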

The embodiment of the present disclosure proposes a consistency protection policy, that is, the concept of a strong consistency block, which allows a consistency protection policy mode to be started when certain association relation tables are operated on. When specific data must be applied to these specific tables, the method waits for the preceding transaction operations to complete and executes the DML for these tables serially; after the DML for these tables is completed, the method returns to the multi-channel parallel warehousing mode of the above embodiments, so that the loading performance of data warehousing at the target database end can still be improved during database synchronization.

When the whole data synchronization needs to be stopped, in order to reduce the influence on the system, on the basis of any one of the above embodiments, some embodiments of the present disclosure may further include the following steps:

The read thread inserts stop instruction data into the cache queue of each write thread and then exits.

For example, the read thread may insert an empty data item carrying a STOP instruction into the cache queue of each write thread and then exit; each write thread also exits when it receives the empty data item carrying the STOP instruction; when all threads have stopped, the entire system stops. In this embodiment, when data synchronization needs to be stopped, the threads are not forcibly closed at once. Instead, the read thread inserts stop instruction data into the cache queue of each write thread and then exits, and each write thread exits when it receives the stop instruction data, which reduces the impact of the shutdown on the system.
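The stop procedure can be sketched as below. The `STOP` object, the function names, and the simple index-based distribution of records are assumptions made for illustration; the lists in `outputs` stand in for the target database.

```python
import queue
import threading

STOP = object()  # hypothetical stand-in for the empty data item carrying the STOP instruction


def stoppable_writer(q, out):
    while True:
        item = q.get()
        if item is STOP:
            return              # the write thread exits when it receives the stop data
        out.append(item)        # stands in for writing to the target database


def reader(queues, records):
    for i, rec in enumerate(records):
        queues[i % len(queues)].put(rec)
    for q in queues:
        q.put(STOP)             # one stop item per write thread
    # Returning here ends the read thread.


queues = [queue.Queue(), queue.Queue()]
outputs = [[], []]
writers = [threading.Thread(target=stoppable_writer, args=(q, out))
           for q, out in zip(queues, outputs)]
read_thread = threading.Thread(target=reader, args=(queues, ["r0", "r1", "r2"]))
for t in writers + [read_thread]:
    t.start()
for t in writers + [read_thread]:
    t.join()
```

Because the stop item is queued behind the records already distributed, each write thread in this sketch finishes its pending records before exiting.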

Referring to fig. 3 and fig. 4 in combination, in a specific application scenario, the heterogeneous database data synchronization method may be performed by a target synchronization program, and may include the following steps:

Step 1): initializing one read thread and N write threads at the target database end, establishing a cache queue for each of the N write threads, and connecting the N write threads to the target database.

Step 2): the read thread continuously reads data from the incremental data file of the source database, which is transmitted by source-end data synchronization software such as the source-end synchronization program shown in fig. 3, and allocates the data records to the cache queues of different, or not entirely identical, write threads based on the preset configuration file.

Table 2 below is a simple example of the preset configuration file, and the present embodiment is not limited thereto.

Table 2

In this example of the preset configuration file, operations on TABLE1 are always allocated to the cache queue of write thread 0; operations on TABLE2 are always allocated to the cache queue of write thread 1; and operations on TABLE3, TABLE4 and TABLE5 are always allocated to the cache queue of write thread 2. The other tables, namely the tables designated by the marker entry, are allocated by polling using the Round-robin algorithm (marker parameter -1). Operations on the association relation tables, that is, the important (critical) tables TABLE6 and TABLE7, are handled in the strong consistency block mode, that is, the consistency protection policy mode.
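The allocation rules in this example can be expressed as a small routing function. The dictionary below is only a hypothetical rendering of the example configuration; the actual file format is not given in the text, and the names `TABLE_POLICY`, `DEFAULT_POLICY`, and `make_router`, as well as the table names `OTHER_A` and `OTHER_B`, are assumptions for illustration.

```python
import itertools

CRITICAL = "critical"
ROUND_ROBIN = -1

# Hypothetical rendering of the example configuration: a table maps to a fixed
# write thread identifier or to the critical (strong consistency block) mode.
TABLE_POLICY = {
    "TABLE1": 0,
    "TABLE2": 1,
    "TABLE3": 2, "TABLE4": 2, "TABLE5": 2,
    "TABLE6": CRITICAL, "TABLE7": CRITICAL,
}
DEFAULT_POLICY = ROUND_ROBIN  # all other tables are polled (parameter -1)


def make_router(n_writers):
    rr = itertools.cycle(range(n_writers))

    def route(table):
        """Return (mode, write thread index) for an operation on the given table."""
        policy = TABLE_POLICY.get(table, DEFAULT_POLICY)
        if policy == CRITICAL:
            return CRITICAL, None        # thread is chosen later, once all queues drain
        if policy == ROUND_ROBIN:
            return "parallel", next(rr)  # polling allocation among the write threads
        return "parallel", policy        # fixed write thread from the configuration

    return route


route = make_router(3)
```

For instance, `route("TABLE4")` yields write thread 2, `route("TABLE6")` reports the critical mode, and successive calls for unlisted tables such as `OTHER_A` and `OTHER_B` cycle through write threads 0, 1, 2.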

Step 3): the write thread is responsible for acquiring incremental data records from respective cache queues, packaging the incremental data records into SQL statements, and executing the SQL statements in the target database to write data in parallel.
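As a rough illustration of packaging incremental data records into SQL statements, the sketch below converts a record into a parameterized statement and executes it. The record layout (`op`, `table`, `values`) is an assumption, and an in-memory SQLite database from the Python standard library stands in for the target database; the disclosure does not specify either.

```python
import sqlite3


def to_sql(record):
    """Package one incremental data record into a parameterized SQL statement."""
    table = record["table"]      # identifiers are assumed to come from trusted configuration
    cols = list(record["values"])
    params = [record["values"][c] for c in cols]
    if record["op"] == "INSERT":
        placeholders = ", ".join("?" for _ in cols)
        return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", params
    if record["op"] == "DELETE":
        where = " AND ".join(f"{c} = ?" for c in cols)
        return f"DELETE FROM {table} WHERE {where}", params
    raise ValueError(f"unsupported operation: {record['op']}")


conn = sqlite3.connect(":memory:")  # stand-in for the connection to the target database
conn.execute("CREATE TABLE TABLE1 (id INTEGER, name TEXT)")
records = [
    {"op": "INSERT", "table": "TABLE1", "values": {"id": 1, "name": "a"}},
    {"op": "INSERT", "table": "TABLE1", "values": {"id": 2, "name": "b"}},
    {"op": "DELETE", "table": "TABLE1", "values": {"id": 1}},
]
for rec in records:
    sql, params = to_sql(rec)
    conn.execute(sql, params)
conn.commit()
rows = conn.execute("SELECT id, name FROM TABLE1 ORDER BY id").fetchall()
```

In a multi-threaded arrangement each write thread would hold its own connection and run this loop over the records taken from its own cache queue.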

When the read thread reads, from the incremental data file, incremental data whose allocation policy is configured as -1, the data is allocated among the N write threads using the Round-robin algorithm.

When the read thread reads incremental data configured as critical from the incremental data file, it continuously polls the cache queues of the N write threads. When the cache queues of all the write threads are empty, it places the current critical incremental data into the cache queue of the write thread selected by the Round-robin algorithm, and then continuously queries that cache queue. When that cache queue is empty, that is, when the data has been written into the target database, the read thread returns to the parallel distribution mode used before the critical incremental data was encountered.

When the whole system needs to be stopped, the read thread inserts an empty data item carrying a STOP instruction into the cache queue of each write thread and then exits; each write thread also exits when it receives the empty data item carrying the STOP instruction; when all threads have stopped, the entire system stops.

This embodiment exploits the characteristic that the upper-layer application cannot perform DML operations on a plurality of tables simultaneously within a short time: incremental data for multiple unrelated database tables is dynamically distributed to different warehousing channels and written into the database in parallel, thereby accelerating data loading and warehousing at the target database. This embodiment also introduces concepts such as the preset configuration file and the strong consistency block, which allow the end user to configure the system according to the actual application scenario, giving the system good applicability and extensibility.

It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc. Additionally, it will also be readily appreciated that the steps may be performed synchronously or asynchronously, e.g., among multiple modules/processes/threads.

Based on the same concept, an embodiment of the present disclosure further provides a heterogeneous database data synchronization apparatus, and referring to fig. 5, the heterogeneous database data synchronization apparatus 50 may include:

A data reading module 501, configured to read data and indication information from an incremental data file transmitted from a source database; the indication information includes table information indicating the table into which the data is to be written.

A relationship determining module 502, configured to determine whether the data relates to an association relationship table in the target database based on a preset configuration file and the indication information.

A data processing module 503, configured to, when it is determined that the data does not relate to the association relation table, allocate the data to a designated write thread among multiple write threads based on the preset configuration file and the indication information, so that the designated write threads corresponding to the multiple pieces of read data write the data into the target database in parallel.

Optionally, in some embodiments of the present disclosure, a corresponding relationship between table information of different tables in the target database and corresponding write thread identifiers may be preset in the preset configuration file. The data processing module 503, based on the preset configuration file and the indication information, allocates the data to a designated write thread of a plurality of write threads, and specifically may include: determining a write thread identifier corresponding to the data based on the table information to be written in the data in the indication information and the corresponding relation; and distributing the data to the specified write thread indicated by the write thread identifier corresponding to the data based on the write thread identifier corresponding to the data.

Optionally, in some embodiments of the present disclosure, the preset configuration file may further include a preset data allocation policy, where the preset data allocation policy may indicate that data is allocated among the plurality of write threads based on a preset polling algorithm. The data processing module 503, based on the preset configuration file and the indication information, allocates the data to a designated write thread of a plurality of write threads, and specifically may include:

Optionally, in some embodiments of the present disclosure, the preset configuration file may further include table information of an association table. The relationship determining module 502 determines whether the data relates to an association relationship table in a target database based on a preset configuration file and the indication information, and specifically may include:

Optionally, in some embodiments of the present disclosure, the preset configuration file may further include a consistency protection policy, where the consistency protection policy may include a data allocation rule and a serial execution order to be used when the association relation table is operated on; the data synchronization apparatus may further include a consistency protection module configured to: when it is determined that the data relates to the association relation table, determine whether all the write threads have finished writing data; when all the write threads have finished writing data, allocate the data to a designated write thread among the plurality of write threads based on the data allocation rule in the consistency protection policy; and, based on the serial execution order in the consistency protection policy, serially execute the designated write thread corresponding to the data, so as to write the data into the association relation table in the target database.

Optionally, in some embodiments of the present disclosure, the data allocation rule may include an allocation rule based on a preset polling algorithm. In some embodiments of the present disclosure, the table information may include, but is not limited to, a table name and/or a unique identification of the table, and the like.

Optionally, in some embodiments of the present disclosure, a thread initialization module may be further included, configured to: initialize and start a read thread and multiple write threads on the target database side before data and indication information are read from the incremental data file transmitted from the source database; configure a corresponding cache queue for each of the plurality of write threads, and connect the plurality of write threads to the target database; the data allocated to each designated write thread enters the corresponding cache queue.

Optionally, in some embodiments of the present disclosure, a file storage module may be further included, configured to sequentially store the incremental data files transmitted to the target database in a unit of transaction. The data reading module 501 is further configured to enable the reading thread to sequentially and serially read the sequentially stored incremental data files to read data.

Optionally, in some embodiments of the present disclosure, an end synchronization module may further be included, configured to enable the read thread to insert stop instruction data into the cache queue of each write thread, and exit the read thread; and each write thread quits the write thread when receiving the stop instruction data, and the data synchronization is finished when all the threads stop running.

The specific manner in which each module of the above apparatus embodiments performs its operations, and the corresponding technical effects, have been described in detail in the embodiments related to the method, and are not described in detail here.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units. The components shown as modules or units may or may not be physical units; that is, they may be located in one place or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of the present disclosure. One of ordinary skill in the art can understand and implement it without inventive effort.

The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the data synchronization method for heterogeneous databases according to any of the above embodiments.

By way of example, and not limitation, such readable storage media can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

The embodiment of the disclosure also provides an electronic device, which includes a processor and a memory, wherein the memory is used for storing the executable instruction of the processor. Wherein the processor is configured to perform the steps of the heterogeneous database data synchronization method in any of the above embodiments via execution of the executable instructions.

An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example, and should not impose any limitation on the functions or the scope of use of the embodiments of the present invention.

As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.

Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the above-mentioned heterogeneous database data synchronization method section of this specification. For example, the processing unit 610 may perform the steps of the method as shown in fig. 1.

The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.

The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above-mentioned heterogeneous database data synchronization method according to the embodiments of the present disclosure.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A heterogeneous database data synchronization method is characterized by comprising the following steps:

2. The data synchronization method according to claim 1, wherein the preset configuration file is preset with a corresponding relationship between table information of different tables in a target database and corresponding write thread identifiers;

3. The data synchronization method according to claim 2, wherein the preset configuration file further comprises a preset data allocation policy indicating allocation of data among the plurality of write threads based on a preset polling algorithm;

4. The data synchronization method according to any one of claims 1 to 3, wherein the preset configuration file further comprises table information of an association table; the determining whether the data relates to an association relation table in a target database based on a preset configuration file and the indication information includes:

5. The data synchronization method according to claim 4, wherein the preset configuration file further comprises a consistency protection policy, and the consistency protection policy comprises a data distribution rule and a serial execution sequence when operating on the association relation table; the method further comprises the following steps:

6. The data synchronization method according to claim 5, wherein the data allocation rule comprises an allocation rule based on a preset polling algorithm; and/or the table information comprises a table name and/or a unique identification of the table.

7. The data synchronization method according to claim 5, wherein before reading the data and the indication information from the incremental data file transmitted from the source database, the method further comprises:

8. The data synchronization method of claim 7, further comprising:

9. The data synchronization method of claim 7, further comprising:

10. A heterogeneous database data synchronization apparatus, comprising:

11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the heterogeneous database data synchronization method according to any one of claims 1 to 9.

12. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the steps of the heterogeneous database data synchronization method of any one of claims 1 to 9 via execution of the executable instructions.