CN114741453A - Method, system and computer readable storage medium for data synchronization - Google Patents

Method, system and computer readable storage medium for data synchronization

Info

Publication number
CN114741453A
Authority
CN
China
Prior art keywords
table structure
database
change
native
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210454528.0A
Other languages
Chinese (zh)
Inventor
南方剑
刘畅
杨爽
陈存利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Du Xiaoman Technology Beijing Co Ltd
Original Assignee
Du Xiaoman Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Du Xiaoman Technology Beijing Co Ltd
Priority to CN202210454528.0A
Publication of CN114741453A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support

Abstract

The present application provides a method, a system and a computer-readable storage medium for data synchronization, comprising: parsing a table structure change event recorded in a binary log of an upstream database to obtain a table change statement; in response to a determination that the downstream database has finished executing the row change events of the native table structure, storing the target table structure in a cache module; and executing the table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure. Caching the native table structure in the cache module solves the problem that the native table structure cannot be obtained once the table structure of the upstream database has changed, which causes data synchronization to be interrupted frequently; in addition, by modifying the generation protocol of the binary log of the upstream database, the binary log can record more structural information related to table change events.

Description

Method, system and computer readable storage medium for data synchronization
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, a system, and a computer-readable storage medium for data synchronization.
Background
To meet enterprise requirements for data migration, data synchronization, data subscription and the like, cloud vendors provide data transmission services between heterogeneous and homogeneous databases. A data transmission service comprises three stages: structure migration, full data migration and incremental data migration. Incremental data migration relies on the fact that all insert, delete and update operations of the upstream database are recorded in a binary log: the upstream table structure is used to parse the operation records in the binary log and restore them into structured query language statements that are executed in the downstream database.
However, when a relational database (e.g., MySQL) is used as the upstream database of a data transmission service, table structure changes in the upstream database are not supported. This manifests itself as follows: after columns are added to or deleted from a table structure in the upstream database, the changed table structure is used to parse the old binary log, so the synchronization task fails and requires manual investigation and repair; in addition, to reduce the performance degradation that a table structure change causes in the upstream database, the table structure is often changed with an open source tool, which leaves the data of the upstream database and the downstream database inconsistent. Therefore, frequent interruption of incremental data migration and inefficient manual troubleshooting and repair are major drawbacks of current data transmission services.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art. To this end, an object of the present application is to provide a method, system and computer-readable storage medium capable of supporting data synchronization of table structure changes of an upstream database.
One aspect of the present application provides a method of data synchronization, which may include: parsing a table structure change event recorded in a binary log of an upstream database to obtain a table change statement, wherein the table structure change event represents a change operation from a native table structure of the upstream database to a target table structure, and the table change statement is a statement for changing a database table structure; in response to a determination that the downstream database has finished executing the row change events of the native table structure, storing the target table structure in a cache module; and executing the table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure.
In some embodiments, parsing a table structure change event recorded in a binary log of an upstream database to obtain a table change statement may include: judging whether an open source tool is used when the native table structure is changed into the target table structure, wherein the open source tool is used for modifying the native table structure; performing open source tool matching on the temporary table according to a suffix of the temporary table generated in the table structure change event in response to a determination result of using the open source tool when the native table structure is changed to the target table structure; and analyzing the target table structure by combining with an open source tool to obtain the table change statement of the target table structure, and caching the table change statement to the metadata base.
In some embodiments, after analyzing the target table structure in combination with the open source tool, obtaining the table change statement of the target table structure, and caching the table change statement to the metadata base, the method may include: judging whether to execute a table change statement in the metadata base; and opening a read channel of the table alteration statement in the metadata base in response to a determination result of executing the table alteration statement in the metadata base.
In some embodiments, parsing the table structure change event recorded in the binary log of the upstream database to obtain the table change statement may include: judging whether an open source tool is used when the native table structure is changed into the target table structure, wherein the open source tool is used for modifying the native table structure; and reading a table change statement from the binary log record in response to a determination that the open source tool is not used when changing the native table structure to the target table structure.
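The two branches above (open source tool used or not used) can be sketched roughly as follows. This is a minimal illustration, not the method of the application: the source does not name specific tools here, so the suffix table below assumes the temporary-table naming conventions of gh-ost (`_<table>_gho`) and pt-online-schema-change (`_<table>_new`). The helper names, the event dictionary layout, and the use of a plain dictionary as the metadata base are hypothetical, and "analyzing the target table structure in combination with the tool" is simplified to rewriting the statement against the original table name.

```python
# Assumed suffix conventions; the source does not name specific tools.
TOOL_SUFFIXES = {
    "_gho": "gh-ost",
    "_new": "pt-online-schema-change",
}


def match_open_source_tool(temp_table):
    """Return (tool, original_table) if the table name looks like a tool's temporary table."""
    for suffix, tool in TOOL_SUFFIXES.items():
        if temp_table.startswith("_") and temp_table.endswith(suffix):
            return tool, temp_table[1:-len(suffix)]
    return None, None


def resolve_table_change(event, metadata_base):
    """Obtain the table change statement for a table structure change event."""
    tool, original = match_open_source_tool(event["table"])
    if tool is None:
        # No tool used: read the table change statement directly from the binary log record.
        return event["statement"]
    # Tool used: rewrite the statement against the original table and cache it
    # in the metadata base (a plain dict in this sketch).
    statement = event["statement"].replace(event["table"], original)
    metadata_base[original] = statement
    return statement


metadata_base = {}
direct = resolve_table_change(
    {"table": "student", "statement": "ALTER TABLE student ADD COLUMN class VARCHAR(32)"},
    metadata_base,
)
via_tool = resolve_table_change(
    {"table": "_student_gho", "statement": "ALTER TABLE _student_gho ADD COLUMN class VARCHAR(32)"},
    metadata_base,
)
```

In this sketch only the tool branch writes to the metadata base, mirroring the caching step described for the case in which an open source tool is used.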
In some embodiments, before parsing the table structure change event recorded in the binary log of the upstream database to obtain the table change statement, the method may include: modifying a generation protocol of a binary log of an upstream database, so that the binary log records a plurality of table structure information of a table structure change event, wherein the table structure information comprises: column name information, column attribute information, table character sets, and column character sets.
In some embodiments, modifying the generation protocol of the binary log of the upstream database such that the binary log records a plurality of table structure information of the table structure change event may include: analyzing a target table structure in an upstream database, and determining table structure information which is lacked in a table structure change event of a binary log, wherein the table structure information comprises: column name information, column attribute information, table character sets, and column character sets; inserting meta-information representing table structure information at the tail of an initial generation protocol of the binary log to obtain a target generation protocol of the binary log; and logging a table structure change event containing a plurality of table structure information into a binary log based on the target generation protocol.
In some embodiments, after parsing the table structure change event recorded in the binary log of the upstream database and obtaining the table change statement, the method may include: determining the execution state of the row change events of the native table structure in the downstream database, wherein the execution state is either finished or in progress; and when the execution state is in progress, waiting for the downstream database to execute the row change events of the native table structure until the execution state changes to finished.
In some embodiments, before storing the target table structure to the cache module in response to a determination that execution of the row change event of the native table structure by the downstream database is finished, the method may include: and clearing the native table structure in the cache module.
The present application further proposes a system for data synchronization, which may include: a statement parsing module, a cache updating module and a synchronization module. The statement parsing module is used for parsing a table structure change event recorded in a binary log of the upstream database to obtain a table change statement, wherein the table structure change event represents a change operation from a native table structure of the upstream database to a target table structure, and the table change statement is a statement for changing a database table structure. The cache updating module is used for storing the target table structure in the cache module in response to a determination that the downstream database has finished executing the row change events of the native table structure. The synchronization module is used for executing the table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure.
In some embodiments, the step of executing the statement parsing module may include: judging whether an open source tool is used when the native table structure is changed into the target table structure, wherein the open source tool is used for modifying the native table structure; performing open source tool matching on the temporary table according to a suffix of the temporary table generated in the table structure change event in response to a determination result of using the open source tool when the native table structure is changed to the target table structure; and analyzing the target table structure by combining with an open source tool to obtain the table change statement of the target table structure, and caching the table change statement to the metadata base.
In some embodiments, after analyzing the target table structure in combination with the open source tool, obtaining the table change statement of the target table structure, and caching the table change statement to the metadata base, the method may include: judging whether to execute a table change statement in the metadata base; and opening a read channel of the table alteration statement in the metadata base in response to a determination result of executing the table alteration statement in the metadata base.
In some embodiments, the step of executing the statement parsing module may include: judging whether an open source tool is used when the native table structure is changed into the target table structure, wherein the open source tool is used for modifying the native table structure; and reading a table change statement from the binary log record in response to a determination that the open source tool is not used when changing the native table structure to the target table structure.
In some embodiments, the system further includes a protocol modification module, configured to modify a generation protocol of the binary log of the upstream database, so that the binary log records a plurality of table structure information of the table structure change event, where the table structure information includes: column name information, column attribute information, table character sets, and column character sets.
In some embodiments, the steps executed by the protocol modification module may include: analyzing a target table structure in an upstream database, and determining table structure information that is missing from a table structure change event of a binary log, wherein the table structure information comprises: column name information, column attribute information, table character sets, and column character sets; inserting meta-information representing the table structure information at the end of an initial generation protocol of the binary log to obtain a target generation protocol of the binary log; and logging a table structure change event containing a plurality of table structure information into the binary log based on the target generation protocol.
In some embodiments, the system further comprises an execution condition determining module, configured to determine an execution state of the row change event of the native table structure by the downstream database, where the execution state includes an end of execution and an ongoing execution; and when the execution state is in execution, waiting for the downstream database to execute the row change event of the native table structure until the execution state is changed to the end of execution.
In some embodiments, the system further comprises a cache clearing module for clearing the native table structure in the cache module.
The present application also proposes a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, which is suitable for being loaded by a processor to execute the steps of the above-mentioned data synchronization method.
According to the technical scheme of the embodiment, at least the following beneficial effects can be obtained.
According to the data synchronization method, the data synchronization system and the computer readable storage medium, the native table structure is cached in the cache module, so that the problem that the native table structure cannot be obtained when the table structure of the upstream database is changed, and the data synchronization is frequently interrupted is solved; in addition, the generation protocol of the binary log of the upstream database is modified, so that the binary log can record more structural information related to the table change event.
Drawings
FIG. 1 is a flow chart of a related art incremental data synchronization process;
FIG. 2 is a flow diagram of a method of data synchronization in accordance with an aspect of the subject application;
FIG. 3 is an incremental synchronization flow diagram;
FIG. 4 is a flow diagram of a method for an open source tool to modify a table structure;
FIG. 5 is a flow diagram of a table structure cache;
FIG. 6 is a system diagram of data synchronization in accordance with another aspect of the subject application;
FIG. 7 is an architectural diagram of a data transfer service in accordance with yet another aspect of the subject application;
FIG. 8 is a schematic diagram of an electronic device architecture according to yet another aspect of the present application; and
FIG. 9 is a schematic diagram of a computer readable storage medium structure according to yet another aspect of the present application.
Detailed Description
For a better understanding of the present application, various aspects of the present application will be described in more detail with reference to the accompanying drawings. It should be understood that the detailed description is merely illustrative of exemplary embodiments of the present application and does not limit the scope of the present application in any way. Like reference numerals refer to like elements throughout the specification. The expression "and/or" includes any and all combinations of one or more of the associated listed items.
It should be noted that in this specification the expressions first, second, third, etc. are only used to distinguish one feature from another, and do not indicate any limitation of features, in particular any order of precedence. Thus, a first class of documents discussed in this application may also be referred to as a second class of documents, and vice versa, without departing from the teachings of this application.
In the drawings, the thickness, size, and shape of the components have been slightly adjusted for convenience of explanation. The figures are purely diagrammatic and not drawn to scale. As used herein, the terms "approximately", "about" and the like are used as terms of approximation and not as terms of degree, and are intended to account for inherent deviations in measured or calculated values that would be recognized by one of ordinary skill in the art.
It will be further understood that terms such as "comprising," "including," "having," and/or "containing," when used in this specification, are open-ended and not closed-ended, and specify the presence of stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof. Furthermore, when a statement such as "at least one of" appears after a list of listed features, it modifies that entire list of features rather than just individual elements in the list. Furthermore, the use of "may" when describing embodiments of the application means "one or more embodiments of the application". Also, the term "exemplary" is intended to refer to an example or illustration.
Unless otherwise defined, all terms (including engineering and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. In addition, unless explicitly defined or contradicted by context, the specific steps included in the methods described herein are not necessarily limited to the order described, but can be performed in any order or in parallel. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In a DTS (Data Transmission Service), data in an upstream database is synchronized to a downstream database in three main stages: table structure migration, full data migration, and incremental data synchronization. Table structure migration mainly executes, in the upstream database, a Structured Query Language (SQL) statement that shows the table structure, such as "show create table;", to obtain the current table structure, which is then executed downstream. Full data migration mainly executes SQL statements that scan the upstream database in segments, such as "select * from table limit xxx;"; the data is scanned out segment by segment and, based on the current table structure, spliced into SQL statements that are executed in the downstream database. Incremental data synchronization relies on the fact that all insert, delete and update operations in MySQL are recorded in the binary log: the operations recorded in the binary log are parsed using the current table structure of the upstream database and restored into SQL statements that are executed in the downstream database.
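As a rough illustration of the segmented scan used in full data migration, the following Python sketch uses an in-memory SQLite database as a stand-in for the upstream database. The table name `student`, the helper names, and the segment size are assumptions for illustration only; the source gives just the general statement shape with a `limit` clause.

```python
import sqlite3


def scan_in_segments(conn, table, segment_size):
    """Yield the rows of `table` segment by segment using LIMIT/OFFSET (hypothetical helper)."""
    offset = 0
    while True:
        rows = conn.execute(
            f"SELECT * FROM {table} ORDER BY id LIMIT ? OFFSET ?",
            (segment_size, offset),
        ).fetchall()
        if not rows:
            return
        yield rows
        offset += len(rows)


def build_demo_db():
    """Create a small in-memory table that plays the role of the upstream database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, sex TEXT)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(i, f"name{i}", "Male") for i in range(1, 6)],
    )
    return conn


segments = list(scan_in_segments(build_demo_db(), "student", 2))
```

Each yielded segment would then be spliced into SQL statements for the downstream database; that step is omitted here.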
The technical means of the present application is proposed to solve the problem that the table structure change of the upstream database cannot be supported in the incremental data synchronization process, so the incremental data synchronization process of the related art is briefly described below.
Fig. 1 is a flowchart of a related art incremental data synchronization process. As shown in fig. 1, in step S110, the user (Actor) writes data to the upstream MySQL: insert into student (id, name, sex) values (1, "Xiaoming", "Male"), i.e., the data 1, "Xiaoming", "Male" is written into the student table. Because MySQL records changes at the row level, all insert, delete and update operations are recorded in the binary log. In step S120, when the data changes, the change is written to the binary log, which records the insertion of the row, namely "id is 1, name is Xiaoming, sex is Male". In step S130, the DTS reads the row change event in the binary log. In step S140, the DTS reads the current native table structure in the upstream database MySQL. In step S150, the DTS combines the row change event in the binary log with the obtained native table structure and restores the row change event to the SQL statement insert into student (id, name, sex) values (1, "Xiaoming", "Male"); finally, it writes the SQL statement to the downstream database, completing data synchronization.
Further, suppose that in step S131 a class column is added to the native table structure of the MySQL database, so that the native table structure is changed to the target table structure. If step S131 occurs between step S130 and step S140, the table structure obtained in step S140 is the target table structure, which has one more column (class) than the native table structure. The target table structure cannot be used to parse the row change operation that was recorded in the binary log under the native table structure, which eventually causes the synchronization task to fail.
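The failure scenario above can be reproduced with a small Python sketch. It assumes, for illustration, that a row change event carries only positional values and that the column names must come from a separately obtained table structure; the function `restore_insert` and its quoting rules are hypothetical and not part of the DTS described in the application.

```python
def restore_insert(table, columns, row_values):
    """Rebuild an INSERT statement from a row change event and a column list."""
    if len(columns) != len(row_values):
        raise ValueError(
            f"table structure has {len(columns)} columns "
            f"but the row event carries {len(row_values)} values"
        )

    def quote(value):
        # Numbers are emitted as-is; everything else is single-quoted.
        if isinstance(value, (int, float)):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    return "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), ", ".join(quote(v) for v in row_values)
    )


row_event = (1, "Xiaoming", "Male")              # recorded under the native structure
native_columns = ["id", "name", "sex"]           # structure at the time of the write
target_columns = ["id", "name", "sex", "class"]  # structure after the class column is added

restored_sql = restore_insert("student", native_columns, row_event)
try:
    restore_insert("student", target_columns, row_event)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

With the native structure the statement is restored correctly; with the target structure the column count no longer matches the recorded row, which corresponds to the synchronization failure described above.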
Therefore, in order to solve the above-mentioned problem of synchronization interruption caused by a table structure change event, the present application proposes two improvements to the incremental data synchronization process of the current DTS: firstly, a generation protocol of a binary log of an upstream database is modified, so that a table structure change event is recorded more completely; and secondly, a cache module is added between the upstream database and the downstream database and is used for caching the current table structure of the upstream database in the data synchronization process, so that the table structure change event recorded by the binary log can still be obtained and analyzed by using the current table structure after the table structure of the upstream database is changed, and synchronization interruption is avoided.
FIG. 2 is a flow diagram of a method of data synchronization in accordance with an aspect of the subject application. As shown in fig. 2, one aspect of the present application provides a method of data synchronization, which may include: step S210, parsing a table structure change event recorded in a binary log of an upstream database to obtain a table change statement, wherein the table structure change event represents a change operation from a native table structure of the upstream database to a target table structure, and the table change statement is a statement for changing a database table structure; step S220, in response to a determination that the downstream database has finished executing the row change events of the native table structure, storing the target table structure in a cache module; and step S230, executing the table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure.
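Steps S210 to S230 can be sketched in Python as follows. This is a simplified model under stated assumptions rather than the implementation of the application: the downstream database is an in-memory class, the cache module is a dictionary, "waiting" for row change events is modeled by draining a pending queue, and all names (`DownstreamDB`, `synchronize_table_change`, the event dictionary keys) are hypothetical.

```python
class DownstreamDB:
    """In-memory stand-in for the downstream database (hypothetical)."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.pending_row_events = []
        self.executed = []

    def rows_finished(self):
        return not self.pending_row_events

    def drain_row_events(self):
        # Execute every row change event produced under the native table structure.
        while self.pending_row_events:
            self.executed.append(self.pending_row_events.pop(0))

    def execute_ddl(self, statement, target_columns):
        self.executed.append(statement)
        self.columns = list(target_columns)


def synchronize_table_change(event, cache, downstream):
    # S210: parse the table structure change event to obtain the table change statement.
    statement = event["statement"]
    target = event["target_columns"]
    # Wait until the downstream has finished the row change events of the native structure.
    if not downstream.rows_finished():
        downstream.drain_row_events()
    # S220: clear the native structure from the cache, then store the target structure.
    cache.pop(event["table"], None)
    cache[event["table"]] = list(target)
    # S230: execute the table change statement downstream based on the cached target structure.
    downstream.execute_ddl(statement, cache[event["table"]])
    return statement


cache = {"student": ["id", "name", "sex"]}
db = DownstreamDB(["id", "name", "sex"])
db.pending_row_events.append(
    "INSERT INTO student (id, name, sex) VALUES (1, 'Xiaoming', 'Male')"
)
synchronize_table_change(
    {
        "table": "student",
        "statement": "ALTER TABLE student ADD COLUMN class VARCHAR(32)",
        "target_columns": ["id", "name", "sex", "class"],
    },
    cache,
    db,
)
```

The ordering is the point of the sketch: the pending row change is applied under the native structure first, and only then are the cache and the downstream table structure switched to the target structure.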
In some embodiments, the following steps are further included before step S210: and modifying the generation protocol of the binary log of the upstream database, so that the binary log records a plurality of table structure information of the table structure change events. Specifically, in the current generation protocol of the binary log of the upstream database, the field mapping of the column name information, the column attribute information, the table character set and the column character set in the table structure change process is absent, so that when the binary log records the table structure change event, the binary log lacks the record of the column name information, the column attribute information, the table character set and the column character set, and therefore, in order to enable the binary log to record the table structure information of the table structure change event of the upstream database more completely, the generation protocol of the binary log of the upstream database needs to be modified.
More specifically, a target table structure in an upstream database is analyzed to determine the table structure information missing from a table structure change event of a binary log, such as: column name information, column attribute information (e.g., whether or not a column is a primary key), table character sets, and column character sets. Further, meta-information characterizing the table structure information is inserted at the end of the initial generation protocol of the binary log to obtain a target generation protocol of the binary log. With the update of the generation protocol of the binary log, the upstream database is converted from the MySQL 5.6 open source version into a Databus version. While remaining compatible with the open source version, the Databus version database persistently records more column information, such as: column name information, column attribute information (e.g., whether or not a column is a primary key), table character sets, and column character sets. To remain compatible with the open source version of MySQL, the newly added meta-information is inserted at the end of the initial generation protocol, and only the generation protocol of the binary log is modified, so that the open source version of MySQL can still be used as a slave database or a master database. In this way, the binary log of the upstream database can record a table structure change event containing a plurality of table structure information.
TABLE 1 (rendered as images in the source: Figure BDA0003618299340000101 and Figure BDA0003618299340000111)
Table 1 is the binary log generation protocol table of the Databus-version database. The binary log generation protocol of the Databus-version database consists of the generation protocol table of the MySQL 5.6 open source version with the newly added Databus-version meta-information characterizing the table structure information appended to it. It can be seen that the meta-information characterizing the table structure information is inserted at the end of the generation protocol table of the MySQL 5.6 open source version. In Table 1, the structure name, the number of bytes occupied, and the description of each entry correspond to one another. For example, the entry with the structure name "event header" occupies 19 bytes, and its description is: the event header information, whose type is TABLE_MAP_EVENT with a value of 19. The entry with the structure name "table id" occupies 6 bytes, and its description is: table ID. The remaining contents of Table 1 are not listed here.
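The compatibility argument (new meta-information appended at the end, so that a parser of the original protocol is unaffected) can be illustrated with a deliberately simplified byte layout. The layout below is an assumption made for this sketch and is not the actual TABLE_MAP_EVENT format or the layout of Table 1: it keeps only a 6-byte table id, a one-byte column count and one type byte per column, and appends length-prefixed column names as the extended meta-information. The column type values used in the example are illustrative.

```python
import struct


def encode_legacy(table_id, column_types):
    """Simplified legacy body: 6-byte table id, 1-byte column count, one type byte per column."""
    body = table_id.to_bytes(6, "little")
    body += struct.pack("B", len(column_types)) + bytes(column_types)
    return body


def encode_extended(table_id, column_types, column_names):
    """Extended body: the legacy body followed by appended, length-prefixed column names."""
    meta = b"".join(
        struct.pack("B", len(name.encode("utf-8"))) + name.encode("utf-8")
        for name in column_names
    )
    return encode_legacy(table_id, column_types) + meta


def parse_legacy(body):
    """A legacy parser reads only the fields it knows and ignores any trailing bytes."""
    table_id = int.from_bytes(body[:6], "little")
    count = body[6]
    types = list(body[7:7 + count])
    return table_id, types, 7 + count  # last item: offset where the legacy fields end


def parse_extended(body):
    """An extended parser continues reading the appended meta-information."""
    table_id, types, offset = parse_legacy(body)
    names = []
    for _ in types:
        length = body[offset]
        names.append(body[offset + 1:offset + 1 + length].decode("utf-8"))
        offset += 1 + length
    return table_id, types, names


body = encode_extended(42, [3, 15, 15], ["id", "name", "sex"])
legacy_view = parse_legacy(body)[:2]
extended_view = parse_extended(body)
```

Because the added bytes follow the fields the legacy parser consumes, both parsers agree on the table id and column types, while only the extended parser recovers the column names.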
Fig. 3 is an incremental synchronization flow diagram. As shown in fig. 3, the incremental data synchronization process begins after the upstream database has been updated from the MySQL 5.6 open source version to the Databus version. Specifically, after the incremental transmission task is started, the DTS sends a connection creation request to the upstream database. After the connection is successfully established, the DTS sends a request to register as a slave database with the upstream database; this request registers the meta information needed to pull data. After the slave database is successfully registered, the DTS sends a request to pull the binary log from the upstream database, in order to obtain the data packets in the binary log that record the insert, delete and update operations of the upstream database. The binary log data packets are then parsed, converted and restored into SQL statements that the DTS can recognize, and these SQL statements are executed in the downstream database.
In particular, the data packets in the binary log include table change events, row change events, and other events. A table change event represents the table structure information at the time the data is changed; a row change event represents a specific row data change operation, such as inserting, deleting or updating row data; other events may include table structure change events, which record the table change statements applied to a table structure, such as SQL statements that add or delete columns. The DTS stores the table change event in the cache module, and writes the row change events and other events into a built-in reading module; the reading module then reads the table change event from the cache module. Finally, the scheduling module integrates the table change event, the row change events and the other events to obtain SQL statements that the DTS can recognize, and the writing module executes the SQL statements in the downstream database.
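The interplay of the three event kinds can be sketched as a single pass over the binary log packets. This is a minimal model under stated assumptions: the cache, reading, scheduling and writing modules are collapsed into one function, only insert-type row changes are handled, and the packet dictionary keys (`kind`, `table`, `columns`, `values`, `statement`) are hypothetical.

```python
def integrate(packets):
    """Turn a stream of binary log packets into SQL statements, in arrival order."""
    cache_module = {}   # table name -> column list from the latest table change event
    statements = []
    for packet in packets:
        kind = packet["kind"]
        if kind == "table_change":
            # Table structure information valid for the row changes that follow.
            cache_module[packet["table"]] = packet["columns"]
        elif kind == "row_change":
            # Combine the row values with the cached table structure.
            columns = cache_module[packet["table"]]
            values = ", ".join(repr(v) for v in packet["values"])
            statements.append(
                f"INSERT INTO {packet['table']} ({', '.join(columns)}) VALUES ({values})"
            )
        else:
            # Other events, e.g. a table structure change carrying its statement.
            statements.append(packet["statement"])
    return statements


packets = [
    {"kind": "table_change", "table": "student", "columns": ["id", "name", "sex"]},
    {"kind": "row_change", "table": "student", "values": (1, "Xiaoming", "Male")},
    {"kind": "other", "statement": "ALTER TABLE student ADD COLUMN class VARCHAR(32)"},
    {"kind": "table_change", "table": "student", "columns": ["id", "name", "sex", "class"]},
    {"kind": "row_change", "table": "student", "values": (2, "Xiaohong", "Female", "A")},
]
sql_statements = integrate(packets)
```

Each row change is resolved against the table structure cached at the time it arrives, so rows written before and after the column is added are restored with the matching column lists.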
In some embodiments, the most common scenarios in an actual DTS are: synchronizing data of a relational database to a relational database, synchronizing data of a relational database to a message queue, synchronizing data of a relational database to a non-relational database, and the like. Specifically, synchronizing a relational database to a relational database, for example from the relational database management system MySQL to MySQL, is mainly used for cluster splitting: when growth in data volume exhausts the disk space of the upstream MySQL, or when traffic is so heavy that it hits the write bottleneck of a single instance, one MySQL needs to be split into a plurality of MySQL instances. Synchronizing a relational database to a message queue, for example from MySQL to the open source stream processing platform Kafka, is mainly used for subscribing to data changes of the database; the service only needs to consume the data in Kafka, which can serve scenarios such as real-time analysis and service monitoring. Synchronizing a relational database to a non-relational database, for example from MySQL to the distributed real-time search and analysis engine (ES), serves scenarios such as real-time search and real-time analysis. Thus, the downstream databases in this application may be the relational database management system MySQL, the open source stream processing platform Kafka, and the distributed real-time search and analysis engine ES.
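As a rough illustration of how one row change might be shaped for each of the three downstream types, consider the sketch below. The function `render` and the field names `topic`, `value`, `index`, and `document` are assumptions chosen for readability; they are not the API of any MySQL, Kafka, or ES client library.

```python
import json


def render(change, target):
    """Shape one row change for a given downstream type (illustrative only)."""
    table, row = change["table"], change["row"]
    if target == "mysql":
        # Relational target: restore an SQL statement.
        cols = ", ".join(row)
        vals = ", ".join(repr(v) for v in row.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals});"
    if target == "kafka":
        # Message queue target: a serialized change message per table topic.
        body = json.dumps({"op": change["op"], "data": row}, sort_keys=True)
        return {"topic": table, "value": body}
    if target == "es":
        # Search engine target: one document per row, keyed by primary key.
        return {"index": table, "id": str(row["id"]), "document": row}
    raise ValueError(f"unknown target: {target}")


change = {"op": "insert", "table": "student", "row": {"id": 1, "name": "Ann"}}
as_sql = render(change, "mysql")
as_message = render(change, "kafka")
as_document = render(change, "es")
```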
Obviously, in the above incremental data synchronization process, a caching module is provided to cache the table structure event, and the following will describe the table structure caching process in more detail.
In some embodiments, when the native table structure of an upstream database (e.g., MySQL) is modified, the table change operation may be performed directly on the native table structure. However, to avoid degrading the performance of the upstream database during the modification, an open source tool is usually used to perform the table change operation, and in this application the open source tool is mainly used to modify the native table structure.
FIG. 4 is a flow chart of a method for an open source tool to modify a table structure. As shown in fig. 4, when an open source table structure modification tool is used to modify a table structure, the method mainly includes a data replication phase and a modification completion phase. In the data replication phase, a new table structure is created in the upstream database MySQL. For example, when the table structure change event adds a new column class to the native table structure, a new table student_new containing the class column is created. Specifically, assuming that the native table structure includes an id int column, a name varchar(20) column, and a sex varchar(20) column, the new table structure will have the id int column, the name varchar(20) column, the sex varchar(20) column, and a class varchar(20) column, although the new table contains no data at this time. The data content of the native table is then copied to the new table. When the data content of the native table and that of the new table are substantially consistent, write operations on the native table can be prohibited, for example by locking the native table, until the data content of the new table and the native table is completely consistent. In the modification completion phase, the new table and the native table are renamed: the native table is given another name and the new table takes over the native table name, so that the new table carrying the native table name is the target table structure. For example, the native table is renamed student_old and the new table is renamed student. The specific names are not limited, as long as the new table takes the original name of the native table and the native table receives a different name. Of course, all of the above operations are recorded in the binary log of the upstream database.
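The two phases can be reproduced in miniature with the following Python sketch. It is an illustration under assumptions: SQLite from the Python standard library stands in for MySQL, the sample rows are invented, and the locking step and the replay of writes that arrive during the copy are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Native table with some existing data.
    CREATE TABLE student (id INT, name VARCHAR(20), sex VARCHAR(20));
    INSERT INTO student VALUES (1, 'Ann', 'F'), (2, 'Bob', 'M');

    -- Data replication phase: build the new table with the extra column,
    -- then copy the existing rows into it.
    CREATE TABLE student_new (id INT, name VARCHAR(20),
                              sex VARCHAR(20), class VARCHAR(20));
    INSERT INTO student_new (id, name, sex)
        SELECT id, name, sex FROM student;

    -- Modification completion phase: swap the names so that the new table
    -- carries the native table name.
    ALTER TABLE student RENAME TO student_old;
    ALTER TABLE student_new RENAME TO student;
""")

# Column names of the table now called "student" (the target table structure).
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
rows = conn.execute(
    "SELECT id, name, sex, class FROM student ORDER BY id"
).fetchall()
tables = sorted(
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
```

After the swap, both student and student_old exist side by side, which is exactly the kind of temporary table that should not be propagated to a downstream database.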
Obviously, when an open source tool is used to modify the native table structure of the upstream database, multiple temporary tables are generated, such as student_old and student_new. If the temporary tables were synchronized to the downstream database, they would confuse the user and make data consistency during synchronization difficult to guarantee, causing data synchronization to fail. Therefore, the table change statement that the open source tool used to change the table structure, namely the SQL statement related to the table change, needs to be derived from the table structure change events recorded in the binary log, and only that table change statement is then synchronized directly to the downstream database.
Fig. 5 is a flow chart of table structure caching. As shown in fig. 5, at the beginning of the incremental synchronization phase, the native table structure to be synchronized is first obtained from the upstream database MySQL (Databus version) through the "show columns from table;" statement and stored in the cache module used for table structure caching.
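A minimal sketch of such a cache is shown below. The class `TableStructureCache` and its methods are hypothetical, and the rows are hand-written samples shaped like the result of a MySQL SHOW COLUMNS query (Field, Type, Null, Key, Default, Extra) rather than output fetched from a live database.

```python
class TableStructureCache:
    """Hypothetical cache module keyed by table name."""

    def __init__(self):
        self._structures = {}

    def load(self, table, show_columns_rows):
        # Keep only (Field, Type) from rows shaped like SHOW COLUMNS output:
        # (Field, Type, Null, Key, Default, Extra)
        self._structures[table] = [(row[0], row[1]) for row in show_columns_rows]

    def get(self, table):
        # Returns None when the table has not been cached.
        return self._structures.get(table)


sample_rows = [
    ("id", "int(11)", "NO", "PRI", None, ""),
    ("name", "varchar(20)", "YES", "", None, ""),
    ("sex", "varchar(20)", "YES", "", None, ""),
]
cache = TableStructureCache()
cache.load("student", sample_rows)
```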
Further, the binary log of the upstream database is read, where the data packets of the binary log include table change events, row change events, and table structure change events. Specifically, each operation on data in the binary log is called an event. A table change event represents the table structure information at the time the data changes; a row change event represents the specific operation of a row data change, such as inserting, deleting, or updating row data; other events may include table structure change events, which record the table change statements applied to a table structure, such as SQL statements that add or delete columns.
Still further, the target table structure obtained using an open source tool is the result of a table structure change operation performed on the newly built table structure, not the result of a table structure change performed directly on the native table structure. Therefore, when a table structure change event exists in the binary log, it is necessary to determine whether an open source tool was used when the native table structure was changed. If an open source tool was used, the table change statement that changes the native table structure into the target table structure needs to be derived by analysis; if the table structure change operation was not performed by an open source tool, the table change statement can be read directly from the binary log.
Specifically, if the change operation on the native table structure of the upstream database was completed through an open source tool, the temporary tables generated in the table structure change event (e.g., student_old and student_new) are matched to an open source tool according to their suffixes. For example, "_ghc" and "_gho" are table name suffixes of temporary tables of the open source tool gh-ost, a tool open-sourced by GitHub that alters MySQL table structures online. Further, the target table structure is analyzed in combination with the specific open source tool to derive the table change statement that changes the native table structure into the target table structure, for example "alter table student add column class varchar(20) comment 'class name';". The table change statement is cached to a metadata base, which is used for caching the table change statements derived from the table structure change events of the binary log. Further, it is judged whether the table change statement cached in the metadata base needs to be executed, that is, whether all the table structure change events recorded in the binary log have been analyzed. If not all analysis operations on the table structure change events are completed, the above steps are repeated and no subsequent operation is performed; if all analysis operations are completed, a reading channel for the table change statement in the metadata base is opened. Further, the execution state of the row change events of the native table structure in the downstream database is determined, where the execution state is either execution ended or executing.
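The suffix matching and the derivation of a table change statement can be sketched as follows. This is one possible reading, not the method of the application: the suffix table contains only the two gh-ost suffixes named in the text, `match_tool` and `derive_alter` are hypothetical helpers, and the diff only covers added and dropped columns.

```python
# Suffix -> tool name. Only the gh-ost suffixes named in the text are listed;
# entries for other tools would be added here.
TEMP_SUFFIXES = {"_gho": "gh-ost", "_ghc": "gh-ost"}


def match_tool(table_name):
    """Return the tool whose temporary-table suffix the name ends with, if any."""
    for suffix, tool in TEMP_SUFFIXES.items():
        if table_name.endswith(suffix):
            return tool
    return None


def derive_alter(table, native, target):
    """Diff two {column: type} structures into a single ALTER TABLE statement."""
    clauses = [f"ADD COLUMN {col} {typ}"
               for col, typ in target.items() if col not in native]
    clauses += [f"DROP COLUMN {col}" for col in native if col not in target]
    if not clauses:
        return None  # the structures already agree
    return f"ALTER TABLE {table} " + ", ".join(clauses) + ";"


native = {"id": "int", "name": "varchar(20)", "sex": "varchar(20)"}
target = dict(native, **{"class": "varchar(20)"})
tool = match_tool("_student_gho")
alter_sql = derive_alter("student", native, target)
```

Deriving the statement from the structural difference means the downstream database receives a single ALTER TABLE, while the temporary tables created by the tool never leave the upstream database.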
When the determination result indicates that the execution state is executing, that is, the current row change events have not finished executing in the downstream database, the DTS waits for the downstream database to execute the row change events of the native table structure until the execution state changes to execution ended, and performs no other operation in the meantime. Once the row change events for the native table structure held in the cache module have all been executed in the downstream database, that is, once the execution state has changed to execution ended, the native table structure stored in the cache module is cleared and the target table structure is written to the cache module. Further, the table change statement is read from the metadata base and executed downstream, for example "alter table student add column class varchar(20) comment 'class name';".
In some embodiments, if the table structure change operation was not performed through an open source tool, the execution state of the row change events of the native table structure in the downstream database is determined, where the execution state is either execution ended or executing. When the determination result shows that the execution state is executing, that is, the current row change events have not finished executing in the downstream database, the DTS waits for the downstream database to execute the row change events of the native table structure until the execution state changes to execution ended, and performs no other operation in the meantime. Once the row change events for the native table structure held in the cache module have been executed in the downstream database, that is, once the execution state has changed to execution ended, the native table structure stored in the cache module is cleared and the target table structure is written to the cache module. Further, the table change statement is read from the table structure change event of the binary log and executed in the downstream database, for example "alter table student add column class varchar(20) comment 'class name';".
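Both branches share the same ordering constraint: the table change statement may only be executed downstream after every row change built on the native table structure has been applied and the cache has been switched to the target table structure. A minimal sketch of that gate is given below; `apply_table_change`, `rows_done`, and `execute` are hypothetical names, and a plain dictionary and list stand in for the cache module and the downstream database.

```python
import time


def apply_table_change(cache, table, target, alter_sql,
                       rows_done, execute, poll_seconds=0.0):
    """Wait for pending row changes, swap the cached structure, then run the DDL."""
    # Execution state "executing": do nothing except wait.
    while not rows_done():
        time.sleep(poll_seconds)
    # Execution state "execution ended": clear the native structure,
    # store the target structure, and only then execute the statement.
    cache.pop(table, None)
    cache[table] = target
    execute(alter_sql)


cache = {"student": ["id", "name", "sex"]}
executed = []                          # stands in for the downstream database
states = iter([False, False, True])    # two polls still executing, then ended

apply_table_change(
    cache,
    "student",
    ["id", "name", "sex", "class"],
    "ALTER TABLE student ADD COLUMN class varchar(20);",
    rows_done=lambda: next(states),
    execute=executed.append,
)
```

The only difference between the two branches is where `alter_sql` comes from: the metadata base when an open source tool was used, or the table structure change event of the binary log otherwise.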
In some embodiments, the downstream databases of the incremental data synchronization process may be the relational database MySQL, the open source stream processing platform Kafka, the non-relational database ES, and the like.
According to the data synchronization method, the native table structure is cached in the cache module, so that the problem that the native table structure cannot be obtained when the table structure of the upstream database is changed, and the data synchronization is frequently interrupted is solved; in addition, the generation protocol of the binary log of the upstream database is modified, so that the binary log can record more structural information related to the table change event.
FIG. 6 is a system diagram of data synchronization in accordance with another aspect of the subject application. As shown in fig. 6, the present application also proposes a system for data synchronization, which may include: a statement parsing module 100, a cache update module 200 and a synchronization module 300. The statement parsing module 100 is configured to parse a table structure change event recorded in a binary log of an upstream database to obtain a table change statement, where the table structure change event represents a change operation from an original table structure of the upstream database to a target table structure, and the table change statement is an operation language used for changing a table structure of the database. The cache update module 200 is configured to store the target table structure in the cache module in response to a determination result that the execution of the row change event of the native table structure by the downstream database is finished. The synchronization module 300 is configured to execute a table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure.
In some embodiments, the execution of the statement parsing module 100 may include: judging whether an open source tool is used when the native table structure is changed into the target table structure, wherein the open source tool is used for modifying the native table structure; performing open source tool matching on the temporary table according to a suffix of the temporary table generated in the table structure change event in response to a determination result of using the open source tool when the native table structure is changed to the target table structure; and analyzing the target table structure by combining with an open source tool to obtain the table change statement of the target table structure, and caching the table change statement to the metadata base.
In some embodiments, after analyzing the target table structure in combination with the open source tool, obtaining the table change statement of the target table structure, and caching the table change statement to the metadata base, the method may include: judging whether to execute a table change statement in the metadata base; and opening a read channel of the table alteration statement in the metadata base in response to a determination result of executing the table alteration statement in the metadata base.
In some embodiments, the execution of the statement parsing module 100 may include: judging whether an open source tool is used when the native table structure is changed into the target table structure, wherein the open source tool is used for modifying the native table structure; and reading a table change statement from the binary log record in response to a determination that the open source tool is not used when changing the native table structure to the target table structure.
In some embodiments, the system further includes a protocol modification module (not shown) configured to modify a generation protocol of the binary log of the upstream database, so that the binary log records a plurality of table structure information of the table structure change event, where the table structure information includes: column name information, column attribute information, table character sets, and column character sets.
In some embodiments, the executing step of the protocol modification module may include: analyzing a target table structure in an upstream database, and determining table structure information which is lacked in a table structure change event of a binary log, wherein the table structure information comprises: column name information, column attribute information, table character sets, and column character sets; inserting meta-information of the structural information of the representation table at the tail of an initial generation protocol of the binary log to obtain a target generation protocol of the binary log; and logging a table structure change event containing a plurality of table structure information into a binary log based on the target generation protocol.
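The idea of appending meta information at the tail of the existing event layout can be illustrated with the sketch below. It does not reproduce the MySQL binary log format: the length-prefixed JSON encoding, the helper names `append_meta` and `split_meta`, and the sample payload bytes are all assumptions chosen to show why a tail extension leaves the initial part of the event readable.

```python
import json
import struct


def append_meta(initial_payload: bytes, table_info: dict) -> bytes:
    """Append length-prefixed table structure information after the initial payload."""
    meta = json.dumps(table_info, sort_keys=True).encode("utf-8")
    return initial_payload + struct.pack("<I", len(meta)) + meta


def split_meta(payload: bytes, initial_len: int):
    """Split an event into its initial part and the optional trailing meta information."""
    head, tail = payload[:initial_len], payload[initial_len:]
    if not tail:
        return head, None  # event written with the initial protocol
    (size,) = struct.unpack_from("<I", tail)
    return head, json.loads(tail[4:4 + size].decode("utf-8"))


initial = b"\x01student"  # placeholder for the initial event body
info = {
    "column_names": ["id", "name", "sex", "class"],
    "column_attributes": ["int", "varchar(20)", "varchar(20)", "varchar(20)"],
    "table_charset": "utf8mb4",
    "column_charsets": [None, "utf8mb4", "utf8mb4", "utf8mb4"],
}
extended = append_meta(initial, info)
head, decoded = split_meta(extended, len(initial))
old_head, old_meta = split_meta(initial, len(initial))
```

Because the extra fields sit strictly after the initial layout, a reader that only understands the initial protocol can still parse the head of the event unchanged.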
In some embodiments, the system further comprises an execution condition determining module (not shown) for determining an execution state of the row change event of the native table structure by the downstream database, wherein the execution state comprises an execution end and an execution; and when the execution state is in execution, waiting for the downstream database to execute the row change event of the native table structure until the execution state is changed to the end of execution.
In some embodiments, a cache flush module (not shown) is further included for flushing the native table structures in the cache module.
According to the data synchronization system, the native table structure is cached in the cache module, so that the problem that the native table structure cannot be obtained when the table structure of the upstream database is changed, and the data synchronization is frequently interrupted is solved; in addition, the generation protocol of the binary log of the upstream database is modified, so that the binary log can record more structural information related to the table change event.
Fig. 7 is an architectural diagram of a data transfer service in accordance with yet another aspect of the subject application. As shown in fig. 7, the overall architecture of the DTS mainly includes a front-end module, a server module, and a task synchronization module. The front-end module is used for information interaction with a user and provides the user with an interactive, visual platform for filling in configuration information. The server module mainly comprises a high-availability module, a communication module, a data verification module, a pre-inspection module, a task allocation module, a task management module, and a metadata database; the server module mainly performs functions such as task allocation and configuration scheduling. The task synchronization module comprises a cache module, a data matching module, a reading module, a filtering module, a scheduling module, a routing module, a writing module, and the like; the task synchronization module performs the specific data synchronization, including synchronization of database and table data. The upstream database is provided with a master database and a slave database. By modifying the generation protocol of its binary log, the upstream database becomes a Databus database, whose binary log can more completely record table structure change events comprising a plurality of table structure information. Data of the Databus database is synchronized to various downstream databases through the DTS, where the downstream databases include the relational database MySQL, the open source stream processing platform Kafka, the non-relational database ES, and the like.
Obviously, in the framework of the data transmission service, the generation protocol of the binary log of the upstream database is adjusted to obtain a Databus; in addition, a cache module is added in the task synchronization module for caching the upstream table structure being synchronized. For the above two improvement points, reference may be made to the data synchronization method according to an embodiment of the present application, which is not described herein again.
FIG. 8 is a schematic diagram of an electronic device according to yet another aspect of the present application. As shown in fig. 8, according to still another aspect of the present application, there is also provided an electronic device. The electronic device may include one or more processors and one or more memories. The memory stores computer readable code which, when executed by the one or more processors, may perform the data synchronization method described above.
The method or apparatus according to the embodiments of the present application may also be implemented by means of the architecture of an electronic device shown in fig. 8. As shown in fig. 8, the electronic device may include a bus 401, one or more CPUs 402, a Read Only Memory (ROM)403, a Random Access Memory (RAM)404, a communication port 405 connected to a network, an input/output component 406, a hard disk 407, and the like. A storage device in the electronic device, such as the ROM403 or the hard disk 407, may store the method for data synchronization provided herein. The method for data synchronization may include, for example, parsing a table structure change event recorded in a binary log of an upstream database to obtain a table change statement, where the table structure change event represents a change operation from an original table structure to a target table structure of the upstream database, and the table change statement is an operation language for changing a table structure of the database; responding to a judgment result of the downstream database for finishing the execution of the row change event of the native table structure, and storing the target table structure into a cache module; and executing the table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure. Further, the electronic device may also include a user interface 408. Of course, the architecture shown in fig. 8 is merely exemplary, and one or more components in the electronic device shown in fig. 8 may be omitted as needed when implementing different devices.
FIG. 9 is a schematic diagram of a computer-readable storage medium structure according to yet another aspect of the present application. Fig. 9 shows a computer-readable storage medium 500 according to an embodiment of the present application. The computer-readable storage medium 500 has computer-readable instructions stored thereon. When the computer-readable instructions are executed by a processor, the data synchronization method according to the embodiments of the present application described with reference to the above figures may be performed. The storage medium 500 includes, but is not limited to, volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM), cache memory, or the like. Non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like.
Further, according to an embodiment of the present application, the processes described above with reference to the flowcharts may be implemented as a computer software program. For example, the present application provides a non-transitory machine-readable storage medium having stored thereon machine-readable instructions executable by a processor to perform instructions corresponding to the method steps provided herein, such as: analyzing a table structure change event recorded in a binary log of an upstream database to obtain a table change statement, wherein the table structure change event represents a change operation from an original table structure of the upstream database to a target table structure, and the table change statement is an operation language for changing the table structure of the database; responding to a judgment result of the downstream database for finishing the execution of the row change event of the native table structure, and storing the target table structure to a cache module; and executing the table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application.
The method and apparatus, device of the present application may be implemented in a number of ways. For example, the methods and apparatuses, devices of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present application are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present application may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method according to the present application.
In addition, parts of the above technical solutions provided in the embodiments of the present application that are consistent with the implementation principle of the corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The above description is only an embodiment of the present application and an illustration of the technical principles applied. It will be appreciated by a person skilled in the art that the scope of protection covered by the present application is not limited to the embodiments with a specific combination of the features described above, but also covers other embodiments with any combination of the features described above or their equivalents without departing from the technical idea. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A method of data synchronization, comprising:
analyzing a table structure change event recorded in a binary log of an upstream database to obtain a table change statement, wherein the table structure change event represents a change operation from an original table structure of the upstream database to a target table structure, and the table change statement is an operation language for changing a database table structure;
responding to a judgment result of a downstream database for finishing the execution of the row change event of the native table structure, and storing the target table structure to a cache module; and
executing the table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure.
2. The data synchronization method according to claim 1, wherein the parsing the table structure change event recorded in the binary log of the upstream database to obtain the table change statement comprises:
determining whether to use an open source tool when changing the native table structure to the target table structure, wherein the open source tool is used for modifying the native table structure;
matching the open source tool to the temporary table according to a suffix of the temporary table generated in a table structure change event in response to a determination result of using the open source tool when changing the native table structure to the target table structure; and
analyzing the target table structure in combination with the open source tool to obtain the table change statement of the target table structure, and caching the table change statement to a metadata base.
3. The data synchronization method according to claim 2, wherein after the analyzing the target table structure in combination with the open source tool to obtain the table change statement of the target table structure and caching the table change statement in a metadata database, the method comprises:
judging whether to execute the table change statement in the metadata base; and
in response to a judgment result of executing the table change statement in the metadata base, opening a reading channel of the table change statement in the metadata base.
4. The data synchronization method according to claim 1, wherein the parsing the table structure change event recorded in the binary log of the upstream database to obtain the table change statement comprises:
determining whether to use an open source tool when changing the native table structure to the target table structure, wherein the open source tool is used for modifying the native table structure; and
in response to a determination that the open source tool is not used when changing the native table structure to the target table structure, reading a table change statement from the binary log record.
5. The data synchronization method according to any one of claims 1 to 4, wherein before the parsing the table structure change event recorded in the binary log of the upstream database to obtain the table change statement, the method comprises:
modifying a generation protocol of a binary log of an upstream database, so that the binary log records a plurality of table structure information of a table structure change event, wherein the table structure information comprises: column name information, column attribute information, table character sets, and column character sets.
6. The data synchronization method of claim 5, wherein the modifying the generation protocol of the binary log of the upstream database to make the binary log record a plurality of table structure information of the table structure change event comprises:
analyzing the target table structure in the upstream database, and determining table structure information missing in a table structure change event of the binary log, wherein the table structure information comprises: column name information, column attribute information, table character sets, and column character sets;
inserting meta information representing the table structure information at the end of the initial generation protocol of the binary log to obtain a target generation protocol of the binary log; and
logging the table structure change event comprising a plurality of the table structure information into the binary log based on the target generation protocol.
7. The data synchronization method according to claim 1, wherein after parsing the table structure change event recorded in the binary log of the upstream database to obtain the table change statement, the method comprises:
determining an execution state of the row change event of the native table structure by the downstream database, wherein the execution state comprises an end of execution and an ongoing execution; and
when the execution state is executing, waiting for the downstream database to execute the row change event of the native table structure until the execution state is changed to the end of execution.
8. The data synchronization method according to claim 1, wherein before storing the target table structure to a cache module in response to a determination that execution of the row change event of the native table structure by the downstream database is finished, the method comprises:
clearing the native table structure in the cache module.
9. A system for data synchronization, comprising:
a statement parsing module, configured to parse a table structure change event recorded in a binary log of an upstream database to obtain a table change statement, wherein the table structure change event represents a change operation from an original table structure of the upstream database to a target table structure, and the table change statement is an operation language for changing the table structure of the database;
a cache update module, configured to store the target table structure into the cache module in response to a judgment result that execution of the row change event of the native table structure by the downstream database has ended; and
a synchronization module, configured to execute the table change statement in the downstream database based on the target table structure in the cache module, so that the native table structure in the downstream database is updated to the target table structure.
10. A computer-readable storage medium, characterized in that it stores a computer program adapted to be loaded by a processor for performing the steps of the data synchronization method according to any one of claims 1-8.
CN202210454528.0A 2022-04-27 2022-04-27 Method, system and computer readable storage medium for data synchronization Pending CN114741453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210454528.0A CN114741453A (en) 2022-04-27 2022-04-27 Method, system and computer readable storage medium for data synchronization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210454528.0A CN114741453A (en) 2022-04-27 2022-04-27 Method, system and computer readable storage medium for data synchronization

Publications (1)

Publication Number Publication Date
CN114741453A true CN114741453A (en) 2022-07-12

Family

ID=82283416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210454528.0A Pending CN114741453A (en) 2022-04-27 2022-04-27 Method, system and computer readable storage medium for data synchronization

Country Status (1)

Country Link
CN (1) CN114741453A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982285A (en) * 2023-03-10 2023-04-18 北京集度科技有限公司 Data processing method, device and computer readable storage medium

Similar Documents

Publication Publication Date Title
US11429641B2 (en) Copying data changes to a target database
RU2740865C1 (en) Methods and device for efficient implementation of database supporting fast copying
KR102307371B1 (en) Data replication and data failover within the database system
CN110502507B (en) Management system, method, equipment and storage medium of distributed database
US8078582B2 (en) Data change ordering in multi-log based replication
RU2599538C2 (en) Methods and systems for loading data into temporal data warehouse
US8626717B2 (en) Database backup and restore with integrated index reorganization
JP3992263B2 (en) Database-file linkage method
US8924365B2 (en) System and method for range search over distributive storage systems
EP1462960A2 (en) Consistency unit replication in application-defined systems
US20180046643A1 (en) Consistent execution of partial queries in hybrid dbms
US20190370360A1 (en) Cloud storage distributed file system
US20090210429A1 (en) System and method for asynchronous update of indexes in a distributed database
CN112286941B (en) Big data synchronization method and device based on Binlog + HBase + Hive
US20070288835A1 (en) Apparatus, computer readable medium, data signal, and method for document management
JP2004334858A (en) System and method of facilitating synchronization in client/server environment
JPWO2008149552A1 (en) Database conflict resolution method
CN107357920B (en) Incremental multi-copy data synchronization method and system
CN111930850A (en) Data verification method and device, computer equipment and storage medium
WO2023077971A1 (en) Transaction processing method and apparatus, and computing device and storage medium
CN113868028A (en) Method for replaying log on data node, data node and system
US20180276267A1 (en) Methods and system for efficiently performing eventual and transactional edits on distributed metadata in an object storage system
CN114741453A (en) Method, system and computer readable storage medium for data synchronization
WO2020192663A1 (en) Data management method and related device
CN111522688A (en) Data backup method and device for distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination