CN113821532A - System and method for synchronizing data to heterogeneous data source based on mysql - Google Patents
- Publication number
- CN113821532A (application number CN202111154477.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- node
- mysql
- binlog
- heterogeneous
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
The invention relates to the technical field of data synchronization, and in particular to a method for synchronizing mysql data to a heterogeneous data source. The method comprises the steps that canal pulls binlog data from a mysql node through the master-slave replication protocol; log analysis is performed on the binlog data to form analysis data; the analysis data are merged to form merged data; the merged data is converted into a standard format in which the outermost layer of the message is a tupe node and a schema node; and the converted data is consumed, with the corresponding operations performed on the heterogeneous data source. The invention also discloses a system for synchronizing mysql data to a heterogeneous data source. The scheme does not intrude into the business system; processing the mysql binlog does not affect mysql performance, and data can be synchronized in real time, which improves query efficiency.
Description
Technical Field
The invention relates to the technical field of data synchronization, in particular to a system and a method for synchronizing mysql-based data to a heterogeneous data source.
Background
MySQL is a widely used relational database management system. When the data volume in MySQL becomes large, query performance degrades rapidly. Many other storage engines can now handle queries over large data volumes, but no mature tool exists for synchronizing data to such heterogeneous data sources, so the problem of heterogeneous-data-source synchronization urgently needs to be solved.
Disclosure of Invention
One object of the invention is to provide a method for synchronizing mysql data to heterogeneous data sources.
A method for synchronizing data to heterogeneous data sources based on mysql comprises the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol;
performing log analysis on binlog data to form analysis data;
merging the analyzed data to form merged data;
converting the merged data into a standard format, wherein the outermost layer of the message is a tupe node and a schema node;
and consuming the converted data, and performing corresponding operation on the heterogeneous data source.
The technical effects of the invention are as follows: 1. the scheme does not intrude into the business system; processing the mysql binlog does not affect mysql performance, and synchronization can be performed in real time; 2. after operations are merged, executing them in batches improves the synchronization performance for heterogeneous data; 3. the problem of querying massive mysql data is solved, and query efficiency is high.
Furthermore, the tupe node is an array structure that internally stores the specific information of each piece of binlog data. The tupe node comprises a pipeline id, an operation timestamp, an operator and the detailed data of the operation, and the pipeline id is composed of the binlog name plus the offset. The pipeline id is used to prevent repeated operations: composing it from the binlog name and the offset guarantees a unique id for each operation, and after processing, the consumer side stores the id in redis so that repeated operations are prevented; in this way redis solves the data-duplication problem.
Furthermore, the schema node internally comprises a fields array node, a tableName node and a namespace node, where tableName and namespace are the table name and namespace identification of the heterogeneous data source.
Further, in order to prevent data from piling up, the method also comprises sending the converted data to the kafka middleware before consuming the converted data.
Further, the method also comprises rewriting the kafka partitioner and hash-grouping messages by database name and table name. Through the kafka custom partitioner, the ordering of single-table synchronization is guaranteed and automatic fault tolerance is achieved.
Another object of the invention is to provide a system for synchronizing mysql data to heterogeneous data sources.
A system for synchronizing mysql data to heterogeneous data sources comprises a binlog data pulling module, which uses canal to pull binlog data from mysql nodes through the master-slave replication protocol;
the log analysis module is used for carrying out log analysis on the binlog data to form analysis data;
the data preprocessing module is used for merging the analysis data to form merged data;
the data conversion module is used for converting the merged data into a standard format, and the outermost layer of the message is a tupe node and a schema node;
and the data consumption module is used for consuming the converted data and carrying out corresponding operation on the heterogeneous data source.
Further, in order to prevent data accumulation, the system also comprises kafka middleware, which receives the converted data; the converted data is then consumed from the kafka middleware and synchronized to the heterogeneous data sources.
Furthermore, the tupe node is an array structure that internally stores the specific information of each piece of binlog data; it comprises a pipeline id, an operation timestamp, an operator and the detailed data of the operation, and the pipeline id is composed of the binlog name plus the offset. The schema node internally comprises a fields array node, a tableName node and a namespace node. The pipeline id is used to prevent repeated operations: it guarantees a unique, non-repeating id for each operation, and after processing, the consumer side stores it in redis to prevent repeated operations.
Drawings
FIG. 1 is a flowchart of an embodiment of a method for synchronizing mysql-based data to heterogeneous data sources according to the invention.
Detailed Description
The invention is described in further detail below by way of specific embodiments.
As shown in FIG. 1, this embodiment discloses a method for synchronizing mysql data to a heterogeneous data source, which comprises the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol; in this embodiment, canal simulates a slave of mysql. canal can be understood as a tool for synchronizing incremental data, whose main purpose is to provide incremental-data subscription and consumption based on parsing the incremental logs of the MySQL database. The binlog is a binary file that records all changes to MySQL data and stores them in binary form. Both canal and the binlog are prior art.
Log analysis is performed on the binlog data to form analysis data. For example, suppose 10 pieces of data are added to table a, 5 pieces of data are modified in table b, and 6 pieces of data are deleted from table a. The analysis data are as follows:
json1: { a table adds 10 pieces of data };
json2: { b table modifies 5 data };
json3: { a table deletes 6 pieces of data }.
In order to reduce the number of subsequent operations on the data source, the analysis data are merged to form merged data. After processing, the result is as follows:
json1: { a table adds 10 pieces of data and a table deletes 6 pieces of data };
json2: { b table modifies 5 data }.
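The merge step above can be sketched in Python as follows. This is an illustrative sketch, not the patent's actual code; the record layout (a dict with `table`, `op` and `rows` keys) is an assumption made for the example.

```python
def merge_parsed_ops(parsed_ops):
    """Group parsed binlog operations by table name, preserving arrival order,
    so that all operations on one table can later be executed as a single batch."""
    merged = {}
    for op in parsed_ops:
        # setdefault creates the table's bucket on first sight, then appends.
        merged.setdefault(op["table"], []).append(op)
    return merged

# The example from the description: table a gains 10 rows, b modifies 5, a loses 6.
parsed = [
    {"table": "a", "op": "insert", "rows": 10},
    {"table": "b", "op": "update", "rows": 5},
    {"table": "a", "op": "delete", "rows": 6},
]
merged = merge_parsed_ops(parsed)
```

After merging, table a's insert and delete travel in one message, so the downstream data source is touched once per table rather than once per operation.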
The merged data is converted into a standard format, in which the outermost layer of the message is a tupe node and a schema node. The tupe node is an array structure that stores the specific information of each piece of binlog data, for example: "tupe": [["mysql-bin.00001000000000000000090222", 1517194451453, "i", "16", "zhangsan"], ["mysql-bin.00001000000000000000090345", 1517194455643, "u", "15", "lisi"]]. The first column is the pipeline id, whose function is to prevent repeated operations; it is composed of the binlog file name followed by the offset left-padded with zeros to 20 digits. For the first piece of data, the binlog name mysql-bin.000010 is concatenated with the padded unique offset 90222 to give the pipeline id mysql-bin.00001000000000000000090222, which guarantees that the id of each operation is unique and not repeated; when the consumer side processes an operation, the id is stored in redis to prevent repeated operations. The second column is the operation timestamp; the third column is the operator, where i represents insertion, u represents update and d represents deletion; the remaining columns are the detailed data of the operation.
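The pipeline-id composition and the redis-based de-duplication described above can be sketched as follows. This is an illustrative sketch rather than the patent's code: a plain set stands in for redis, and the 20-digit padding width is taken from the example in the description.

```python
def make_pipeline_id(binlog_name, offset, width=20):
    """Compose the pipeline id: binlog file name + offset left-padded with zeros.

    Padding to a fixed width keeps the id unique per (file, offset) pair."""
    return binlog_name + str(offset).zfill(width)

# A plain set stands in for redis on the consumer side.
_seen_ids = set()

def apply_once(pipeline_id, apply_fn):
    """Run apply_fn only if this pipeline id has not been processed before."""
    if pipeline_id in _seen_ids:
        return False  # duplicate delivery: skip the operation
    _seen_ids.add(pipeline_id)
    apply_fn()
    return True
```

In a real deployment, `_seen_ids` would be replaced by redis membership checks so that de-duplication survives consumer restarts.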
The schema node internally comprises a fields array node, a tableName node and a namespace node. For example: "schema": {"fields": [...], "tableName": "testTable", "namespace": "mysql.test"}, where tableName and namespace are the table name and namespace identification of the heterogeneous data source, and fields holds the metadata of the operated data, for example: "fields": [{"isPk": false, "name": "id", "type": "int(10)"}, {"isPk": false, "name": "name", "type": "varchar(20)"}], which contains the metadata of two fields. Here isPk identifies whether the field is a primary key in mysql, name is the field's name in mysql, and type is the field's type in mysql.
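Assembling the standard message from the two nodes can be sketched as below. The field names mirror the examples above; the helper function itself is an illustrative assumption, not the patent's code.

```python
def to_standard_message(table_name, namespace, fields, ops):
    """Build the standard-format message whose outermost layer is a tupe node
    and a schema node, as described above."""
    return {
        "tupe": [
            # Each entry: pipeline id, timestamp, operator, then the row data.
            [op["pipeline_id"], op["ts"], op["op"]] + list(op["row"])
            for op in ops
        ],
        "schema": {
            "fields": fields,          # metadata of the operated columns
            "tableName": table_name,   # table name at the heterogeneous source
            "namespace": namespace,    # namespace identification
        },
    }
```

A message built this way can be serialized as JSON and handed to the messaging layer unchanged.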
In this embodiment, before the converted data is consumed, it is sent to the kafka middleware in order to prevent data from piling up. The kafka partitioner is also rewritten to hash-group messages by database name and table name, which ensures that all operations on one table fall into a single partition and remain ordered.
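The hash-grouping rule of the rewritten partitioner can be sketched as follows. In a real deployment this logic would live in a producer-side custom partitioner (for example, an implementation of Kafka's Java `Partitioner` interface); the sketch only shows the grouping rule itself, and md5 is chosen here merely as a hash that is stable across processes.

```python
import hashlib

def table_partition(database, table, num_partitions):
    """Map every message of one (database, table) pair to the same partition,
    so operations on a single table stay ordered within that partition."""
    key = f"{database}.{table}".encode("utf-8")
    # md5 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.md5(key).hexdigest()
    return int(digest, 16) % num_partitions
```

Because the partition depends only on the database and table names, all binlog events of one table are consumed in the order they were produced.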
The data in the kafka middleware is then consumed, and the corresponding operations are performed on the heterogeneous data source. The operations include, but are not limited to, insertion, deletion, modification and query.
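Dispatching a consumed message to the heterogeneous data source can be sketched as below. The `sink` object and its method names are assumptions made for illustration, standing in for whatever client the target data source actually provides.

```python
class ListSink:
    """A toy heterogeneous-data-source client that records what it is asked to do."""
    def __init__(self):
        self.events = []
    def insert(self, table, row):
        self.events.append(("insert", table, row))
    def update(self, table, row):
        self.events.append(("update", table, row))
    def delete(self, table, row):
        self.events.append(("delete", table, row))

def consume(message, sink):
    """Apply each tupe entry to the target source, keyed on the operator column."""
    handlers = {"i": sink.insert, "u": sink.update, "d": sink.delete}
    table = message["schema"]["tableName"]
    for entry in message["tupe"]:
        pipeline_id, ts, op, *row = entry  # columns as described above
        handlers[op](table, row)
```

In practice the dispatcher would also consult redis with the pipeline id (as described earlier) before applying each entry.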
This embodiment also discloses a system for synchronizing mysql data to a heterogeneous data source, comprising a binlog data pulling module, which uses canal to pull binlog data from the mysql node through the master-slave replication protocol;
the log analysis module is used for carrying out log analysis on the binlog data to form analysis data;
the data preprocessing module is used for merging the analysis data to form merged data;
the data conversion module is used for converting the merged data into a standard format, where the outermost layer of the message is a tupe node and a schema node; the tupe node is an array structure that internally stores the specific information of each piece of binlog data, comprising a pipeline id, an operation timestamp, an operator and the detailed data of the operation, with the pipeline id composed of the binlog name plus the offset; the schema node internally comprises a fields array node, a tableName node and a namespace node. Other descriptions of the tupe node and the schema node are the same as in the method above and are not repeated here.
The kafka middleware is used for consuming the converted data through the kafka middleware after receiving the converted data and synchronizing the converted data to a heterogeneous data source;
and the data consumption module is used for consuming the converted data and performing the corresponding operations on the heterogeneous data source. The operations include, but are not limited to, insertion, deletion, modification and query.
The foregoing is merely an embodiment of the present invention. Common general knowledge, such as well-known specific structures and characteristics, is not described here in detail. A person skilled in the art is aware of the common technical knowledge in this field before the filing date or the priority date, has access to the prior art, and has the ability to apply routine experimentation; with the teachings of this application, such a person could complete and implement the present scheme, and certain typical known structures or known methods should not become obstacles to its implementation. It should also be noted that a person skilled in the art may make several variations and improvements without departing from the structure of the invention; these should likewise be regarded as falling within the scope of protection, and they do not affect the effect of implementing the invention or the practicability of the patent. The scope of protection of this application is determined by the contents of the claims, and the description of the embodiments in the specification may be used to interpret the contents of the claims.
Claims (8)
1. A method for synchronizing data to heterogeneous data sources based on mysql is characterized by comprising the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol;
performing log analysis on binlog data to form analysis data;
merging the analyzed data to form merged data;
converting the merged data into a standard format, wherein the outermost layer of the message is a tupe node and a schema node;
and consuming the converted data, and performing corresponding operation on the heterogeneous data source.
2. The method for synchronizing mysql data to heterogeneous data sources of claim 1, wherein the tupe node is an array structure that internally stores the specific information of each piece of binlog data; the tupe node comprises a pipeline id, an operation timestamp, an operator and the detailed data of the operation, and the pipeline id is composed of the binlog name plus the offset.
3. The method for mysql-based data synchronization to heterogeneous data sources of claim 2, wherein the schema node internally comprises a fields array node, a tableName node and a namespace node.
4. The method for mysql-based data synchronization to heterogeneous data sources as claimed in claim 1 or 3, further comprising sending the converted data to kafka middleware before consuming the converted data.
5. The method for synchronizing mysql data to heterogeneous data sources of claim 4, further comprising rewriting the kafka partitioner and hash-grouping messages by database name and table name.
6. A system for synchronizing mysql data to heterogeneous data sources, characterized by comprising a binlog data pulling module, which uses canal to pull binlog data from mysql nodes through the master-slave replication protocol;
the log analysis module is used for carrying out log analysis on the binlog data to form analysis data;
the data preprocessing module is used for merging the analysis data to form merged data;
the data conversion module is used for converting the merged data into a standard format, and the outermost layer of the message is a tupe node and a schema node;
and the data consumption module is used for consuming the converted data and carrying out corresponding operation on the heterogeneous data source.
7. The system for synchronizing mysql data to heterogeneous data sources according to claim 6, characterized by further comprising kafka middleware configured to receive the converted data, the converted data then being consumed from the kafka middleware and synchronized to the heterogeneous data sources.
8. The mysql data-based system for synchronizing to heterogeneous data sources according to claim 7, wherein the tupe node is an array structure, and stores the specific information of each binlog data internally, the tupe node includes pipeline id, operation timestamp, operator and detail data of operation, the pipeline id composition rule is that the binlog name is added with an offset, and the schema node contains the fields array node, the tableName node and the namespace node internally.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111154477.1A CN113821532A (en) | 2021-09-29 | 2021-09-29 | System and method for synchronizing data to heterogeneous data source based on mysql |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111154477.1A CN113821532A (en) | 2021-09-29 | 2021-09-29 | System and method for synchronizing data to heterogeneous data source based on mysql |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113821532A true CN113821532A (en) | 2021-12-21 |
Family
ID=78915941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111154477.1A Pending CN113821532A (en) | 2021-09-29 | 2021-09-29 | System and method for synchronizing data to heterogeneous data source based on mysql |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113821532A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341223A (en) * | 2017-06-29 | 2017-11-10 | 华泰证券股份有限公司 | A kind of heterogeneous database real-time synchronization method based on message-oriented middleware |
CN110866022A (en) * | 2019-10-24 | 2020-03-06 | 贝壳技术有限公司 | Data analysis method, system and device based on log file |
CN111666344A (en) * | 2020-06-19 | 2020-09-15 | 中信银行股份有限公司 | Heterogeneous data synchronization method and device |
CN112182036A (en) * | 2020-09-15 | 2021-01-05 | 中信银行股份有限公司 | Data sending and writing method and device, electronic equipment and readable storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211221 |