CN113821532A - System and method for synchronizing data to heterogeneous data source based on mysql - Google Patents


Info

Publication number
CN113821532A
Authority
CN
China
Prior art keywords
data
node
mysql
binlog
heterogeneous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111154477.1A
Other languages
Chinese (zh)
Inventor
黎杰 (Li Jie)
罗鹏 (Luo Peng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Fumin Bank Co Ltd
Original Assignee
Chongqing Fumin Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Fumin Bank Co Ltd filed Critical Chongqing Fumin Bank Co Ltd
Priority to CN202111154477.1A priority Critical patent/CN113821532A/en
Publication of CN113821532A publication Critical patent/CN113821532A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/242: Query formulation
    • G06F 16/2433: Query languages
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data synchronization, in particular to a method for synchronizing mysql-based data to a heterogeneous data source. The method comprises the steps that canal pulls binlog data from the mysql node through the master-slave replication protocol; the binlog data is parsed to form parsed data; the parsed data is merged to form merged data; the merged data is converted into a standard format in which the outermost layer of the message consists of a tupe node and a schema node; and the converted data is consumed and the corresponding operations are performed on the heterogeneous data source. The invention also discloses a system for synchronizing mysql-based data to a heterogeneous data source. The approach does not intrude on the business system; processing the mysql binlog does not affect mysql performance, and synchronization is performed in real time, so query efficiency is improved.

Description

System and method for synchronizing data to heterogeneous data source based on mysql
Technical Field
The invention relates to the technical field of data synchronization, in particular to a system and a method for synchronizing mysql-based data to a heterogeneous data source.
Background
MySQL is a widely used relational database management system. When a MySQL database holds a large volume of data, query performance drops sharply. Many other storage engines can handle queries over large data volumes, but there is no mature tool for synchronizing data to such heterogeneous data sources, so heterogeneous data source synchronization is a problem that urgently needs to be solved.
Disclosure of Invention
One purpose of the invention is to provide a method for synchronizing mysql-based data to heterogeneous data sources.
A method for synchronizing mysql-based data to heterogeneous data sources comprises the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol;
the binlog data is parsed to form parsed data;
the parsed data is merged to form merged data;
the merged data is converted into a standard format in which the outermost layer of the message consists of a tupe node and a schema node;
the converted data is consumed and the corresponding operations are performed on the heterogeneous data source.
The beneficial effects of this scheme are as follows: 1. the business system is not intruded upon; processing the mysql binlog does not affect mysql performance, and synchronization is performed in real time; 2. after operations are merged they can be executed in batches, which improves the performance of synchronizing the heterogeneous data; 3. the query-performance problem of mysql over massive data is solved, and query efficiency is high.
Furthermore, the tupe node is an array structure that stores the specific information of each binlog record, comprising a flow id (serial id), an operation timestamp, an operation type and the detailed data of the operation; the flow id is composed of the binlog file name plus the offset. The flow id serves to prevent repeated operations: its composition rule guarantees that the id of each operation is unique, and after processing at the consumer side the flow id can be stored in redis, so repeated operations are prevented and the data-duplication problem is solved by means of redis.
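The flow-id composition described above (binlog file name plus offset) can be sketched as follows; the function name is ours, and the 20-digit zero-padding width follows the example given later in the detailed description.

```python
def make_flow_id(binlog_name: str, offset: int, pad: int = 20) -> str:
    """Compose a unique flow id from the binlog file name and the event
    offset, with the offset zero-padded to a fixed width so ids sort and
    compare consistently."""
    return f"{binlog_name}{offset:0{pad}d}"

# e.g. binlog file mysql-bin.000010 at offset 90222
print(make_flow_id("mysql-bin.000010", 90222))
```

Because a binlog position (file name, offset) is unique within a MySQL instance, the resulting id is unique per operation, which is what makes the redis-based deduplication possible.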
Furthermore, the schema node internally comprises a fields array node, a tableName node and a namespace node, where tableName and namespace are the table name and the namespace identification of the heterogeneous data source.
Further, to prevent data accumulation, the method also comprises sending the converted data to the kafka middleware before consuming it.
Further, the method also comprises rewriting the kafka partitioner so that messages are hash-grouped by database name and table name. With this custom kafka partitioner, the ordering of single-table synchronization is guaranteed and automatic fault tolerance is achieved.
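The hash grouping by database and table name could look like the following minimal sketch; the function name and the choice of md5 are assumptions, since the text only specifies that the names are hash-grouped.

```python
import hashlib

def choose_partition(database: str, table: str, num_partitions: int) -> int:
    """Hash the database and table name so that every event of one table
    always maps to the same partition, preserving per-table ordering."""
    key = f"{database}.{table}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()  # stable across processes, unlike Python's hash()
    return int(digest, 16) % num_partitions

print(choose_partition("test", "orders", 8))
```

In a real deployment this logic would live in a custom Kafka partitioner so the producer, not the consumer, enforces the grouping.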
Another purpose of the invention is to provide a system for synchronizing mysql-based data to heterogeneous data sources.
A system for synchronizing mysql-based data to heterogeneous data sources comprises a binlog data pulling module, which uses canal to pull binlog data from the mysql node through the master-slave replication protocol;
a log parsing module, which parses the binlog data to form parsed data;
a data preprocessing module, which merges the parsed data to form merged data;
a data conversion module, which converts the merged data into a standard format in which the outermost layer of the message consists of a tupe node and a schema node;
and a data consumption module, which consumes the converted data and performs the corresponding operations on the heterogeneous data source.
Further, to prevent data accumulation, the system also comprises kafka middleware, which receives the converted data; the converted data is then consumed from the kafka middleware and synchronized to the heterogeneous data sources.
Furthermore, the tupe node is an array structure that stores the specific information of each binlog record, comprising a flow id, an operation timestamp, an operation type and the detailed data of the operation; the flow id is composed of the binlog file name plus the offset, and the schema node internally comprises a fields array node, a tableName node and a namespace node. The flow id serves to prevent repeated operations: it guarantees a unique, non-repeating id for each operation, and after processing at the consumer side it can be stored in redis to prevent repeated operations.
Drawings
FIG. 1 is a flowchart of an embodiment of a method for synchronizing mysql-based data to heterogeneous data sources according to the invention.
Detailed Description
The following is described in further detail through specific embodiments:
As shown in fig. 1, this embodiment discloses a method for synchronizing mysql-based data to heterogeneous data sources, comprising the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol; in this embodiment canal simulates a slave of mysql. canal can be understood as a tool for synchronizing incremental data; its main purpose is to provide incremental data subscription and consumption based on parsing MySQL's incremental logs. The binlog is a binary file that records all changes to MySQL data and stores them in binary form. Both canal and the binlog are prior art.
The binlog data is parsed to form parsed data. Suppose, for example, that 10 records are inserted into table a, 5 records are modified in table b, and 6 records are deleted from table a. The parsed data is as follows:
json1: { table a inserts 10 records };
json2: { table b modifies 5 records };
json3: { table a deletes 6 records }.
To reduce the number of subsequent operations against the target data source, the parsed data is merged to form merged data. After processing it becomes:
json1: { table a inserts 10 records and table a deletes 6 records };
json2: { table b modifies 5 records }.
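The merging step above can be sketched as follows; the (table, operation, count) event representation is a simplification for illustration, not the actual parsed-binlog structure.

```python
from collections import OrderedDict

def merge_parsed(events):
    """Group parsed binlog events by table, so that all operations on one
    table can later be executed against the target source as one batch.
    Each event is a (table, operation, row_count) tuple."""
    merged = OrderedDict()  # preserves first-seen table order
    for table, op, n in events:
        merged.setdefault(table, []).append((op, n))
    return merged

events = [("a", "insert", 10), ("b", "update", 5), ("a", "delete", 6)]
print(merge_parsed(events))
```

Grouping by table is what enables the batch execution mentioned in the effects section, since one round trip per table replaces one per event.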
The merged data is converted into a standard format in which the outermost layer of the message consists of a tupe node and a schema node. The tupe node is an array structure that stores the specific information of each binlog record, for example:
"tupe": [["mysql-bin.00001000000000000000090222", 1517194451453, "i", "16", "zhangsan"], ["mysql-bin.00001000000000000000090345", 1517194455643, "u", "15", "lisi"]]
The first column is the flow id, whose function is to prevent repeated operations. Its composition rule is the binlog file name plus the offset: for the first record, the binlog file name mysql-bin.000010 is concatenated with the offset 90222 zero-padded to 20 digits, giving mysql-bin.00001000000000000000090222. This guarantees that the id of each operation is unique and non-repeating; when the consumer side processes an operation, the id is stored in redis to prevent repeated operations. The second column is the operation timestamp. The third column is the operation type: i denotes insert, u denotes update and d denotes delete. The remaining columns are the detailed data of the operation.
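The consumer-side deduplication by flow id can be sketched with an in-memory set standing in for the redis store the text describes; the class and method names are ours.

```python
class FlowIdDeduplicator:
    """Stand-in for the redis set used at the consumer side: record every
    flow id that has been applied and skip any id seen before."""

    def __init__(self):
        self._seen = set()

    def should_apply(self, flow_id: str) -> bool:
        """Return True the first time a flow id is seen, False on repeats."""
        if flow_id in self._seen:
            return False
        self._seen.add(flow_id)
        return True
```

With redis, the same check would typically be a set-if-absent operation so that multiple consumer instances share one view of the applied ids.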
The schema node internally comprises a fields array node, a tableName node and a namespace node. For example:
"schema": { "fields": [], "tableName": "testTable", "namespace": "mysql.test" }
Here tableName and namespace are the table name and the namespace identification of the heterogeneous data source, and fields holds the metadata of the operated data, for example:
"fields": [{ "isPk": false, "name": "id", "type": "int(10)" }, { "isPk": false, "name": "name", "type": "varchar(20)" }]
which contains the metadata of two fields: isPk identifies whether the field is a primary key in mysql, name denotes the field's name in mysql, and type denotes the field's type in mysql.
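Assembling this standard message format, with the tupe node and schema node at the outermost layer, might look like the following minimal sketch; the function name is an assumption.

```python
import json

def build_message(tupe_rows, table_name, namespace, fields):
    """Assemble the standard-format message: a tupe node holding the binlog
    records and a schema node holding table name, namespace and field metadata."""
    return {
        "tupe": tupe_rows,
        "schema": {
            "fields": fields,
            "tableName": table_name,
            "namespace": namespace,
        },
    }

msg = build_message(
    [["mysql-bin.00001000000000000000090222", 1517194451453, "i", "16", "zhangsan"]],
    "testTable",
    "mysql.test",
    [{"isPk": False, "name": "id", "type": "int(10)"},
     {"isPk": False, "name": "name", "type": "varchar(20)"}],
)
print(json.dumps(msg))
```

Serializing with json.dumps yields the wire form sent to the kafka middleware in the next step.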
In this embodiment, to prevent data accumulation, the method further comprises sending the converted data to the kafka middleware before consuming it. It also comprises rewriting the kafka partitioner so that messages are hash-grouped by database name and table name, which ensures that all operations on one table go to a single partition and are therefore ordered.
The data in the kafka middleware is consumed, and the corresponding operations are performed on the heterogeneous data source. The operations include, but are not limited to, insert, delete and update.
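The consumer-side dispatch on the operation type (i, u, d) can be sketched as follows; the in-memory target and its method names are illustrative assumptions, since a real heterogeneous data source would expose its own write API.

```python
class InMemoryTarget:
    """Toy heterogeneous data source keyed by primary key; row[0] is the
    key and row[1:] the payload (a simplifying assumption)."""

    def __init__(self):
        self.rows = {}

    def insert(self, row):
        self.rows[row[0]] = row[1:]

    def update(self, row):
        self.rows[row[0]] = row[1:]

    def delete(self, row):
        self.rows.pop(row[0], None)

def apply_operation(target, op, row):
    """Dispatch one converted binlog record to the target data source:
    'i' inserts, 'u' updates, 'd' deletes, matching the operation-type
    column of the tupe node."""
    handlers = {"i": target.insert, "u": target.update, "d": target.delete}
    if op not in handlers:
        raise ValueError(f"unknown operation type: {op!r}")
    handlers[op](row)

target = InMemoryTarget()
apply_operation(target, "i", ["16", "zhangsan"])
apply_operation(target, "u", ["16", "zhangsan2"])
print(target.rows)
```

In practice the batch formed during the merge step would be replayed through this dispatch in order, after the flow-id deduplication check.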
This embodiment also discloses a system for synchronizing mysql-based data to heterogeneous data sources, comprising a binlog data pulling module, which uses canal to pull binlog data from the mysql node through the master-slave replication protocol;
a log parsing module, which parses the binlog data to form parsed data;
a data preprocessing module, which merges the parsed data to form merged data;
a data conversion module, which converts the merged data into a standard format in which the outermost layer of the message consists of a tupe node and a schema node; the tupe node is an array structure that stores the specific information of each binlog record, comprising a flow id, an operation timestamp, an operation type and the detailed data of the operation, the flow id being composed of the binlog file name plus the offset, and the schema node internally comprising a fields array node, a tableName node and a namespace node; the remaining descriptions of the tupe node and the schema node are the same as above and are not repeated here.
The kafka middleware receives the converted data; the converted data is then consumed from the kafka middleware and synchronized to the heterogeneous data source;
and a data consumption module, which consumes the converted data and performs the corresponding operations on the heterogeneous data source. The operations include, but are not limited to, insert, delete and update.
The foregoing is merely an embodiment of the present invention; common general knowledge, such as well-known specific structures and characteristics, is not described here in detail. Those skilled in the art, as of the filing date or the priority date of this application, know all of the common technical knowledge in this field, have access to its conventional experimental means, and are able to combine one or more aspects of the present teachings to complete and implement the invention; certain typical known structures or known methods therefore pose no obstacle to its implementation by those skilled in the art. It should be noted that those skilled in the art may make several variations and improvements without departing from the structure of the invention; these should also be regarded as falling within the protection scope of the invention and do not affect the effect of implementing the invention or the practicability of the patent. The scope of protection of this application shall be determined by the content of the claims, and the description of the specific embodiments in the specification may be used to interpret the content of the claims.

Claims (8)

1. A method for synchronizing mysql-based data to heterogeneous data sources, characterized by comprising the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol;
the binlog data is parsed to form parsed data;
the parsed data is merged to form merged data;
the merged data is converted into a standard format in which the outermost layer of the message consists of a tupe node and a schema node;
the converted data is consumed and the corresponding operations are performed on the heterogeneous data source.
2. The method for synchronizing mysql-based data to heterogeneous data sources of claim 1, characterized in that the tupe node is an array structure that internally stores the specific information of each binlog record, comprising a flow id, an operation timestamp, an operation type and the detailed data of the operation, the composition rule of the flow id being the binlog file name plus the offset.
3. The method for synchronizing mysql-based data to heterogeneous data sources of claim 2, characterized in that the schema node internally comprises a fields array node, a tableName node and a namespace node.
4. The method for synchronizing mysql-based data to heterogeneous data sources of claim 1 or 3, further comprising sending the converted data to kafka middleware before consuming the converted data.
5. The method for synchronizing mysql-based data to heterogeneous data sources of claim 4, further comprising rewriting the partitioner of kafka and hash-grouping the database name and the table name.
6. A system for synchronizing mysql-based data to heterogeneous data sources, characterized by comprising a binlog data pulling module, which uses canal to pull binlog data from the mysql node through the master-slave replication protocol;
a log parsing module, which parses the binlog data to form parsed data;
a data preprocessing module, which merges the parsed data to form merged data;
a data conversion module, which converts the merged data into a standard format in which the outermost layer of the message consists of a tupe node and a schema node;
and a data consumption module, which consumes the converted data and performs the corresponding operations on the heterogeneous data source.
7. The system for synchronizing mysql-based data to heterogeneous data sources of claim 6, further comprising kafka middleware, configured to receive the converted data; the converted data is then consumed from the kafka middleware and synchronized to the heterogeneous data sources.
8. The system for synchronizing mysql-based data to heterogeneous data sources of claim 7, characterized in that the tupe node is an array structure that internally stores the specific information of each binlog record, comprising a flow id, an operation timestamp, an operation type and the detailed data of the operation, the composition rule of the flow id being the binlog file name plus the offset; and the schema node internally comprises a fields array node, a tableName node and a namespace node.
CN202111154477.1A 2021-09-29 2021-09-29 System and method for synchronizing data to heterogeneous data source based on mysql Pending CN113821532A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111154477.1A CN113821532A (en) 2021-09-29 2021-09-29 System and method for synchronizing data to heterogeneous data source based on mysql


Publications (1)

Publication Number Publication Date
CN113821532A 2021-12-21

Family

ID=78915941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111154477.1A Pending CN113821532A (en) 2021-09-29 2021-09-29 System and method for synchronizing data to heterogeneous data source based on mysql

Country Status (1)

Country Link
CN (1) CN113821532A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341223A (en) * 2017-06-29 2017-11-10 华泰证券股份有限公司 A kind of heterogeneous database real-time synchronization method based on message-oriented middleware
CN110866022A (en) * 2019-10-24 2020-03-06 贝壳技术有限公司 Data analysis method, system and device based on log file
CN111666344A (en) * 2020-06-19 2020-09-15 中信银行股份有限公司 Heterogeneous data synchronization method and device
CN112182036A (en) * 2020-09-15 2021-01-05 中信银行股份有限公司 Data sending and writing method and device, electronic equipment and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20211221