CN113821532A - System and method for synchronizing data to heterogeneous data source based on mysql - Google Patents


Info

Publication number
CN113821532A
Authority
CN
China
Prior art keywords
data
node
mysql
binlog
heterogeneous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111154477.1A
Other languages
Chinese (zh)
Inventor
黎杰 (Li Jie)
罗鹏 (Luo Peng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Fumin Bank Co Ltd
Original Assignee
Chongqing Fumin Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Fumin Bank Co Ltd filed Critical Chongqing Fumin Bank Co Ltd
Priority to CN202111154477.1A priority Critical patent/CN113821532A/en
Publication of CN113821532A publication Critical patent/CN113821532A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/242: Query formulation
    • G06F 16/2433: Query languages
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data synchronization, in particular to a method for synchronizing mysql-based data to a heterogeneous data source. The method comprises the steps that canal pulls binlog data from the mysql node through the master-slave replication protocol; the binlog data is parsed to form parsed data; the parsed data is merged to form merged data; the merged data is converted into a standard format in which the outermost layer of the message consists of a tupe node and a schema node; and the converted data is consumed and the corresponding operations are performed on the heterogeneous data source. The invention also discloses a system for synchronizing mysql-based data to a heterogeneous data source. The approach does not intrude on the business system; processing the mysql binlog does not affect mysql performance, and synchronization is performed in real time, so query efficiency is improved.

Description

System and method for synchronizing data to heterogeneous data source based on mysql
Technical Field
The invention relates to the technical field of data synchronization, in particular to a system and a method for synchronizing mysql-based data to a heterogeneous data source.
Background
MySQL is a widely used relational database management system. When a MySQL database holds a large volume of data, query performance drops sharply. Many other storage engines can handle queries over large data volumes, but there is no mature tool for synchronizing data to such heterogeneous data sources, so heterogeneous data source synchronization is a problem that urgently needs to be solved.
Disclosure of Invention
One purpose of the invention is to provide a method for synchronizing mysql-based data to heterogeneous data sources.
A method for synchronizing mysql-based data to heterogeneous data sources comprises the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol;
the binlog data is parsed to form parsed data;
the parsed data is merged to form merged data;
the merged data is converted into a standard format in which the outermost layer of the message consists of a tupe node and a schema node;
the converted data is consumed and the corresponding operations are performed on the heterogeneous data source.
The beneficial effects of this scheme are as follows: 1. the business system is not intruded upon; processing the mysql binlog does not affect mysql performance, and synchronization is performed in real time; 2. after operations are merged they can be executed in batches, which improves the performance of synchronizing the heterogeneous data; 3. the query-performance problem of mysql over massive data is solved, and query efficiency is high.
Furthermore, the tupe node is an array structure that stores the specific information of each binlog record, comprising a flow id (serial id), an operation timestamp, an operation type and the detailed data of the operation; the flow id is composed of the binlog file name plus the offset. The flow id serves to prevent repeated operations: its composition rule guarantees that the id of each operation is unique, and after processing at the consumer side the flow id can be stored in redis, so repeated operations are prevented and the data-duplication problem is solved by means of redis.
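The flow-id composition described above (binlog file name plus offset) can be sketched as follows; the function name is ours, and the 20-digit zero-padding width follows the example given later in the detailed description.

```python
def make_flow_id(binlog_name: str, offset: int, pad: int = 20) -> str:
    """Compose a unique flow id from the binlog file name and the event
    offset, with the offset zero-padded to a fixed width so ids sort and
    compare consistently."""
    return f"{binlog_name}{offset:0{pad}d}"

# e.g. binlog file mysql-bin.000010 at offset 90222
print(make_flow_id("mysql-bin.000010", 90222))
```

Because a binlog position (file name, offset) is unique within a MySQL instance, the resulting id is unique per operation, which is what makes the redis-based deduplication possible.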
Furthermore, the schema node internally comprises a fields array node, a tableName node and a namespace node, where tableName and namespace are the table name and the namespace identification of the heterogeneous data source.
Further, to prevent data accumulation, the method also comprises sending the converted data to the kafka middleware before consuming it.
Further, the method also comprises rewriting the kafka partitioner so that messages are hash-grouped by database name and table name. With this custom kafka partitioner, the ordering of single-table synchronization is guaranteed and automatic fault tolerance is achieved.
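The hash grouping by database and table name could look like the following minimal sketch; the function name and the choice of md5 are assumptions, since the text only specifies that the names are hash-grouped.

```python
import hashlib

def choose_partition(database: str, table: str, num_partitions: int) -> int:
    """Hash the database and table name so that every event of one table
    always maps to the same partition, preserving per-table ordering."""
    key = f"{database}.{table}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()  # stable across processes, unlike Python's hash()
    return int(digest, 16) % num_partitions

print(choose_partition("test", "orders", 8))
```

In a real deployment this logic would live in a custom Kafka partitioner so the producer, not the consumer, enforces the grouping.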
Another purpose of the invention is to provide a system for synchronizing mysql-based data to heterogeneous data sources.
A system for synchronizing mysql-based data to heterogeneous data sources comprises a binlog data pulling module, which uses canal to pull binlog data from the mysql node through the master-slave replication protocol;
a log parsing module, which parses the binlog data to form parsed data;
a data preprocessing module, which merges the parsed data to form merged data;
a data conversion module, which converts the merged data into a standard format in which the outermost layer of the message consists of a tupe node and a schema node;
and a data consumption module, which consumes the converted data and performs the corresponding operations on the heterogeneous data source.
Further, to prevent data accumulation, the system also comprises kafka middleware, which receives the converted data; the converted data is then consumed from the kafka middleware and synchronized to the heterogeneous data sources.
Furthermore, the tupe node is an array structure that stores the specific information of each binlog record, comprising a flow id, an operation timestamp, an operation type and the detailed data of the operation; the flow id is composed of the binlog file name plus the offset, and the schema node internally comprises a fields array node, a tableName node and a namespace node. The flow id serves to prevent repeated operations: it guarantees a unique, non-repeating id for each operation, and after processing at the consumer side it can be stored in redis to prevent repeated operations.
Drawings
FIG. 1 is a flowchart of an embodiment of a method for synchronizing mysql-based data to heterogeneous data sources according to the invention.
Detailed Description
The following is described in further detail through specific embodiments:
As shown in fig. 1, this embodiment discloses a method for synchronizing mysql-based data to heterogeneous data sources, comprising the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol; in this embodiment canal simulates a slave of mysql. canal can be understood as a tool for synchronizing incremental data; its main purpose is to provide incremental data subscription and consumption based on parsing MySQL's incremental logs. The binlog is a binary file that records all changes to MySQL data and stores them in binary form. Both canal and the binlog are prior art.
The binlog data is parsed to form parsed data. Suppose, for example, that 10 records are inserted into table a, 5 records are modified in table b, and 6 records are deleted from table a. The parsed data is as follows:
json1: { table a inserts 10 records };
json2: { table b modifies 5 records };
json3: { table a deletes 6 records }.
To reduce the number of subsequent operations against the target data source, the parsed data is merged to form merged data. After processing it becomes:
json1: { table a inserts 10 records and table a deletes 6 records };
json2: { table b modifies 5 records }.
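The merging step above can be sketched as follows; the (table, operation, count) event representation is a simplification for illustration, not the actual parsed-binlog structure.

```python
from collections import OrderedDict

def merge_parsed(events):
    """Group parsed binlog events by table, so that all operations on one
    table can later be executed against the target source as one batch.
    Each event is a (table, operation, row_count) tuple."""
    merged = OrderedDict()  # preserves first-seen table order
    for table, op, n in events:
        merged.setdefault(table, []).append((op, n))
    return merged

events = [("a", "insert", 10), ("b", "update", 5), ("a", "delete", 6)]
print(merge_parsed(events))
```

Grouping by table is what enables the batch execution mentioned in the effects section, since one round trip per table replaces one per event.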
The merged data is converted into a standard format in which the outermost layer of the message consists of a tupe node and a schema node. The tupe node is an array structure that stores the specific information of each binlog record, for example:
"tupe": [["mysql-bin.00001000000000000000090222", 1517194451453, "i", "16", "zhangsan"], ["mysql-bin.00001000000000000000090345", 1517194455643, "u", "15", "lisi"]]
The first column is the flow id, whose function is to prevent repeated operations. Its composition rule is the binlog file name plus the offset: for the first record, the binlog file name mysql-bin.000010 is concatenated with the offset 90222 zero-padded to 20 digits, giving mysql-bin.00001000000000000000090222. This guarantees that the id of each operation is unique and non-repeating; when the consumer side processes an operation, the id is stored in redis to prevent repeated operations. The second column is the operation timestamp. The third column is the operation type: i denotes insert, u denotes update and d denotes delete. The remaining columns are the detailed data of the operation.
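The consumer-side deduplication by flow id can be sketched with an in-memory set standing in for the redis store the text describes; the class and method names are ours.

```python
class FlowIdDeduplicator:
    """Stand-in for the redis set used at the consumer side: record every
    flow id that has been applied and skip any id seen before."""

    def __init__(self):
        self._seen = set()

    def should_apply(self, flow_id: str) -> bool:
        """Return True the first time a flow id is seen, False on repeats."""
        if flow_id in self._seen:
            return False
        self._seen.add(flow_id)
        return True
```

With redis, the same check would typically be a set-if-absent operation so that multiple consumer instances share one view of the applied ids.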
The schema node internally comprises a fields array node, a tableName node and a namespace node. For example:
"schema": { "fields": [], "tableName": "testTable", "namespace": "mysql.test" }
Here tableName and namespace are the table name and the namespace identification of the heterogeneous data source, and fields holds the metadata of the operated data, for example:
"fields": [{ "isPk": false, "name": "id", "type": "int(10)" }, { "isPk": false, "name": "name", "type": "varchar(20)" }]
which contains the metadata of two fields: isPk identifies whether the field is a primary key in mysql, name denotes the field's name in mysql, and type denotes the field's type in mysql.
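Assembling this standard message format, with the tupe node and schema node at the outermost layer, might look like the following minimal sketch; the function name is an assumption.

```python
import json

def build_message(tupe_rows, table_name, namespace, fields):
    """Assemble the standard-format message: a tupe node holding the binlog
    records and a schema node holding table name, namespace and field metadata."""
    return {
        "tupe": tupe_rows,
        "schema": {
            "fields": fields,
            "tableName": table_name,
            "namespace": namespace,
        },
    }

msg = build_message(
    [["mysql-bin.00001000000000000000090222", 1517194451453, "i", "16", "zhangsan"]],
    "testTable",
    "mysql.test",
    [{"isPk": False, "name": "id", "type": "int(10)"},
     {"isPk": False, "name": "name", "type": "varchar(20)"}],
)
print(json.dumps(msg))
```

Serializing with json.dumps yields the wire form sent to the kafka middleware in the next step.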
In this embodiment, to prevent data accumulation, the method further comprises sending the converted data to the kafka middleware before consuming it. It also comprises rewriting the kafka partitioner so that messages are hash-grouped by database name and table name, which ensures that all operations on one table go to a single partition and are therefore ordered.
The data in the kafka middleware is consumed, and the corresponding operations are performed on the heterogeneous data source. The operations include, but are not limited to, insert, delete and update.
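The consumer-side dispatch on the operation type (i, u, d) can be sketched as follows; the in-memory target and its method names are illustrative assumptions, since a real heterogeneous data source would expose its own write API.

```python
class InMemoryTarget:
    """Toy heterogeneous data source keyed by primary key; row[0] is the
    key and row[1:] the payload (a simplifying assumption)."""

    def __init__(self):
        self.rows = {}

    def insert(self, row):
        self.rows[row[0]] = row[1:]

    def update(self, row):
        self.rows[row[0]] = row[1:]

    def delete(self, row):
        self.rows.pop(row[0], None)

def apply_operation(target, op, row):
    """Dispatch one converted binlog record to the target data source:
    'i' inserts, 'u' updates, 'd' deletes, matching the operation-type
    column of the tupe node."""
    handlers = {"i": target.insert, "u": target.update, "d": target.delete}
    if op not in handlers:
        raise ValueError(f"unknown operation type: {op!r}")
    handlers[op](row)

target = InMemoryTarget()
apply_operation(target, "i", ["16", "zhangsan"])
apply_operation(target, "u", ["16", "zhangsan2"])
print(target.rows)
```

In practice the batch formed during the merge step would be replayed through this dispatch in order, after the flow-id deduplication check.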
This embodiment also discloses a system for synchronizing mysql-based data to heterogeneous data sources, comprising a binlog data pulling module, which uses canal to pull binlog data from the mysql node through the master-slave replication protocol;
a log parsing module, which parses the binlog data to form parsed data;
a data preprocessing module, which merges the parsed data to form merged data;
a data conversion module, which converts the merged data into a standard format in which the outermost layer of the message consists of a tupe node and a schema node; the tupe node is an array structure that stores the specific information of each binlog record, comprising a flow id, an operation timestamp, an operation type and the detailed data of the operation, the flow id being composed of the binlog file name plus the offset, and the schema node internally comprising a fields array node, a tableName node and a namespace node; the remaining descriptions of the tupe node and the schema node are the same as above and are not repeated here.
The kafka middleware receives the converted data; the converted data is then consumed from the kafka middleware and synchronized to the heterogeneous data source;
and a data consumption module, which consumes the converted data and performs the corresponding operations on the heterogeneous data source. The operations include, but are not limited to, insert, delete and update.
The foregoing is merely an embodiment of the present invention; common general knowledge, such as well-known specific structures and characteristics, is not described here in detail. Those skilled in the art, as of the filing date or the priority date of this application, know all of the common technical knowledge in this field, have access to its conventional experimental means, and are able to combine one or more aspects of the present teachings to complete and implement the invention; certain typical known structures or known methods therefore pose no obstacle to its implementation by those skilled in the art. It should be noted that those skilled in the art may make several variations and improvements without departing from the structure of the invention; these should also be regarded as falling within the protection scope of the invention and do not affect the effect of implementing the invention or the practicability of the patent. The scope of protection of this application shall be determined by the content of the claims, and the description of the specific embodiments in the specification may be used to interpret the content of the claims.

Claims (8)

1. A method for synchronizing mysql-based data to heterogeneous data sources, characterized by comprising the following steps:
canal pulls binlog data from the mysql node through the master-slave replication protocol;
the binlog data is parsed to form parsed data;
the parsed data is merged to form merged data;
the merged data is converted into a standard format in which the outermost layer of the message consists of a tupe node and a schema node;
the converted data is consumed and the corresponding operations are performed on the heterogeneous data source.
2. The method for synchronizing mysql-based data to heterogeneous data sources of claim 1, characterized in that the tupe node is an array structure that internally stores the specific information of each binlog record, comprising a flow id, an operation timestamp, an operation type and the detailed data of the operation, the composition rule of the flow id being the binlog file name plus the offset.
3. The method for synchronizing mysql-based data to heterogeneous data sources of claim 2, characterized in that the schema node internally comprises a fields array node, a tableName node and a namespace node.
4. The method for synchronizing mysql-based data to heterogeneous data sources of claim 1 or 3, further comprising sending the converted data to kafka middleware before consuming the converted data.
5. The method for synchronizing mysql-based data to heterogeneous data sources of claim 4, further comprising rewriting the partitioner of kafka and hash-grouping the database name and the table name.
6. A system for synchronizing mysql-based data to heterogeneous data sources, characterized by comprising a binlog data pulling module, which uses canal to pull binlog data from the mysql node through the master-slave replication protocol;
a log parsing module, which parses the binlog data to form parsed data;
a data preprocessing module, which merges the parsed data to form merged data;
a data conversion module, which converts the merged data into a standard format in which the outermost layer of the message consists of a tupe node and a schema node;
and a data consumption module, which consumes the converted data and performs the corresponding operations on the heterogeneous data source.
7. The system for synchronizing mysql-based data to heterogeneous data sources of claim 6, further comprising kafka middleware, configured to receive the converted data; the converted data is then consumed from the kafka middleware and synchronized to the heterogeneous data sources.
8. The system for synchronizing mysql-based data to heterogeneous data sources of claim 7, characterized in that the tupe node is an array structure that internally stores the specific information of each binlog record, comprising a flow id, an operation timestamp, an operation type and the detailed data of the operation, the composition rule of the flow id being the binlog file name plus the offset; and the schema node internally comprises a fields array node, a tableName node and a namespace node.
CN202111154477.1A 2021-09-29 2021-09-29 System and method for synchronizing data to heterogeneous data source based on mysql Pending CN113821532A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111154477.1A CN113821532A (en) 2021-09-29 2021-09-29 System and method for synchronizing data to heterogeneous data source based on mysql


Publications (1)

Publication Number Publication Date
CN113821532A 2021-12-21

Family

ID=78915941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111154477.1A Pending CN113821532A (en) 2021-09-29 2021-09-29 System and method for synchronizing data to heterogeneous data source based on mysql

Country Status (1)

Country Link
CN (1) CN113821532A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341223A (en) * 2017-06-29 2017-11-10 华泰证券股份有限公司 A kind of heterogeneous database real-time synchronization method based on message-oriented middleware
CN110866022A (en) * 2019-10-24 2020-03-06 贝壳技术有限公司 Data analysis method, system and device based on log file
CN111666344A (en) * 2020-06-19 2020-09-15 中信银行股份有限公司 Heterogeneous data synchronization method and device
CN112182036A (en) * 2020-09-15 2021-01-05 中信银行股份有限公司 Data sending and writing method and device, electronic equipment and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20211221