CN112395290A

CN112395290A - Data synchronization realization method and system

Info

Publication number: CN112395290A
Application number: CN202011311286.7A
Authority: CN
Inventors: 周志文; 郭潇文; 纪向晴
Original assignee: Shenzhen Mapgoo Technology Co ltd
Current assignee: Shenzhen Mapgoo Technology Co ltd
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-02-23
Anticipated expiration: 2040-11-20
Also published as: CN112395290B

Abstract

The embodiment of the invention discloses a method and a system for implementing data synchronization. The method comprises: acquiring all data of an original database, processing the data that do not meet preset conditions, and summarizing the data types of all the processed data; creating a unique logical index for all the summarized data of the original database; performing data synchronization processing according to a preset configuration rule; and storing the synchronized data to a target database. With the embodiment of the invention, a single service supports multiple data types and the simultaneous synchronization of multiple databases; deployment is simple, the service is stable, performance and concurrency are high, the synchronization process is not interrupted, and data is not easily lost.

Description

Data synchronization realization method and system

Technical Field

The present invention relates to the field of data synchronization technologies, and in particular, to a method and a system for implementing data synchronization.

Background

A relational database is a database built on the relational model. It processes data using concepts and methods such as set algebra, and it organizes data items into a set of formally described tables. The data in these tables can be accessed or reassembled in many different ways without reorganizing the tables. The definition of a relational database yields a table of metadata, that is, a formal description of the tables, columns, ranges, and constraints. Each table (sometimes referred to as a relation) contains one or more data categories represented by columns. Each row contains a unique data entity for the categories defined by the columns.

Mainstream relational databases currently include mysql, sql server, and oracle. Change data is captured in order to synchronize data to a database, a data warehouse, or kafka, either in real time or in full; the data is written into a data warehouse in real time through a stream computing engine, or written directly into a database. The synchronization modes are incremental synchronization and full synchronization. Incremental synchronization includes a binlog mode, a CDC (Change Data Capture) mode, a rowversion mode, and a date/datetime mode. Full synchronization can be driven by a primary key, a unique index, a date, an integer key, and the like. Data synchronization supports database-to-database and database-to-kafka transfers.

However, in the prior art, open source solutions such as canal, streamsets, and debezium are complex to configure; cannot simultaneously support mysql, sql server, and automatic month-based table splitting; require a separate service and configuration for each data source; are prone to memory overflow, interruption of the synchronization process, and data loss; and cannot synchronize a table that has neither a primary key nor a unique index.

The prior art therefore still needs to be improved.

Disclosure of Invention

In view of the above technical problems, embodiments of the present invention provide a method and a system for implementing data synchronization, which can solve the technical problems of complex configuration and easy memory overflow in the data synchronization process of the existing database.

A first aspect of an embodiment of the present invention provides a method for implementing data synchronization, including:

acquiring all data of an original database, processing the data which do not meet preset conditions, and summarizing the data types of all the processed data;

creating a unique logical index for all the data of the summarized original database;

performing data synchronization processing according to a preset configuration rule;

and storing the synchronized data to a target database.

Optionally, the acquiring all data of the original database, processing the data that does not satisfy the preset condition, and summarizing the data types of all the processed data includes:

obtaining table structures corresponding to all data of the original database, processing the table structures which do not meet preset conditions, and summarizing all the processed table structures according to data types.

Optionally, the obtaining table structures corresponding to all data of the original database, processing the table structures that do not meet the preset condition, and summarizing all the processed table structures according to the data types includes:

acquiring table structures corresponding to all data of an original database, and acquiring data types and data field names which do not meet preset conditions in all the table structures;

and processing the data types and the data field names which do not meet the preset conditions, and summarizing all the processed table structures according to the data types.

Optionally, the storing the synchronized data to the target database includes:

and sending incremental data generated in the synchronization process to kafka, and writing the full data into a target database.

Optionally, the performing data synchronization processing according to a preset configuration rule includes:

according to a preset configuration type, performing data synchronization processing after the concurrency number is specified, wherein the configuration type is one of a primary key, a unique logical index, and a date.

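A second aspect of the embodiments of the present invention provides a system for implementing data synchronization, where the system includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the following steps:

acquiring all data of an original database, processing the data which do not meet preset conditions, and summarizing the data types of all the processed data;

creating a unique logical index for all the data of the summarized original database;

performing data synchronization processing according to a preset configuration rule;

and storing the synchronized data to a target database.

Optionally, the computer program, when executed by the processor, further implements the optional steps described for the method of the first aspect.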

A third aspect of the embodiments of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by one or more processors, the computer-executable instructions may cause the one or more processors to perform the above-mentioned data synchronization implementation method.

According to the technical scheme provided by the embodiment of the invention, all data of an original database are acquired, the data which do not meet the preset condition are processed, and the data types of all the processed data are summarized; a unique logical index is created for all the summarized data of the original database; data synchronization processing is performed according to a preset configuration rule; and the synchronized data are stored to a target database. Compared with the prior art, a single service supports multiple data types and the simultaneous synchronization of multiple databases; deployment is simple, the service is stable, performance and concurrency are high, the synchronization process is not interrupted, and data is not easily lost.

Drawings

Fig. 1 is a schematic flow chart illustrating an embodiment of a method for implementing data synchronization according to an embodiment of the present invention;

Fig. 2 is a schematic hardware structure diagram of another embodiment of a data synchronization implementation system according to the embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The following detailed description of embodiments of the invention refers to the accompanying drawings.

Referring to Fig. 1, Fig. 1 is a schematic flow chart illustrating an embodiment of a data synchronization implementation method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:

Step S100, acquiring all data of an original database, processing the data which do not meet preset conditions, and summarizing the data types of all the processed data;

Step S200, creating a unique logical index for all the summarized data of the original database;

Step S300, performing data synchronization processing according to a preset configuration rule;

Step S400, storing the synchronized data to a target database.

Specifically, all data of a database are obtained and irregular data are processed, after which the data types of the processed data are summarized; a unique logical index is created for all the summarized data of the database; the data in the database are synchronized according to preset configuration rules; and the synchronized data are stored.
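The text does not state how the unique logical index is computed. As one possible illustration only, the sketch below derives a deterministic key for a row that has no primary key or unique index by hashing its column values. The function name `logical_index`, the use of SHA-256, and the separator bytes are all assumptions of this sketch, not details taken from the embodiment.

```python
import hashlib


def logical_index(row, columns):
    """Return a deterministic hex key for a row without a primary key.

    Assumption: the key is a SHA-256 digest over the row's column values;
    the source does not specify how its logical index is built.
    """
    parts = []
    for col in sorted(columns):  # fixed column order keeps the digest stable
        value = row.get(col)
        # Encode NULL (None) differently from an empty string.
        encoded = "\x00" if value is None else str(value)
        parts.append(col + "=" + encoded)
    joined = "\x1f".join(parts)  # unit separator between column entries
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Two rows with identical values in every column would receive the same key under this scheme, so a real system would need an additional rule for exact duplicates.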

Synchronization is started by specifying the concurrency number and is keyed on a primary key, a unique index, or a date. The primary key, the unique index, and the date are not mutually exclusive and can exist at the same time. A query_key is set in the configuration management table (mgd_catalog_table_info), and the data types of the primary key, the unique index, and the date are kept the same as in the original table.

Incremental synchronization sends 150,000 messages per second to kafka.
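To illustrate what starting synchronization with a specified concurrency number could look like, the following minimal sketch runs a per-batch callback on a bounded thread pool. The names `run_batches` and `sync_one` are hypothetical, and the embodiment itself may schedule its work differently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batches(batches, sync_one, concurrency):
    """Call sync_one(batch) for every batch with at most `concurrency` threads."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map returns results in the same order as the input batches.
        return list(pool.map(sync_one, batches))
```

A caller would pass the batches produced from the primary key, unique index, or date column, together with a function that copies one batch to the target.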

Further, acquiring all data of the original database, processing the data which do not meet the preset conditions, and summarizing the data types of all the processed data includes:

Specifically, all table structures of the database are obtained, and after irregular table structures are processed, data types of the processed table structures are summarized.

Further, acquiring table structures corresponding to all data of the original database, processing the table structures which do not meet preset conditions, and summarizing all the processed table structures according to data types includes:

Specifically, all table structures of mysql and sql server are obtained, irregular types and field names are processed, and the data types are summarized. The data types include, but are not limited to, varchar, int, bit, numeric, datetime, smallint, tinyint, nvarchar, image, money, nchar, bigint, varbinary, time, decimal, float, char, date, text, smalldatetime, binary, enum, double, mediumint, json, timestamp, mediumtext.

Incompatible types include, but are not limited to: nvarchar, which may hold integers, strings, or floating point numbers, so its type is ambiguous; money, for which decimal should be used; smalldatetime and time, whose date formats are non-standard; and bit and varbinary, whose binary values cannot be parsed.

Irregular names and types are handled as follows: a field name such as E-MAIL is made legal by setting an alias; where the same meaning appears under several names and types (an isDeleted bit type, an ISDelete int type, an ISDelete varchar type), the name is unified as isDeleted and the type as tinyint; an illegal value such as "1" followed by 10 spaces is unified as an integer.
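The three clean-up rules above (aliasing an illegal field name, unifying name and type variants, and repairing a padded value) can be sketched as follows. The alias target `email` and the lookup tables are hypothetical; only the E-MAIL, isDeleted/ISDelete, and padded "1" cases come from the text.

```python
# Hypothetical alias table: illegal original field name -> legal field name.
FIELD_ALIASES = {"E-MAIL": "email"}

# Name variants with the same meaning map to one unified name and type.
UNIFIED_FIELDS = {
    "isdeleted": ("isDeleted", "tinyint"),
    "isdelete": ("isDeleted", "tinyint"),
}


def normalize_field(name, data_type):
    """Return the (name, type) pair to use for an original field."""
    if name in FIELD_ALIASES:
        return FIELD_ALIASES[name], data_type
    unified = UNIFIED_FIELDS.get(name.lower())
    if unified is not None:
        return unified
    return name, data_type


def normalize_int_value(value):
    """Convert a value such as '1' padded with spaces into an integer."""
    if isinstance(value, str):
        value = value.strip()
    return int(value)
```

For example, `normalize_field("ISDelete", "varchar")` yields the unified name and type, and `normalize_int_value("1" + " " * 10)` yields the integer 1.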

The logic for summarizing data types is as follows:

WHEN DATA_TYPE IN ('varchar', 'text', 'char', 'nchar', 'nvarchar', 'mediumtext', 'json') THEN 'string'
WHEN DATA_TYPE IN ('image', 'varbinary', 'binary') THEN 'binary'
WHEN DATA_TYPE IN ('int', 'mediumint') THEN 'int'
WHEN DATA_TYPE IN ('smallint', 'tinyint') THEN 'smallint'
WHEN DATA_TYPE IN ('money', 'numeric') AND (numeric_precision > 0 AND numeric_precision < 12) AND numeric_scale = 0 THEN 'int'
WHEN DATA_TYPE IN ('money', 'numeric') AND numeric_precision > 11 AND numeric_scale = 0 THEN 'bigint'
WHEN DATA_TYPE IN ('money', 'numeric') AND numeric_scale > 0 THEN 'decimal'
WHEN DATA_TYPE = 'bit' THEN 'smallint'
WHEN DATA_TYPE IN ('money', 'numeric') THEN 'decimal'
WHEN DATA_TYPE IN ('smalldatetime', 'date') THEN 'datetime'
WHEN DATA_TYPE = 'time' THEN 'string'
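The WHEN branches above can be restated as a small Python function, which makes the branch order and the handling of missing precision or scale easier to check. The function name `summarize_type` is hypothetical, and passing unlisted types through unchanged is an assumption of this sketch, since the listing shows no ELSE branch.

```python
STRING_TYPES = {"varchar", "text", "char", "nchar", "nvarchar", "mediumtext", "json"}
BINARY_TYPES = {"image", "varbinary", "binary"}


def summarize_type(data_type, numeric_precision=None, numeric_scale=None):
    """Map a source column type to its summarized type, in listing order."""
    t = data_type.lower()
    if t in STRING_TYPES:
        return "string"
    if t in BINARY_TYPES:
        return "binary"
    if t in ("int", "mediumint"):
        return "int"
    if t in ("smallint", "tinyint"):
        return "smallint"
    if t in ("money", "numeric"):
        # As with SQL NULL comparisons, a missing precision or scale
        # matches none of the integer branches.
        if numeric_scale == 0 and numeric_precision is not None:
            if 0 < numeric_precision < 12:
                return "int"
            if numeric_precision > 11:
                return "bigint"
        return "decimal"  # scale > 0, or the final money/numeric fallback
    if t == "bit":
        return "smallint"
    if t in ("smalldatetime", "date"):
        return "datetime"
    if t == "time":
        return "string"
    return t  # assumption: types not listed are kept as they are
```

For instance, a numeric(18, 0) column is summarized as bigint, while numeric(18, 2) is summarized as decimal.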

Further, storing the synchronized data to a target database includes:

In a specific implementation, the synchronization process is as follows:

1) partition_column is a date column, column_type datetime/date, for example:

If no offset is specified, one natural day is fetched per batch by default:

SELECT * FROM table_name WHERE created_at >= '2020-09-01 00:00:00' AND created_at < '2020-09-02 00:00:00'

With an offset of 3, each batch fetches 3 days:

SELECT * FROM table_name WHERE created_at >= '2020-09-01 00:00:00' AND created_at < '2020-09-04 00:00:00'

With a specified start of '2020-09-02 00:00:00' and an offset of 7, the window is shifted by the offset size of 7 (per batch) counted from 2020-09-01 00:00:00:

SELECT * FROM table_name WHERE created_at >= '2020-09-02 00:00:00' AND created_at < '2020-09-08 00:00:00'

2) partition_column id, column_type int, offset 50000 (per batch):

SELECT * FROM table_name WHERE id > 0 LIMIT 50000

3) partition_column name, with its column_type, offset 50000 (per batch):

SELECT * FROM table_name WHERE name IN (<size of offset>)

A full synchronization of 300 million rows over 3 partitions, with 5 threads per partition, completes in 2 hours.
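The date-based and id-based batches above can be generated programmatically. The sketch below is illustrative only: the helper names are hypothetical, table and column names are assumed to come from trusted configuration (they are interpolated directly into the statement), and the ORDER BY in the id query is an addition of this sketch so that successive batches can resume from the last id seen.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def date_windows(start, end, offset_days=1):
    """Yield (lower, upper) pairs covering [start, end) in offset_days steps."""
    if offset_days < 1:
        raise ValueError("offset_days must be at least 1")
    step = timedelta(days=offset_days)
    lower = start
    while lower < end:
        upper = min(lower + step, end)  # the last window may be shorter
        yield lower, upper
        lower = upper


def date_batch_sql(table, column, lower, upper):
    """Build the half-open range query for one date window."""
    return (
        f"SELECT * FROM {table} WHERE {column} >= '{lower.strftime(TS_FORMAT)}' "
        f"AND {column} < '{upper.strftime(TS_FORMAT)}'"
    )


def id_batch_sql(table, column, last_id, offset):
    """Build the query for the next batch of at most `offset` rows after last_id."""
    return f"SELECT * FROM {table} WHERE {column} > {last_id} ORDER BY {column} LIMIT {offset}"
```

With the default offset, `date_windows(datetime(2020, 9, 1), datetime(2020, 9, 4))` produces three one-day windows; with `offset_days=3` it produces a single three-day window.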

Further, performing data synchronization processing according to a preset configuration rule includes:

In a specific implementation, all business logic tables of the target database are obtained as follows:

When the target database is mssql, the logic is as follows:

When the target database is mysql, the logic is as follows:

The table structure of each logic table obtained is then acquired, and irregular names, types, and values are processed.

All table structures are saved in the configuration management tables:

CREATE TABLE `mgd_catalog_table_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cronfa_id` int(11) DEFAULT NULL COMMENT 'database id',
  `table_name` varchar(255) NOT NULL COMMENT 'table name/kafka topic name',
  `presql` varchar(255) DEFAULT NULL COMMENT 'batch query statement used when the table configuration does not meet requirements; takes priority and ignores all other table settings',
  `enable` tinyint(2) DEFAULT '0' COMMENT 'whether synchronization is allowed: 1 through the database, 2 through kafka, 0 not allowed',
  `table_comment` varchar(255) DEFAULT NULL COMMENT 'table name annotation',
  `is_deleted` tinyint(2) DEFAULT '0' COMMENT 'whether deleted',
  `index_columns` varchar(255) DEFAULT NULL COMMENT 'when primary_key is empty, or the single primary key field is updated, uniquely set this field to a single or combined primary key name; do not operate on upk rows',
  `prerequis` varchar(255) DEFAULT NULL COMMENT 'format conversion and alias setting, used mainly for sql statements',
  `dictionary_include` varchar(255) DEFAULT NULL COMMENT 'sets the carbondata high cardinality columns',
  `no_inverted_index` varchar(255) DEFAULT NULL COMMENT 'sets the carbondata low cardinality columns',
  `sort_columns` varchar(255) DEFAULT NULL COMMENT 'sets the carbondata index field',
  `long_string_columns` varchar(255) DEFAULT NULL COMMENT 'fields over 32kb',
  `num_partitions` smallint(6) DEFAULT '1' COMMENT 'sets the number of carbondata partitions',
  `partition_col` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL COMMENT 'specifies a combination of multiple fields when primary_key is not a single primary key; order: index_cols > partition_col > primary_key',
  `primary_key` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL COMMENT 'used as the full synchronization partition field; full synchronization does not support multiple primary_key values, in which case it uses the partition_col field',
  `query_key` varchar(255) DEFAULT NULL COMMENT 'synchronization field: sql server defaults to rowversion, others need to specify the field',
  `topic_name` varchar(255) DEFAULT NULL COMMENT 'kafka topic, defaults to table_name; distinguishes same-named tables of car lender and McUnion',
  PRIMARY KEY (`id`),
  UNIQUE KEY `IDX_CRONFAID_TABLE_NAME` (`cronfa_id`, `table_name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=871 DEFAULT CHARSET=utf8;

CREATE TABLE `mgd_column_info` (
  `cronfa_id` int(11) DEFAULT NULL COMMENT 'database to which the table belongs',
  `catalog_table_id` int(11) NOT NULL,
  `origin_table_name` varchar(120) NOT NULL COMMENT 'original table name',
  `table_name` varchar(120) DEFAULT '' COMMENT 'table name',
  `origin_column_name` varchar(120) NOT NULL COMMENT 'original field name',
  `column_name` varchar(120) DEFAULT '' COMMENT 'field name',
  `ordinal_position` tinyint(6) unsigned NOT NULL DEFAULT '0' COMMENT 'ordering of the field',
  `column_default` varchar(255) DEFAULT NULL COMMENT 'field default',
  `is_nullable` char(5) DEFAULT NULL COMMENT 'whether the field may be NULL',
  `data_type` varchar(64) DEFAULT '' COMMENT 'data type',
  `character_maximum_length` bigint(19) DEFAULT NULL COMMENT 'maximum number of characters of the field',
  `character_octet_length` bigint(19) DEFAULT NULL COMMENT 'maximum number of bytes of the field',
  `numeric_precision` int(11) unsigned DEFAULT NULL COMMENT 'numeric precision',
  `numeric_scale` int(11) unsigned DEFAULT NULL COMMENT 'decimal places',
  `column_key` varchar(60) DEFAULT '' COMMENT 'index type: PRI for primary key, UNI for unique key, MUL for repeatable',
  `extra` varchar(30) DEFAULT '' COMMENT 'other information',
  `column_comment` varchar(255) DEFAULT '' COMMENT 'field comment',
  `generation_expression` text COMMENT 'formula of the combined field',
  `is_deleted` tinyint(2) DEFAULT '0' COMMENT 'whether deleted: 1 deleted',
  `technology_tags` varchar(255) DEFAULT NULL COMMENT 'technical tags',
  `business_tags` varchar(255) DEFAULT NULL COMMENT 'business tags',
  `carbon_type` varchar(120) NOT NULL,
  `update_at` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
  `create_at` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'creation time',
  `outside` tinyint(2) NOT NULL DEFAULT '1' COMMENT 'excluded by default; set to 0 for all fields that need to be synchronized',
  UNIQUE KEY `IDX_CRONFAID_CATALOGTABLEID_ORIGINCOLUMNNAME` (`cronfa_id`, `catalog_table_id`, `origin_column_name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Whether synchronization is started is controlled through the enable field of the configuration management table mgd_catalog_table_info: 0 forbids synchronization and 1 allows it.
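As a rough illustration of how a service might read this switch, the sketch below selects the tables that are allowed to synchronize. It uses an in-memory SQLite database with a simplified three-column version of mgd_catalog_table_info so that it runs on its own; the sample rows, the function name `enabled_tables`, and the extra filter on is_deleted are assumptions of this sketch.

```python
import sqlite3


def enabled_tables(conn):
    """Return the names of tables whose enable flag is 1 and that are not deleted."""
    rows = conn.execute(
        "SELECT table_name FROM mgd_catalog_table_info "
        "WHERE enable = 1 AND is_deleted = 0 ORDER BY table_name"
    ).fetchall()
    return [row[0] for row in rows]


# Simplified stand-in for the configuration management table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mgd_catalog_table_info ("
    "table_name TEXT NOT NULL, enable INTEGER DEFAULT 0, is_deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO mgd_catalog_table_info (table_name, enable, is_deleted) VALUES (?, ?, ?)",
    [("orders", 1, 0), ("audit_log", 0, 0), ("legacy", 1, 1)],  # hypothetical rows
)
conn.commit()
```

Here only the hypothetical `orders` table passes both conditions; `audit_log` is disabled and `legacy` is marked as deleted.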

Synchronization is started as follows:

When the database is mysql, the synchronization mode is as follows:

# mysql binlog
nohup ./binlog -log_dir=./logs -alsologtostderr=false -stderrthreshold=ERROR &

When the database is mssql, the synchronization mode is as follows:

# mssql cdc
go build -o cdc -tags static main.go
nohup ./cdc -log_dir=./logs -alsologtostderr=false -stderrthreshold=ERROR -keyspace SIMDB &

As the above method embodiment shows, the embodiment of the invention discloses a method and a system for implementing data synchronization in which a single service supports multiple data types and the simultaneous synchronization of multiple databases; data that has no primary key and no unique index can be synchronized; the synchronization process is not interrupted and no data is lost; performance and concurrency are high; and deployment is simple and the service is stable.

It should be noted that, a certain order does not necessarily exist between the above steps, and those skilled in the art can understand, according to the description of the embodiments of the present invention, that in different embodiments, the above steps may have different execution orders, that is, may be executed in parallel, may also be executed interchangeably, and the like.

The data synchronization implementation method in the embodiment of the present invention is described above; the data synchronization implementation system in the embodiment of the present invention is described below. Referring to Fig. 2, Fig. 2 is a schematic diagram of the hardware structure of another embodiment of a data synchronization implementation system in the embodiment of the present invention. As shown in Fig. 2, the system 10 includes: a memory 101, a processor 102, and a computer program stored on the memory and executable on the processor, the computer program implementing the following steps when executed by the processor 102:

acquiring all data of an original database, processing the data which do not meet preset conditions, and summarizing the data types of all the processed data;

creating a unique logical index for all the data of the summarized original database;

performing data synchronization processing according to a preset configuration rule;

and storing the synchronized data to a target database.

The specific implementation steps are the same as those of the method embodiments, and are not described herein again.

Optionally, the computer program, when executed by the processor 102, further implements the optional steps of the method embodiments. The specific implementation steps are the same as those of the method embodiments, and are not described herein again.

Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, e.g., to perform method steps S100-S400 of fig. 1 described above.

By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The disclosed memory components or memory of the operating environment described herein are intended to comprise one or more of these and/or any other suitable types of memory.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

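1. A data synchronization implementation method, characterized by comprising the following steps:

acquiring all data of an original database, processing the data which do not meet preset conditions, and summarizing the data types of all the processed data;

creating a unique logical index for all the data of the summarized original database;

performing data synchronization processing according to a preset configuration rule;

and storing the synchronized data to a target database.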

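2. The method for implementing data synchronization according to claim 1, wherein the acquiring all data of the original database, processing the data that does not satisfy the preset condition, and summarizing the data types of all the processed data comprises:

obtaining table structures corresponding to all data of the original database, processing the table structures which do not meet preset conditions, and summarizing all the processed table structures according to data types.

3. The method according to claim 2, wherein the obtaining of the table structures corresponding to all the data in the original database, the processing of the table structures that do not satisfy the preset condition, and the summarizing of all the processed table structures according to the data types include:

acquiring table structures corresponding to all data of the original database, and acquiring data types and data field names which do not meet preset conditions in all the table structures;

and processing the data types and the data field names which do not meet the preset conditions, and summarizing all the processed table structures according to the data types.

4. The method for implementing data synchronization according to claim 3, wherein the saving the synchronized data to the target database includes:

sending incremental data generated in the synchronization process to kafka, and writing the full data into the target database.

5. The method according to claim 4, wherein the performing data synchronization processing according to the preset configuration rule includes:

according to a preset configuration type, performing data synchronization processing after the concurrency number is specified, wherein the configuration type is one of a primary key, a unique logical index, and a date.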

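6. A system for implementing data synchronization, the system comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing the steps of:

acquiring all data of an original database, processing the data which do not meet preset conditions, and summarizing the data types of all the processed data;

creating a unique logical index for all the data of the summarized original database;

performing data synchronization processing according to a preset configuration rule;

and storing the synchronized data to a target database.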

7. The system of claim 6, wherein the computer program when executed by the processor further performs the steps of:

8. The system of claim 7, wherein the computer program when executed by the processor further performs the steps of:

9. The system of claim 8, wherein the computer program when executed by the processor further performs the steps of:

10. A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the data synchronization implementation method of any one of claims 1-5.