CN117171132A - Data synchronization method, device and medium - Google Patents

Publication number
CN117171132A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311125525.3A
Other languages
Chinese (zh)
Inventor
张德亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Application filed by China United Network Communications Group Co Ltd
Priority to CN202311125525.3A
Publication of CN117171132A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data synchronization method, device and medium. It relates to the field of computer technology and addresses the problem of migrating data without downtime. The method includes: associating and configuring, through a synchronization information table, the source information and target information to be synchronized; configuring a binlog parsing component according to the source information to be synchronized in the synchronization information table, parsing the binlog of the source with the binlog parsing component to obtain the data changes at the source, and storing the data changes in an intermediate device; and, according to the target information in the synchronization information table that is associated with the source information to be synchronized, starting a consumer program to synchronize the data changes stored in the intermediate device to the target. The invention makes it possible to define an efficient source-to-target data synchronization strategy. The intermediate device obtains, stores and synchronizes data changes according to that strategy, so data can be migrated from source to target without downtime and in a flexible, controllable way.

Description

Data synchronization method, device and medium

Technical field

The present invention relates at least to the field of computer technology, and in particular to a data synchronization method, an intermediate device for data synchronization, and a computer-readable storage medium.

Background

In a traditional data migration, writes to the source must be stopped while the data is migrated in order to keep the data consistent. The business is therefore suspended for the duration of the migration, which affects service performance and user experience.

How to migrate data without downtime while preserving service performance and user experience is therefore a technical problem that urgently needs to be solved in this field.

Summary of the invention

The technical problem to be solved by the present invention is to provide a data synchronization method, an intermediate device for data synchronization, and a computer-readable storage medium that make it possible to migrate data without downtime while preserving service performance and user experience.

In a first aspect, the present invention provides a data synchronization method applied to an intermediate device that connects a source and a target, the method including:

associating and configuring, through a synchronization information table, the source information and target information to be synchronized;

configuring a binlog parsing component according to the source information to be synchronized in the synchronization information table, parsing the binlog of the source with the binlog parsing component to obtain the data changes at the source, and storing the data changes in the intermediate device;

according to the target information in the synchronization information table that is associated with the source information to be synchronized, starting a consumer program to synchronize the data changes stored in the intermediate device to the target;

where binlog is a binary log file.

Further, associating and configuring the source information and target information to be synchronized through the synchronization information table specifically includes:

setting up a source information table, a target information table and a synchronization information table;

writing multiple source database identifiers into the source information table and, in association with each source database identifier, setting a first storage location for storing a BinlogTopic and a second storage location for storing the name and primary key of a source data table;

writing multiple target addresses into the target information table and, in association with each target address, setting a third storage location for storing target partition information;

writing into the synchronization information table, in association with each other, the identifier of the source database to be synchronized and the target address to which the source data is to be synchronized;

where BinlogTopic is the unique identifier of the binlog parsing component.
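The three tables described above can be pictured as simple records. The following is a minimal sketch only: the class names, field names and sample addresses are hypothetical and are not taken from the patent, which does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceInfo:
    """One row of the source information table (field names are hypothetical)."""
    source_db_id: str                                       # source database identifier
    binlog_topic: Optional[str] = None                      # first storage location
    tables: Dict[str, str] = field(default_factory=dict)    # second storage location: table name -> primary key


@dataclass
class TargetInfo:
    """One row of the target information table."""
    target_address: str
    partition_info: Optional[str] = None                    # third storage location


@dataclass
class SyncRule:
    """One row of the synchronization information table: a source paired with a target."""
    source_db_id: str
    target_address: str


# Sample rows; identifiers and addresses are made up for illustration.
source_table = {s.source_db_id: s for s in [SourceInfo("db_orders"), SourceInfo("db_users")]}
target_table = {t.target_address: t
                for t in [TargetInfo("kafka://cluster-a"), TargetInfo("mysql://replica-b")]}
sync_table: List[SyncRule] = [
    SyncRule("db_orders", "kafka://cluster-a"),
    SyncRule("db_orders", "mysql://replica-b"),
]


def targets_for(source_db_id: str) -> List[str]:
    """Return the target addresses associated with a source in the synchronization table."""
    return [rule.target_address for rule in sync_table if rule.source_db_id == source_db_id]
```

In this sketch one source can be paired with several targets simply by adding rows to `sync_table`, which is one way to read the "flexible associated configuration" the text describes.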

Further, configuring the binlog parsing component according to the source information to be synchronized in the synchronization information table, parsing the binlog of the source with the binlog parsing component to obtain the data changes at the source, and storing the data changes in the intermediate device specifically includes:

configuring the binlog parsing component according to the identifier of the source database to be synchronized, generating the BinlogTopic of the binlog parsing component, writing the BinlogTopic to the first storage location, and setting up on the intermediate device, based on the BinlogTopic, a fourth storage location for storing the data changes of the source;

parsing the binlog of the source with the binlog parsing component to obtain the data changes at the source, obtaining the name and primary key of the source data table from the data changes, writing the name and primary key of the source data table to the second storage location, and writing the data changes to the fourth storage location.
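The bookkeeping in this step can be sketched as follows. This is an illustration under stated assumptions: real binlog decoding is not shown, change events are plain dictionaries invented for the example, and the names `source_info`, `change_store`, `configure_parser` and `handle_change` are hypothetical.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for the source information table
# and for the change storage on the intermediate device.
source_info = {"db_orders": {"binlog_topic": None, "tables": {}}}
change_store = defaultdict(list)   # fourth storage location, keyed by BinlogTopic


def configure_parser(source_db_id: str) -> str:
    """Generate a BinlogTopic for the source and write it to the first storage location."""
    topic = f"binlog-{source_db_id}-{uuid.uuid4().hex[:8]}"
    source_info[source_db_id]["binlog_topic"] = topic
    change_store[topic]            # touching the key creates the fourth storage location
    return topic


def handle_change(source_db_id: str, event: dict) -> None:
    """Record table name and primary key (second location), then store the change (fourth location)."""
    row = source_info[source_db_id]
    row["tables"][event["table"]] = event["primary_key"]
    change_store[row["binlog_topic"]].append(event)


topic = configure_parser("db_orders")
handle_change("db_orders", {"table": "orders", "primary_key": "id",
                            "op": "insert", "row": {"id": 1, "amount": 30}})
```

Only the change event is stored, not a copy of the source table, which matches the point made elsewhere in the description that the intermediate device holds incremental data rather than the full data set.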

Further, starting the consumer program to synchronize the data changes stored in the intermediate device to the target, according to the target information in the synchronization information table that is associated with the source information to be synchronized, specifically includes:

obtaining the BinlogTopic and the name and primary key of the source data table according to the identifier of the source database to be synchronized, setting up partitions on the target according to the name and primary key of the source data table and the target address to which the source data is to be synchronized, and writing the target partition information to the third storage location;

starting the consumer program to obtain the data changes stored in the intermediate device according to the BinlogTopic, obtaining the target partition information according to the target address that is associated with the source information to be synchronized, and synchronizing the obtained data changes to the target according to the target address and the target partition information.
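The consumer step can be sketched as a loop that reads changes by BinlogTopic and routes each one to a target partition. The patent says partitions are set up from the table name and primary key but does not give a routing rule, so the CRC32-based choice below is an assumption, as are all names and sample values.

```python
import zlib

# Hypothetical stored changes (fourth storage location) and target partition info (third location).
changes_by_topic = {"binlog-db_orders": [
    {"table": "orders", "primary_key": "id", "op": "insert", "row": {"id": 1, "amount": 30}},
    {"table": "orders", "primary_key": "id", "op": "update", "row": {"id": 2, "amount": 45}},
]}
target_info = {"kafka://cluster-a": {"partitions": 4}}


def pick_partition(event: dict, partitions: int) -> int:
    """Map table name plus primary-key value to a partition (assumed rule, stable across runs)."""
    key = f'{event["table"]}:{event["row"][event["primary_key"]]}'
    return zlib.crc32(key.encode("utf-8")) % partitions


def consume(topic: str, target_address: str, send) -> int:
    """Read every stored change for the topic and hand it to `send`; return the number delivered."""
    partitions = target_info[target_address]["partitions"]
    count = 0
    for event in changes_by_topic.get(topic, []):
        send(target_address, pick_partition(event, partitions), event)
        count += 1
    return count


delivered = []
consume("binlog-db_orders", "kafka://cluster-a",
        lambda addr, part, ev: delivered.append((addr, part, ev["row"]["id"])))
```

`zlib.crc32` is used instead of the built-in `hash` because string hashing in Python is randomized per process, and a routing rule should send the same row key to the same partition every time.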

Further, the source is specifically a MYSQL database, and the target is specifically at least one of a MYSQL database, a KAFKA cluster, an MQ cluster and an ES cluster;

where MYSQL is a relational database management system, KAFKA is an open source stream processing platform, MQ is a message queue, and ES is a search server.

Further, the consumer program specifically runs as one of a virtual machine, a physical machine and a container.

Further, the method also includes:

in the synchronization information table, configuring, in association with the source information to be synchronized, a retention period for the data changes stored in the intermediate device, and configuring, in association with the target information to be synchronized, a start time for the consumer program, the start time falling within the retention period; and/or configuring, in association with the target information to be synchronized, a target filter field.
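The constraint that the consumer start time must fall within the retention period can be checked with a few lines. This is a sketch under assumptions: the function names are hypothetical, and because the patent does not say how the target filter field is applied, it is treated here as a set of column names to drop before a row is sent to the target.

```python
from datetime import datetime, timedelta


def start_within_retention(stored_at: datetime, retention: timedelta,
                           consumer_start: datetime) -> bool:
    """True if the consumer starts while the stored changes are still retained."""
    return stored_at <= consumer_start <= stored_at + retention


def apply_filter(row: dict, filter_fields: set) -> dict:
    """Drop the configured filter fields from a row (assumed interpretation of the filter field)."""
    return {name: value for name, value in row.items() if name not in filter_fields}
```

For example, with changes stored on 1 September 2023 and a seven-day retention period, a consumer started on 3 September is acceptable, while one started on 10 September would find the changes already expired.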

Further, the method also includes:

issuing an alarm in response to detecting that the performance of the intermediate device is insufficient, that the binlog parsing component cannot complete the parsing operation, or that the consumer program cannot complete the synchronization operation.
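The three alarm conditions can be sketched as a single health check. The status keys and the disk-space threshold are hypothetical: the patent does not define what "insufficient performance" means, so low free disk space is used here purely as a stand-in.

```python
def check_health(status: dict) -> list:
    """Return one alarm message per failed condition; an empty list means healthy."""
    alarms = []
    # Assumed proxy for "insufficient performance of the intermediate device".
    if status["disk_free_ratio"] < status.get("min_disk_free_ratio", 0.1):
        alarms.append("intermediate device: insufficient performance")
    if not status["parser_ok"]:
        alarms.append("binlog parsing component: parsing cannot complete")
    if not status["consumer_ok"]:
        alarms.append("consumer program: synchronization cannot complete")
    return alarms
```

A scheduler could call `check_health` periodically and forward any returned messages to whatever alerting channel the deployment uses.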

In a second aspect, the present invention provides an intermediate device for data synchronization. The intermediate device connects a source and a target and includes:

a configuration module, used to associate and configure, through a synchronization information table, the source information and target information to be synchronized;

a parsing module, connected to the configuration module, used to configure a binlog parsing component according to the source information to be synchronized in the synchronization information table, parse the binlog of the source with the binlog parsing component to obtain the data changes at the source, and store the data changes in the intermediate device;

a synchronization module, connected to the parsing module, used to start a consumer program, according to the target information in the synchronization information table that is associated with the source information to be synchronized, to synchronize the data changes stored in the intermediate device to the target;

where binlog is a binary log file.

In a third aspect, the present invention provides a computer-readable storage medium storing a computer program which, when run by a processor, implements the data synchronization method described above.

The present invention provides a data synchronization method, an intermediate device for data synchronization, and a computer-readable storage medium. Through the associated configuration in the synchronization information table, the intermediate device obtains the source information and target information to be synchronized in a timely manner. The binlog parsing component then obtains the data changes at the source in real time and stores them in the intermediate device, and the consumer program synchronizes the stored data changes to the target. The synchronization information table allows sources and targets to be associated flexibly, so an efficient source-to-target data synchronization strategy can be defined. The intermediate device obtains, stores and synchronizes data changes according to that strategy, so data can be migrated from source to target without downtime and in a flexible, controllable way.

Brief description of the drawings

Figure 1 is a schematic structural diagram of a data synchronization system according to an embodiment of the present invention;

Figure 2 is a flowchart of a data synchronization method according to an embodiment of the present invention;

Figure 3 is a schematic structural diagram of an intermediate device for data synchronization according to an embodiment of the present invention;

Figure 4 is a schematic structural diagram of another data synchronization system according to an embodiment of the present invention.

Detailed description

To help those skilled in the art better understand the technical solutions of the present invention, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

It should be understood that the specific embodiments and drawings described here are only used to explain the present invention and do not limit it.

It should be understood that, where there is no conflict, the embodiments of the present invention and the features in those embodiments can be combined with each other.

It should be understood that, for ease of description, the drawings show only the parts related to the present invention; parts unrelated to the present invention are not shown.

It should be understood that each unit or module involved in the embodiments of the present invention may correspond to a single physical structure or be composed of multiple physical structures, and that multiple units or modules may also be integrated into one physical structure.

It should be understood that, where there is no conflict, the functions and steps marked in the flowcharts and block diagrams of the present invention may occur in an order different from that marked in the drawings.

It should be understood that the flowcharts and block diagrams of the present invention show possible architectures, functions and operations of systems, apparatuses, devices and methods according to the embodiments of the present invention. Each box in a flowchart or block diagram may represent a unit, module, program segment or piece of code that contains executable instructions for implementing the specified function. Moreover, each box or combination of boxes in the block diagrams and flowcharts may be implemented by a hardware-based system that performs the specified function, or by a combination of hardware and computer instructions.

It should be understood that the units and modules involved in the embodiments of the present invention may be implemented in software or in hardware; for example, the units and modules may be located in a processor.

Embodiment 1:

The present invention provides a data synchronization method applied to the intermediate device 2 that connects the source 1 and the target 3 as shown in Figure 1. The method includes the steps shown in Figure 2:

S21. Associate and configure, through a synchronization information table, the source information and target information to be synchronized;

S22. Configure a binlog parsing component according to the source information to be synchronized in the synchronization information table, parse the binlog of the source with the binlog parsing component to obtain the data changes at the source, and store the data changes in the intermediate device;

S23. According to the target information in the synchronization information table that is associated with the source information to be synchronized, start a consumer program to synchronize the data changes stored in the intermediate device to the target;

where binlog is a binary log file.

Specifically, in this embodiment, the data synchronization system shown in Figure 1 is first set up by adding an intermediate device between the source and the target. The structure of the intermediate device is shown in Figure 3. It includes a configuration module 21 that implements step S21, a parsing module 22 that implements step S22, and a synchronization module 23 that implements step S23. The configuration module 21 performs the associated configuration of the synchronization information table so that the intermediate device obtains the source information and target information to be synchronized in a timely manner. The parsing module 22 then uses the binlog parsing component to obtain the data changes at the source in real time and stores them in the intermediate device, and the consumer program synchronizes the stored data changes to the target. The synchronization information table allows sources and targets to be associated flexibly, so an efficient source-to-target data synchronization strategy can be defined. The intermediate device obtains, stores and synchronizes data changes according to that strategy, so data can be migrated without downtime and in a flexible, controllable way. In addition, the intermediate device in this embodiment stores only the data changes at the source, not the full data set, so the data volume is small and real-time synchronization is fast. The data changes are stored on the host of the intermediate device, which offers large capacity and high storage reliability. Through the table association configuration, multiple targets can be set up for synchronization at the same time.

Further, associating and configuring the source information and target information to be synchronized through the synchronization information table specifically includes:

setting up a source information table, a target information table and a synchronization information table;

writing multiple source database identifiers into the source information table and, in association with each source database identifier, setting a first storage location for storing a BinlogTopic and a second storage location for storing the name and primary key of a source data table;

writing multiple target addresses into the target information table and, in association with each target address, setting a third storage location for storing target partition information;

writing into the synchronization information table, in association with each other, the identifier of the source database to be synchronized and the target address to which the source data is to be synchronized;

where BinlogTopic is the unique identifier of the binlog parsing component.

Specifically, in this embodiment, the data synchronization system can also be represented by the structure shown in Figure 4. The source is a source database; the intermediate device can be represented as including a binlog parsing component, a local storage module and a consumer program cluster; and the target is a target cluster. The binlog parsing component of the intermediate device listens to the binlog of the source, obtains the data changes at the source in real time from the binlog, and stores them in the local storage module. The consumer program cluster then migrates the data changes from the intermediate device to the target.

The local storage module also stores the table association strategy that controls this migration process. The table association strategy comprises three tables: the source information table, the target information table and the synchronization information table. The synchronization information table implements the associated configuration of the source information and target information to be synchronized. The sources and targets that the business side needs synchronized are obtained from the synchronization information table, and the corresponding source information and target information are obtained from the source information table and the target information table respectively through the table association strategy. The source information table and target information table can be preset for the multiple sources and targets connected to the system, recording the basic source and target information first. After a specific data synchronization task is received from the business side, the requirements of the task are recorded in the synchronization information table, which records the associations between multiple pairs of source and target as well as the various source-to-target synchronization requirements.

As a more specific example, consider a business deployed in a single region. If the data is migrated with downtime, the whole migration may take several days depending on data volume and network latency, and a service outage of several days can have a major impact on the business. As the business grows quickly and the number of users increases, users who access the service across regions may experience high latency and be unable to access it in real time, which degrades user experience and limits the ability of the business to scale. Force majeure events such as power failures or network interruptions may also prevent data from being obtained normally and cause the migration to fail. A data migration method with good performance is therefore needed for this situation.

Incremental data synchronization is one solution that can serve as a reference. For example, a massive-data platform serves as the source database and a Hadoop (a distributed system infrastructure developed by the Apache Foundation) big data platform as the target database; MapReduce (a programming model for parallel computation over large data sets, larger than 1 TB) serves as the big data computing engine; the HDFS distributed file system stores unstructured and semi-structured data; the Hbase (a distributed, column-oriented open source database) distributed database stores structured data; and an automated incremental import method developed in Java (an object-oriented programming language) imports data incrementally from the source database to the target database. This achieves incremental synchronization and avoids the long business downtime caused by migrating all the data at once, but its real-time performance is poor and it cannot meet the real needs of the business.

Another data synchronization method is based on canal (a database synchronization tool). It uses MYSQL (a relational database management system), Oracle (a relational database management system) and the like as the source database, and KAFKA (an open source stream processing platform), MYSQL and the like as the target. It simulates the dump (backup file system) protocol between a mysql slave (slave device) and a mysql master (master device) and parses the binlog (a log file in binary format), which meets the basic requirements of data synchronization. However, each target needs its own set of programs for synchronization. If the business expands to 31 provinces, 31 program clusters must be deployed for data synchronization, and synchronizing to different kinds of target requires even more clusters. This wastes cluster resources, and the continuous querying of the source database puts a high load on it, which in severe cases affects normal use of the business.

In view of the above scenario and the problems of the candidate solutions, this embodiment provides a real-time data synchronization system and method that migrates without downtime and synchronizes efficiently. The method is based on the database binlog. The binlog is parsed on the intermediate device, which obtains the incremental data of the source by parsing the binlog and then synchronizes the incremental data to the target. Because the intermediate device only needs to keep the incremental data, storage space is saved and synchronization efficiency improves.

Specifically, the data synchronization system includes: a source RDS (Relational Database Service) database (such as MYSQL), a database binlog parsing component, a target cluster, and a consumer program cluster that synchronizes transaction data in the source database to the target in real time. The consumer program cluster obtains a data synchronization control strategy through the table association strategy for data synchronization, and uses that control strategy to control when data is synchronized and which target it is synchronized to. The table association strategy is obtained by associating the newly created or modified source information table, the target information table and the required synchronization information table. All three tables have an id (identity) field that uniquely identifies a link, and the three tables are associated through this id. The source information table stores the source database information, BinlogTopic, database name, table name, primary key and so on. The target information table stores different information for different kinds of target: for kafka it stores the kafka cluster information, topic, partition and so on; for a database it stores the data table information corresponding to the source database, and so on. The synchronization information table stores the source to be synchronized and its corresponding target, as well as the consumption time point from which data synchronization starts, and so on.
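The id-based association of the three tables can be illustrated with an in-memory SQLite database. This is a sketch only: the column names, sample values and the choice of SQLite are assumptions made for the example, since the patent lists the kinds of information stored but not a concrete schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_info (id INTEGER PRIMARY KEY, db_info TEXT, binlog_topic TEXT,
                          db_name TEXT, table_name TEXT, primary_key TEXT);
CREATE TABLE target_info (id INTEGER PRIMARY KEY, kind TEXT, cluster TEXT,
                          topic TEXT, partition_no INTEGER);
CREATE TABLE sync_info   (id INTEGER PRIMARY KEY, consume_from TEXT);
INSERT INTO source_info VALUES (1, 'mysql://src-host:3306', 'binlog-orders', 'shop', 'orders', 'id');
INSERT INTO target_info VALUES (1, 'kafka', 'kafka-a:9092', 'orders-sync', 0);
INSERT INTO sync_info   VALUES (1, '2023-09-01 00:00:00');
""")

# The shared id identifies one link; joining on it yields everything a consumer needs.
link = conn.execute("""
SELECT s.binlog_topic, s.table_name, t.kind, t.topic, y.consume_from
FROM sync_info y
JOIN source_info s ON s.id = y.id
JOIN target_info t ON t.id = y.id
WHERE y.id = ?
""", (1,)).fetchone()
```

Here a single lookup by id returns the BinlogTopic to read from, the target to write to, and the consumption time point, which is the information the consumer program cluster is described as obtaining through the table association strategy.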

Further, configuring the binlog parsing component according to the source-end information to be synchronized in the synchronization information table, parsing the binlog of the source end with the binlog parsing component to obtain the data changes of the source end, and storing the data changes in the intermediate device specifically includes:

configuring the binlog parsing component according to the source database identifier to be synchronized, generating the BinlogTopic of the binlog parsing component, writing the BinlogTopic to the first storage location, and setting, in the intermediate device and based on the BinlogTopic, a fourth storage location for storing the data changes of the source end;

parsing the binlog of the source end with the binlog parsing component to obtain the data changes of the source end, obtaining the name and primary key of the source data table from the data changes, writing the name and primary key of the source data table to the second storage location, and writing the data changes to the fourth storage location.
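The two steps above can be sketched in memory as follows. The sketch assumes the binlog has already been decoded into plain dictionaries, so no real binlog parser is involved; the event layout (`table`, `primary_key`, `op`, `row`) and the function name `record_changes` are hypothetical. Here `table_meta` plays the role of the second storage location and `change_store` the role of the fourth storage location, keyed by BinlogTopic.

```python
from collections import defaultdict


def record_changes(binlog_topic, parsed_events, table_meta, change_store):
    """Store table metadata and data changes for one BinlogTopic.

    parsed_events: iterable of dicts that a binlog parser is assumed to have
    produced, e.g. {"table": ..., "primary_key": ..., "op": ..., "row": {...}}.
    """
    stored = 0
    for event in parsed_events:
        # Second storage location: source table name -> primary key column.
        table_meta[event["table"]] = event["primary_key"]
        # Fourth storage location: ordered data changes under the BinlogTopic.
        change_store[binlog_topic].append(
            {"table": event["table"], "op": event["op"], "row": event["row"]}
        )
        stored += 1
    return stored


# Hypothetical usage with two decoded events.
table_meta = {}
change_store = defaultdict(list)
events = [
    {"table": "orders", "primary_key": "order_id", "op": "insert",
     "row": {"order_id": 7, "amount": 30}},
    {"table": "orders", "primary_key": "order_id", "op": "update",
     "row": {"order_id": 7, "amount": 35}},
]
stored_count = record_changes("binlog_orders", events, table_meta, change_store)
```

Keeping the changes in arrival order under a single topic key is what lets a consumer replay them later in the same order as the source applied them.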

Specifically, in this embodiment of the binlog-based data synchronization system, the source database is a MYSQL database. The database binlog parsing component parses the database binlog information, then parses, filters, and formats the source data into a standard format and stores it locally. The consumer program cluster consumes the locally stored data in real time, that is, it processes the source data produced by the binlog parsing component and synchronizes it to the target-end cluster in real time. The consumer program cluster relies on the table association policy for data synchronization, which associates the newly created or modified source-end information, target-end information, and the required synchronization data table information. The target-end cluster may take the form of a MYSQL database, a KAFKA cluster, an MQ cluster, and so on.

Further, starting the consumer program to synchronize the data changes stored in the intermediate device to the target end, according to the target-end information that is associated in the synchronization information table with the source-end information to be synchronized, specifically includes:

obtaining the BinlogTopic and the name and primary key of the source data table according to the source database identifier to be synchronized, setting partitions on the target end according to the name and primary key of the source data table and the target-end address to which the source data is to be synchronized, and writing the target-end partition information to the third storage location;

starting the consumer program to obtain the data changes stored in the intermediate device according to the BinlogTopic, obtaining the target-end partition information according to the target-end address that is associated with the source-end information to be synchronized, and synchronizing the obtained data changes to the target end according to the target-end address and the target-end partition information.
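One way to picture the partition step is a stable hash of the table name and primary key value, so that all changes to the same row land in the same target partition. The text does not say how partitions are derived, so the CRC32-based rule below is an assumption made for illustration, as are the function names and the sample address.

```python
import zlib


def pick_partition(table, pk_value, num_partitions):
    """Map a (table, primary key value) pair to a partition index.

    CRC32 is used because it is stable across processes; this rule is an
    assumption, not something the described method prescribes.
    """
    key = f"{table}:{pk_value}".encode("utf-8")
    return zlib.crc32(key) % num_partitions


def route_changes(changes, table_meta, target_address, num_partitions):
    """Attach a target address and partition to every stored data change."""
    routed = []
    for change in changes:
        pk_column = table_meta[change["table"]]
        partition = pick_partition(change["table"], change["row"][pk_column], num_partitions)
        routed.append((target_address, partition, change))
    return routed


# Hypothetical usage: two changes to the same row and one to another row.
table_meta = {"orders": "order_id"}
changes = [
    {"table": "orders", "op": "insert", "row": {"order_id": 7, "amount": 30}},
    {"table": "orders", "op": "update", "row": {"order_id": 7, "amount": 35}},
    {"table": "orders", "op": "insert", "row": {"order_id": 8, "amount": 12}},
]
routed = route_changes(changes, table_meta, "kafka-a:9092", 4)
```

Because the first two changes share a primary key value, they receive the same partition index, which preserves per-row ordering on partitioned targets such as Kafka.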

Specifically, in this embodiment, the overall system workflow is as follows. First, the intermediate device obtains from the business side the identifier of the source MYSQL database to be synchronized, configures the database binlog parsing component according to that identifier, and generates a unique, per-database binlog parsing ID (hereinafter called the BinlogTopic). MySQL records every operation in the binlog; the parsing component monitors the source MYSQL database for changes, parses the corresponding binlog in real time, formats the real-time transaction data (user operations on the source database, including insert, update, delete, and so on), and stores it locally on the cluster host where the parsing component runs. Second, the table association information for data synchronization is configured. The user first enters the source database information, the target-end synchronization information, and the required data synchronization table information into the storage module of the consumer program cluster. The table association information then determines the MYSQL database required by the business side and the names of the data tables to be synchronized. Each synchronized table must have a primary key, which uniquely identifies a row. The target end to be synchronized is also determined: if the target end is a database, a data table with the same structure as the source table must already exist in the target database; if the target end is a KAFKA or MQ cluster, the corresponding topic and partitions must already exist and the cluster must be reachable over the network from the consumer program cluster. Third, the consumer program is started. It reads the source database information from the table association information and, using the BinlogTopic in that information, fetches the locally stored transaction messages for the BinlogTopic of the source MYSQL database. It then reads the data synchronization table information, filters and processes the transaction messages according to the table information, reads the target-end synchronization information, and synchronizes the filtered and processed data in real time to the configured target-end address. In addition, the transmitted data may use a key-value format, where the key is the field information of the table and the value is the field value. Through the above workflow, this embodiment avoids the service suspension that large data volumes and latency cause during user migration. Changed data from multiple tables can be synchronized concurrently, the synchronization link as a whole is simple and efficient, and services keep running normally during synchronization. One source database can be configured to synchronize concurrently to multiple target ends, which avoids wasting resources on additional cluster deployments and reduces the performance impact on the source database. For example, synchronizing to a MYSQL database and a KAFKA cluster at the same time enables cross-domain synchronization and multi-site active-active operation, so that the data of one province can be backed up synchronously in multiple provinces.
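The key-value transmission format and the one-source, many-targets fan-out can be sketched as below. The envelope fields (`table`, `op`, `data`) and the function names are assumptions for illustration; the text only states that keys are table fields and values are field values. Plain Python lists stand in for real MYSQL, KAFKA or MQ clients.

```python
import json


def to_key_value(change):
    """Serialize one data change as a key-value message (field name -> field value)."""
    return json.dumps(
        {"table": change["table"], "op": change["op"], "data": change["row"]},
        sort_keys=True,
    )


def fan_out(changes, targets):
    """Deliver every change to every configured target.

    targets: dict of target name -> list standing in for a real client.
    Returns the number of messages delivered in total.
    """
    delivered = 0
    for change in changes:
        message = to_key_value(change)
        for sink in targets.values():
            sink.append(message)
            delivered += 1
    return delivered


# Hypothetical usage: one source change synchronized to two targets at once.
targets = {"mysql_backup": [], "kafka_orders": []}
delivered_count = fan_out(
    [{"table": "orders", "op": "insert", "row": {"order_id": 7, "amount": 30}}],
    targets,
)
```

Serializing once and delivering the same message to each target keeps the source database out of the loop, which matches the stated aim of reducing load on the source when several targets are configured.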

Further, the source end is specifically a MYSQL database, and the target end is specifically at least one of a MYSQL database, a KAFKA cluster, an MQ cluster, and an ES cluster;

where MYSQL is a relational database management system, KAFKA is an open-source stream processing platform, MQ is a message queue, and ES is a search server.

Specifically, in this embodiment, the source database may include a MYSQL database, and the target end may include a MYSQL database, a KAFKA cluster, an MQ (Message Queue) cluster, an ES (Elasticsearch, a search server) cluster, and so on. Source data can be synchronized to multiple target ends at the same time, for example to an RDS database and a Kafka cluster simultaneously.

Further, the consumer program specifically runs on one of a virtual machine, a physical machine, and a container.

Specifically, in this embodiment, the consumer program cluster may be composed of virtual machines, physical machines, k8s (Kubernetes, a distributed architecture solution based on container technology) containers, and so on.

Further, the method also includes:

in the synchronization information table, configuring, in association with the source-end information to be synchronized, a retention period for the data changes stored in the intermediate device, and configuring, in association with the target-end information to be synchronized, a start time for the consumer program, the start time falling within the retention period; and/or configuring, in association with the target-end information to be synchronized, target-end filter fields.

Specifically, in this embodiment, the table association policy for data synchronization may also be set up as follows: the source-end information table stores the source RDS database information, the program deployment cluster information, and so on; the target-end information table stores the target-end address, user name and password, attribute information, and so on; the data synchronization table information stores the source database information, the synchronization table information, the synchronization mode (such as the consumption time point, the API (Application Programming Interface) address, and so on). When data is transmitted to the target end, field conditions can be set to filter the data. The data retention period can be set to 7 days, the consumption time point can be configured dynamically so that synchronization starts from a chosen point in time, and the set of tables to be synchronized can also be configured dynamically. This makes resumable synchronization possible: the data of the synchronized tables of the source database is kept for 7 days, and adjusting only the consumption time point field in the data synchronization information table is enough to start synchronization from any point within those 7 days. If a downstream system temporarily stops its data synchronization task for business reasons, it can resume within 7 days from the time at which consumption was last paused, which prevents data loss. The dynamically configured field-condition filtering on the target end allows the data of each province to be transferred and stored flexibly.
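The retention period, the adjustable consumption time point, and the field-condition filter can be sketched together as follows. The 7-day value comes from the text; the function name `changes_to_replay`, the `ts` and `row` keys, and the `province` field in the usage are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # retention period stated in the embodiment


def changes_to_replay(changes, consume_from, now, field_filter=None):
    """Select the stored changes to synchronize when (re)starting consumption.

    consume_from: consumption time point; must lie within the retention period.
    field_filter: optional dict of field name -> required value.
    """
    if consume_from < now - RETENTION:
        raise ValueError("consumption time point is older than the retention period")
    field_filter = field_filter or {}
    selected = []
    for change in changes:
        if change["ts"] < consume_from:
            continue  # already consumed before the pause
        if all(change["row"].get(name) == value for name, value in field_filter.items()):
            selected.append(change)
    return selected


# Hypothetical usage: resume from 4 September and keep only province "A".
now = datetime(2023, 9, 8)
stored = [
    {"ts": datetime(2023, 9, 2), "row": {"order_id": 1, "province": "A"}},
    {"ts": datetime(2023, 9, 5), "row": {"order_id": 2, "province": "A"}},
    {"ts": datetime(2023, 9, 6), "row": {"order_id": 3, "province": "B"}},
]
resumed = changes_to_replay(stored, datetime(2023, 9, 4), now, {"province": "A"})
```

Rejecting a consumption time point older than the retention window makes the failure explicit instead of silently skipping changes that have already been purged.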

Further, the method also includes:

issuing an alarm in response to detecting that the performance of the intermediate device is insufficient, that the binlog parsing component cannot complete the parsing operation, or that the consumer program cannot complete the synchronization operation.

Specifically, in this embodiment, the following auxiliary steps may also be provided. First, monitoring and alerting for the database binlog parsing component service, such as checking whether the BinlogTopic is healthy and whether delays occur; here a delay means that the source end has a large number of data changes and the corresponding BinlogTopic cannot parse all the operations in time. Second, monitoring and alerting for the generated data synchronization tasks, that is, the tasks that synchronize data to the target end, such as whether a task has failed and whether delays occur; here a delay means that a large-volume operation on the source end cannot be synchronized to the target end in time. Third, monitoring and alerting for the hosts running the binlog parsing component service and the consumer program cluster, such as whether the disk of the binlog parsing host is full or its memory is insufficient. In addition, the business side can periodically check whether the MYSQL database and the target-end cluster are inconsistent and, if so, investigate the cause and repair the data.
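The monitoring checks listed above can be sketched as a single function that turns a metrics snapshot into alarm messages. All metric names, limit names and threshold values here are assumptions for illustration; the text names the conditions to watch but not how they are measured.

```python
def collect_alerts(metrics, limits):
    """Return alarm messages for parsing, synchronization and host health."""
    alerts = []
    if metrics["parse_lag_seconds"] > limits["max_lag_seconds"]:
        alerts.append("binlog parsing is lagging behind the source")
    if metrics["sync_failed"]:
        alerts.append("synchronization task failed")
    if metrics["sync_lag_seconds"] > limits["max_lag_seconds"]:
        alerts.append("synchronization to the target is lagging")
    if metrics["disk_used_percent"] >= limits["max_disk_percent"]:
        alerts.append("parsing host disk is nearly full")
    if metrics["free_memory_mb"] < limits["min_free_memory_mb"]:
        alerts.append("parsing host memory is insufficient")
    return alerts


# Hypothetical thresholds and one unhealthy snapshot.
limits = {"max_lag_seconds": 60, "max_disk_percent": 90, "min_free_memory_mb": 512}
snapshot = {
    "parse_lag_seconds": 300,
    "sync_failed": False,
    "sync_lag_seconds": 10,
    "disk_used_percent": 95,
    "free_memory_mb": 2048,
}
alerts = collect_alerts(snapshot, limits)
```

Returning a list rather than raising on the first problem lets one monitoring pass report a lagging parser and a full disk together.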

Embodiment 2:

As shown in Figures 1 and 3, the present invention provides an intermediate device 2 for data synchronization. The intermediate device 2 connects the source end 1 and the target end 3, and includes:

a configuration module 21, configured to associate and configure, through the synchronization information table, the source-end information and target-end information to be synchronized;

a parsing module 22, connected to the configuration module 21 and configured to configure the binlog parsing component according to the source-end information to be synchronized in the synchronization information table, parse the binlog of the source end with the binlog parsing component to obtain the data changes of the source end, and store the data changes in the intermediate device;

a synchronization module 23, connected to the parsing module 22 and configured to start the consumer program to synchronize the data changes stored in the intermediate device to the target end, according to the target-end information that is associated in the synchronization information table with the source-end information to be synchronized;

where binlog is the binary log file.

Further, the configuration module 21 specifically includes:

a table setting unit, configured to set up the source-end information table, the target-end information table, and the synchronization information table;

a source-end information table unit, configured to write multiple source database identifiers into the source-end information table and, in association with each source database identifier, set a first storage location for storing the BinlogTopic and a second storage location for storing the name and primary key of the source data table;

a target-end information table unit, configured to write multiple target-end addresses into the target-end information table and, in association with each target-end address, set a third storage location for storing the target-end partition information;

a synchronization information table unit, configured to write into the synchronization information table, in association with each other, the source database identifier to be synchronized and the target-end address to which the source data is to be synchronized;

where BinlogTopic is the unique identifier of the binlog parsing component.

Further, the parsing module 22 specifically includes:

a BinlogTopic unit, configured to configure the binlog parsing component according to the source database identifier to be synchronized, generate the BinlogTopic of the binlog parsing component, write the BinlogTopic to the first storage location, and set, in the intermediate device and based on the BinlogTopic, a fourth storage location for storing the data changes of the source end;

a data change information unit, configured to parse the binlog of the source end with the binlog parsing component to obtain the data changes of the source end, obtain the name and primary key of the source data table from the data changes, write the name and primary key of the source data table to the second storage location, and write the data changes to the fourth storage location.

Further, the synchronization module 23 specifically includes:

a target-end partition information unit, configured to obtain the BinlogTopic and the name and primary key of the source data table according to the source database identifier to be synchronized, set partitions on the target end according to the name and primary key of the source data table and the target-end address to which the source data is to be synchronized, and write the target-end partition information to the third storage location;

a target-end partition synchronization unit, configured to start the consumer program to obtain the data changes stored in the intermediate device according to the BinlogTopic, obtain the target-end partition information according to the target-end address that is associated with the source-end information to be synchronized, and synchronize the obtained data changes to the target end according to the target-end address and the target-end partition information.

Further, the source end is specifically a MYSQL database, and the target end is specifically at least one of a MYSQL database, a KAFKA cluster, an MQ cluster, and an ES cluster;

where MYSQL is a relational database management system, KAFKA is an open-source stream processing platform, MQ is a message queue, and ES is a search server.

Further, the consumer program specifically runs on one of a virtual machine, a physical machine, and a container.

Further, the synchronization information table unit is also configured to:

in the synchronization information table, configure, in association with the source-end information to be synchronized, a retention period for the data changes stored in the intermediate device, and configure, in association with the target-end information to be synchronized, a start time for the consumer program, the start time falling within the retention period; and/or configure, in association with the target-end information to be synchronized, target-end filter fields.

Further, the intermediate device 2 also includes:

a monitoring and alerting unit, configured to issue an alarm in response to detecting that the performance of the intermediate device is insufficient, that the binlog parsing component cannot complete the parsing operation, or that the consumer program cannot complete the synchronization operation.

Embodiment 3:

Embodiment 3 of the present invention provides a computer-readable storage medium in which a computer program is stored. When the computer program is run by a processor, the data synchronization method described in Embodiment 1 is implemented.

The computer-readable storage medium includes volatile or non-volatile, removable or non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, computer program modules, or other data). Computer-readable storage media include, but are not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other memory technology, CD-ROM (Compact Disc Read-Only Memory), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.

In addition, the present invention may also provide a computer device including a memory and a processor. A computer program is stored in the memory, and when the processor runs the computer program stored in the memory, the processor executes the data synchronization method described in Embodiment 1.

The memory is connected to the processor. The memory may be flash memory, read-only memory, or another type of memory, and the processor may be a central processing unit or a microcontroller.

Furthermore, the present invention may also provide a data synchronization system as shown in Figure 1 or Figure 4, in which the intermediate device executes the data synchronization method described in Embodiment 1, and the remaining components and their functions are as described in Embodiment 1.

Embodiments 1 to 3 of the present invention provide a data synchronization method, an intermediate device for data synchronization, and a computer-readable storage medium. Through the associated configuration in the synchronization information table, the intermediate device obtains in a timely manner the source-end information and target-end information to be synchronized. The binlog parsing component then obtains the data changes of the source end in real time and stores them in the intermediate device, and the consumer program synchronizes the stored data changes to the target end. The synchronization information table allows source ends and target ends to be associated flexibly, so that an efficient synchronization policy from source end to target end can be defined. Because the intermediate device obtains, stores, and synchronizes the data changes according to that policy, data can be migrated from the source end to the target end without downtime and in a flexible, controllable way.

It should be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principles of the present invention, and the present invention is not limited to them. Those of ordinary skill in the art can make various modifications and improvements without departing from the spirit and essence of the present invention, and these modifications and improvements are also regarded as falling within the protection scope of the present invention.

Claims (10)

1. The data synchronization method is characterized by being applied to an intermediate device connecting a source end and a target end, and comprising the following steps:
source terminal information and target terminal information to be subjected to data synchronization are configured in an associated mode through a synchronization information table;
configuring a binlog analysis component according to source information to be subjected to data synchronization in a synchronization information table, analyzing binlog of a source based on the binlog analysis component to obtain data change of the source, and storing the data change in an intermediate device;
according to target end information to be subjected to data synchronization, which is associated with source end information to be subjected to data synchronization, in the synchronization information table, starting a consumption program to synchronize data changes stored in the intermediate equipment to the target end;
where binlog is a binary log file.
2. The method according to claim 1, wherein the source side information and the target side information to be subjected to data synchronization are configured in a correlated manner through a synchronization information table, specifically comprising:
setting a source end information table, a target end information table and a synchronous information table;
writing a plurality of source database identifiers in a source information table, and setting, in association with each source database identifier, a first storage position for storing the BinlogTopic and a second storage position for storing the name and the primary key of the source data table;
writing a plurality of target end addresses into a target end information table, and setting a third storage position for storing target end partition information in association with each target end address;
writing a source end database identifier to be subjected to data synchronization and a target end address to which source end data are to be synchronized in a related manner in a synchronization information table;
wherein BinlogTopic is the unique identification of the binlog parsing component.
3. The method according to claim 2, wherein the binlog parsing component is configured according to source information to be data synchronized in the synchronization information table, and the binlog of the source is parsed by the binlog parsing component to obtain data changes of the source, and the data changes are stored in the intermediate device, specifically including:
configuring a binlog analysis component according to a source end database identifier to be subjected to data synchronization, generating a BinlogTopic of the binlog analysis component, writing the BinlogTopic into the first storage position, and setting, in the intermediate device and based on the BinlogTopic, a fourth storage position for storing the data change of the source end;
and analyzing the binlog of the source terminal based on the binlog analysis component to acquire the data change of the source terminal, acquiring the name and the primary key of the source terminal data table according to the data change, writing the name and the primary key of the source terminal data table into the second storage position, and writing the data change into the fourth storage position.
4. A method according to claim 3, wherein starting the consumption program to synchronize the data change stored in the intermediate device to the target according to the target information to be synchronized associated with the source information to be synchronized in the synchronization information table, specifically comprises:
acquiring the BinlogTopic and the name and the primary key of the source data table according to the source database identifier to be subjected to data synchronization, setting a partition at the target end according to the name and the primary key of the source data table and the target end address to which the source data are to be synchronized, and writing target end partition information into the third storage position;
and starting a consumption program to acquire data changes stored in the intermediate device according to the BinlogTopic, acquiring target end partition information according to a target end address to which source end data are to be synchronized, which is associated with the source end information to be subjected to data synchronization, and synchronizing the acquired data changes to the target end according to the target end address and the target end partition information.
5. The method according to any one of claims 1-4, wherein the source terminal is in the form of a MYSQL database, and the target terminal is in the form of at least one of a MYSQL database, a KAFKA cluster, a MQ cluster, and an ES cluster;
wherein MYSQL is a relational database management system, KAFKA is an open source stream processing platform, MQ is a message queue, and ES is a search server.
6. The method according to any of claims 1-4, wherein the consumption program is in the form of one of a virtual machine, a physical machine, a container.
7. The method according to any one of claims 1-4, further comprising:
in the synchronization information table, configuring, in association with the source end information to be subjected to data synchronization, a storage period for storing the data change in the intermediate device, and configuring, in association with the target end information to be subjected to data synchronization, a start time of the consumption program, wherein the start time is within the storage period; and/or configuring, in association with the target end information to be subjected to data synchronization, a target end filter field.
8. The method according to any one of claims 1-4, further comprising:
sending an alarm in response to monitoring that the performance of the intermediate device is insufficient, that the binlog analysis component cannot complete the analysis operation, or that the consumption program cannot complete the synchronization operation.
9. An intermediate device for data synchronization, wherein the intermediate device connects a source end and a target end, and comprises:
a configuration module, configured to associate and configure, through a synchronization information table, source end information and target end information to be subjected to data synchronization;
an analysis module, connected with the configuration module and configured to configure a binlog analysis component according to the source end information to be subjected to data synchronization in the synchronization information table, to analyze the binlog of the source end based on the binlog analysis component to acquire data changes of the source end, and to store the data changes in the intermediate device;
a synchronization module, connected with the analysis module and configured to start a consumption program to synchronize the data changes stored in the intermediate device to the target end according to the target end information that is associated, in the synchronization information table, with the source end information to be subjected to data synchronization;
wherein binlog is a binary log file.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the data synchronization method according to any one of claims 1-8.
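The following is an illustrative sketch only, not part of the claims and not the applicant's implementation. It models, in memory, the flow recited above: a synchronization information table associates a source end with a target end, parsed data changes are stored in the intermediate device, and a consumption step forwards them to the target end. All names (SyncEntry, Change, IntermediateDevice, row_filter) are hypothetical. The sketch also assumes that the "storage period" is a retention window ending at the current time, and that the target end filter is a simple per-row predicate.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class SyncEntry:
    """One row of a hypothetical synchronization information table."""
    source: str                 # source end identifier
    target: str                 # target end address
    retention: timedelta        # storage period on the intermediate device
    consumer_start: datetime    # start time of the consumption program
    row_filter: Optional[Callable[[dict], bool]] = None  # target end filter


@dataclass
class Change:
    """A data change assumed to be already parsed from the source binlog."""
    source: str
    timestamp: datetime
    row: dict


class IntermediateDevice:
    def __init__(self, now: datetime) -> None:
        self.now = now
        self.table: List[SyncEntry] = []
        self.store: List[Change] = []

    def configure(self, entry: SyncEntry) -> None:
        # Assumption: the storage period is the window [now - retention, now].
        oldest = self.now - entry.retention
        if not (oldest <= entry.consumer_start <= self.now):
            raise ValueError("consumer start time is outside the storage period")
        self.table.append(entry)

    def store_changes(self, changes: List[Change]) -> None:
        # Keep only changes whose source end appears in the table.
        configured = {e.source for e in self.table}
        self.store.extend(c for c in changes if c.source in configured)

    def synchronize(self) -> Dict[str, List[dict]]:
        # For each table row, forward stored changes that are not older than
        # the consumer start time and that pass the optional filter.
        delivered: Dict[str, List[dict]] = {}
        for e in self.table:
            rows = [
                c.row for c in self.store
                if c.source == e.source
                and c.timestamp >= e.consumer_start
                and (e.row_filter is None or e.row_filter(c.row))
            ]
            delivered.setdefault(e.target, []).extend(rows)
        return delivered
```

In this sketch a start time outside the retention window is rejected at configuration time, which is one way to honor the condition that the start time lies within the storage period. A real deployment would replace the in-memory list with a message queue and the returned dictionary with writes to the target end.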
CN202311125525.3A 2023-09-01 2023-09-01 Data synchronization method, device and medium Pending CN117171132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311125525.3A CN117171132A (en) 2023-09-01 2023-09-01 Data synchronization method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311125525.3A CN117171132A (en) 2023-09-01 2023-09-01 Data synchronization method, device and medium

Publications (1)

Publication Number Publication Date
CN117171132A CN117171132A (en) 2023-12-05

Family

ID=88938932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311125525.3A Pending CN117171132A (en) 2023-09-01 2023-09-01 Data synchronization method, device and medium

Country Status (1)

Country Link
CN (1) CN117171132A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119149509A (en) * 2024-09-10 2024-12-17 湖南长银五八消费金融股份有限公司 Data synchronization method, device, equipment and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119149509A (en) * 2024-09-10 2024-12-17 湖南长银五八消费金融股份有限公司 Data synchronization method, device, equipment and medium
CN119149509B (en) * 2024-09-10 2025-03-21 湖南长银五八消费金融股份有限公司 Data synchronization method, device, equipment and medium

Similar Documents

Publication Publication Date Title
EP3754514B1 (en) Distributed database cluster system, data synchronization method and storage medium
CN103902617B (en) Distributed data base synchronous method and system
CN103116661B (en) A kind of data processing method of database
CN105260376B (en) Method, apparatus and system for clustered node reducing and expansion
CN105468473A (en) Data migration method and data migration apparatus
CN104735110B (en) Metadata management method and system
CN109325200B (en) Method and device for acquiring data and computer readable storage medium
CN101667181A (en) Method, device and system for data disaster tolerance
CN113987064A (en) Data processing method, system and equipment
CN111552701B (en) Method for determining data consistency in distributed cluster and distributed data system
CN103095806A (en) Load balancing management system of large-power-network real-time database system
CN106339387B (en) A data synchronization method and device for a new server in a database cluster
WO2017181430A1 (en) Method and device for duplicating database in distributed system
CN104468274A (en) Cluster monitor and management method and system
CN116304390B (en) Time sequence data processing method and device, storage medium and electronic equipment
CN112612850A (en) Data synchronization method and device
CN112685499A (en) Method, device and equipment for synchronizing process data of work service flow
CN109361777A (en) Synchronous method, synchronization system and the relevant apparatus of distributed type assemblies node state
CN117171132A (en) Data synchronization method, device and medium
CN103490923B (en) The reading/writing method of journal file, Apparatus and system
CN113157701A (en) Dual-activity mechanism deployment method and device of ORACLE database
EP3349416B1 (en) Relationship chain processing method and system, and storage medium
CN106557530B (en) Operation system, data recovery method and device
CN110019092A (en) Method, controller and the system of data storage
CN117851380A (en) System and method for migrating MySQL sharded databases and tables to NewSQL based on consortium chain and Zookeeper

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination