WO2024077802A1 - Cross-region data synchronization method and system, and computer readable medium

Info
Publication number: WO2024077802A1
Authority: WO, WIPO (PCT)
Application number: PCT/CN2023/070424
Other languages: French (fr), Chinese (zh)
Inventors: 孔祥强 (Kong Xiangqiang), 林喆 (Lin Zhe), 于汉岭 (Yu Hanling)
Original assignee: 上海商米科技集团股份有限公司 (Shanghai Sunmi Technology Group Co., Ltd.)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Abstract

The present invention relates to a cross-region data synchronization method and system, and a computer readable medium. The method comprises the following steps:
(1) submitting a data synchronization statement according to the syntax structure of a data synchronization language;
(2) parsing the data synchronization statement to obtain a synchronization plan, where the synchronization plan is suitable for synchronizing the data of unit nodes in one or more synchronization regions to a central node;
(3) executing the synchronization plan to obtain an execution result.
In this solution, a unified syntax structure standardizes the content of every data synchronization statement, and a corresponding synchronization plan is derived from each statement. No synchronization plan requires additional scheduling or development work, so the plans are easy to maintain and cheap to develop. Once a plan has been executed, the data of the unit nodes in the synchronization regions has been synchronized to the central node. Data synchronization is therefore achieved simply by deploying the cross-region data synchronization method once, at the central node.

Description

Cross-region data synchronization method, system and computer-readable medium
Technical Field
The present invention relates generally to the field of computer information technology, and in particular to a cross-region data synchronization method, system and computer-readable medium.
Background
The General Data Protection Regulation (GDPR) strictly regulates how data is processed, stored and managed. As a result, global businesses face an increasingly urgent need to keep data stored locally in each of several geographic regions. Some metrics, however, must still be computed globally. Traditional data transmission solutions cannot meet the needs of multi-region deployment. As business demands grow, data stored in different geographic regions must be synchronized and aggregated to a single node for further processing and computation.
FIG. 1 is an exemplary architecture diagram of a prior-art cross-region data synchronization solution. Referring to FIG. 1, the storage locations across geographic regions are divided into unit nodes and a central node. Each node has its own data integration service, which receives, processes and stores the data of its local region. For business or technical reasons, the data of every unit node must be synchronized to a single central node. Existing synchronization solutions fall into two categories:
• offline synchronization, typically based on batch file transfer;
• real-time synchronization, typically based on interface services.
The synchronization approach in FIG. 1 consists of three parts:
• A real-time data transmission client obtains real-time data from each unit node and transmits it to the central node through an interface service.
• A real-time synchronization receiving service receives and processes the data from each unit node and distributes it to the corresponding data channel of the central node.
• An offline file synchronization solution uses a file interface to compare and pull the files stored locally on each unit node and writes them to the central node. It also records the time and status of each transfer, so that a failed task can be restarted without transmitting the same files twice.
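The bookkeeping that lets the prior-art offline file synchronization restart a failed task without re-sending files can be sketched as follows. This is a minimal illustration under assumed names (`TransferLedger`, `sync_files`), not the prior-art implementation. Local directories stand in for the unit-node file interface and the central node.

```python
import shutil
import tempfile
import time
from pathlib import Path


class TransferLedger:
    """Hypothetical per-file transfer record: file name -> (status, timestamp)."""

    def __init__(self) -> None:
        self.records: dict[str, tuple[str, float]] = {}

    def done(self, name: str) -> bool:
        """True if the file was already transferred successfully."""
        return self.records.get(name, ("", 0.0))[0] == "success"

    def record(self, name: str, status: str) -> None:
        """Store the status and time of one transfer attempt."""
        self.records[name] = (status, time.time())


def sync_files(src: Path, dst: Path, ledger: TransferLedger) -> list[str]:
    """Pull files from a unit-node directory into the central directory,
    skipping files the ledger marks as already transferred."""
    copied = []
    for f in sorted(src.iterdir()):
        if not f.is_file() or ledger.done(f.name):
            continue  # skip non-files and files already synchronized
        try:
            shutil.copy2(f, dst / f.name)
            ledger.record(f.name, "success")
            copied.append(f.name)
        except OSError:
            ledger.record(f.name, "failed")  # left for a restarted task to retry
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as s, tempfile.TemporaryDirectory() as d:
        src, dst = Path(s), Path(d)
        for name in ("part-0", "part-1"):
            (src / name).write_text(name)
        ledger = TransferLedger()
        print(sync_files(src, dst, ledger))  # ['part-0', 'part-1']
        print(sync_files(src, dst, ledger))  # [] because a rerun transfers nothing again
```

Both the client-and-service pair and this offline path must be maintained separately for every storage system, which is the maintenance burden described next.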
As FIG. 1 shows, the prior-art solution requires maintaining at least the offline file synchronization solution, the real-time synchronization receiving service, and its real-time data transmission client. That amounts to at least two sets of components and their code. Each has drawbacks:
• The offline transmission solution depends on other components and is complex.
• The real-time transmission solution is costly to develop, makes data consistency hard to guarantee, and consumes substantial deployment resources.
Moreover, unit nodes in different geographic regions may use different local storage systems. Developing a synchronization solution therefore means integrating with a different file interface for each one, and every additional unit node adds deployment resources and maintenance cost. A cross-region data synchronization solution that is deployed uniformly, easy to maintain and cheap to develop is therefore urgently needed.
Summary of the Invention
The technical problem addressed by the present application is to provide a cross-region data synchronization method, system and computer-readable medium with three properties: uniform deployment, easy maintenance, and low development cost.
To solve this problem, the present application provides a cross-region data synchronization method comprising three steps:
(1) submitting a data synchronization statement according to the syntax structure of a data synchronization language;
(2) parsing the data synchronization statement to obtain a synchronization plan, where the synchronization plan is suitable for synchronizing the data of unit nodes in one or more synchronization regions to a central node;
(3) executing the synchronization plan to obtain an execution result.
In an embodiment of the present application, the method further comprises a step performed before the data synchronization statement is submitted: configuring the address and directory of an Alluxio data orchestration service. The Alluxio data orchestration service mounts underlying file systems so that upper-layer computing frameworks and applications can access them.
In an embodiment of the present application, the method further comprises, before the data synchronization statement is submitted, configuring one or more of the following:
• the address of a metadata service;
• the addresses of the underlying file systems of one or more synchronization regions;
• the execution directory of a computing engine;
• the communication address and port of a synchronization engine;
• the communication address and port of a synchronization executor;
• the address and port of a synchronization network service.
In an embodiment of the present application, the syntax structure includes configuration items. The source data table corresponds to a unit node, and the target data table corresponds to the central node. The configuration items include one or more of the following:
• a scheduling period;
• a scheduling time zone;
• a computing engine;
• whether to use the local region's time zone;
• the source data table partition;
• the target data table partition;
• the list of partitions to be synchronized;
• a view name.
In an embodiment of the present application, the synchronization plan includes an offline synchronization plan and/or a real-time synchronization plan. The real-time synchronization plan includes the view name.
In an embodiment of the present application, the step of parsing the data synchronization statement comprises the following:
(1) Parse the data synchronization statement according to the syntax structure to obtain a parsing result.
(2) If parsing succeeds, obtain a data synchronization syntax tree. The tree contains the configuration item information, the table name of the source data table, and the table name of the target data table.
(3) Connect to the metadata service at its configured address, and use the target data table's name to determine whether that table exists in the metadata service.
(4) If the target data table exists, create a mapping directory in the Alluxio data orchestration service corresponding to the source data table.
(5) Mount the source table directory, located at the underlying-file-system address of each of the one or more synchronization regions, onto the mapping directory.
(6) Create a mapping directory source table whose location is the mapping directory.
(7) Determine whether real-time synchronization is to be performed. If not, generate an offline synchronization plan. If so, create a view with the configured view name and generate a real-time synchronization plan.
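The decision flow of this parsing step can be traced with the following sketch. Every name in it is a hypothetical stand-in: `SyncTree`, `build_plan`, the in-memory `catalog`, the `/xensql/...` mapping path, the `region=` sub-directories and the action strings. The real metadata-service and Alluxio calls are replaced by recorded actions so that the control flow runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class SyncTree:
    """Hypothetical syntax tree returned by a successful parse."""
    source_table: str
    target_table: str
    options: dict[str, str] = field(default_factory=dict)
    realtime: bool = False  # assumed to come from "create realsync" vs "create sync"


def build_plan(tree: SyncTree, catalog: set[str], region_ufs: dict[str, str]) -> dict:
    """Walk the parsing steps and record the actions they would trigger.

    catalog: table names known to the metadata service (stand-in)
    region_ufs: synchronization region -> UFS directory of the source table
    """
    if tree.target_table not in catalog:
        raise LookupError(f"target table {tree.target_table} does not exist")
    mapping_dir = "/xensql/" + tree.source_table.replace(".", "/")  # hypothetical layout
    actions = [f"create mapping directory {mapping_dir}"]
    for region, ufs_dir in sorted(region_ufs.items()):
        # one mount per synchronization region, identified by a region= sub-directory
        actions.append(f"mount {ufs_dir} at {mapping_dir}/region={region}")
    actions.append(f"create mapping source table with location {mapping_dir}")
    if not tree.realtime:
        return {"type": "offline", "actions": actions}
    view = tree.options["realsync.view.name"]
    actions.append(f"create view {view}")
    return {"type": "realtime", "view": view, "actions": actions}
```

If the target table is missing, the sketch stops with `LookupError`. The application describes only the branch in which the table exists.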
In an embodiment of the present application, the step of parsing the data synchronization statement further comprises the following:
(1) Generate two metadata plans from the configuration item information, the address of the Alluxio data orchestration service, and the address of the metadata service: a file metadata synchronization plan for the Alluxio data orchestration service, and a partition metadata synchronization plan for the metadata service.
(2) The computing engine reads the data fields of the mapping directory source table and writes them into the target data table.
(3) Store one or more of the offline synchronization plan, the real-time synchronization plan, the file metadata synchronization plan and the partition metadata synchronization plan in a database at the central node.
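Storing the generated plans in the central node's database can be sketched with Python's built-in `sqlite3` module standing in for that database. The table name `sync_plan`, its columns, the plan-kind labels and the JSON encoding of each plan are all assumptions for illustration.

```python
import json
import sqlite3


def save_plans(conn: sqlite3.Connection, task: str, plans: dict[str, dict]) -> None:
    """Persist every plan generated for one statement, one row per plan kind
    (e.g. offline, realtime, file_metadata, partition_metadata)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_plan ("
        "task TEXT NOT NULL, kind TEXT NOT NULL, body TEXT NOT NULL, "
        "PRIMARY KEY (task, kind))"
    )
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO sync_plan (task, kind, body) VALUES (?, ?, ?)",
            [(task, kind, json.dumps(body)) for kind, body in plans.items()],
        )


def load_plans(conn: sqlite3.Connection, task: str) -> dict[str, dict]:
    """Read back the plans of one task, e.g. for the engine's scheduler."""
    rows = conn.execute("SELECT kind, body FROM sync_plan WHERE task = ?", (task,))
    return {kind: json.loads(body) for kind, body in rows}
```

Keying rows by task and plan kind means resubmitting a statement replaces its old plans rather than duplicating them. That is a design choice of this sketch, not something the application specifies.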
In an embodiment of the present application, the method further comprises a step performed after the synchronization plan is obtained: periodically reading the synchronization plans and determining whether each plan meets its run conditions. The run conditions include the planned execution time zone and the planned execution time. A plan that meets its run conditions is executed.
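The periodic run-condition check can be sketched as below. The sketch assumes that each plan carries an IANA time-zone name and a daily execution time; the `ScheduledPlan` fields and the once-per-local-day rule are assumptions. The DSL's `schedule.time` value appears to be a Quartz-style cron expression ('0 00 07 ? * *' reads as 07:00:00 daily), which this sketch simplifies to a single time of day.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional
from zoneinfo import ZoneInfo


@dataclass
class ScheduledPlan:
    name: str
    tz: str                              # planned execution time zone (IANA name)
    run_at: time                         # planned execution time in that zone
    last_run_date: Optional[str] = None  # local ISO date of the last run


def due_plans(plans: list[ScheduledPlan], now_utc: datetime) -> list[str]:
    """Return the plans whose run conditions hold at now_utc: the local time in the
    plan's zone has reached run_at and the plan has not yet run on that local date."""
    due = []
    for plan in plans:
        local = now_utc.astimezone(ZoneInfo(plan.tz))
        if local.time() >= plan.run_at and plan.last_run_date != local.date().isoformat():
            due.append(plan.name)
    return due
```

Evaluating the condition in each plan's own zone is what lets a single scheduler at the central node run, for example, a 07:00 plan for each region at that region's local 07:00.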
In an embodiment of the present application, the method further comprises a step performed after the synchronization plan is executed: determining whether the execution succeeded. On success, the execution result is written to a log. On failure, an alarm is raised and the failed execution result is written to the log.
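Result handling can be sketched with Python's standard `logging` module. The logger name and the `alert` callback are hypothetical. The callback stands in for whatever alarm channel a deployment uses.

```python
import logging
from typing import Callable

log = logging.getLogger("xensql.executor")  # hypothetical logger name


def handle_result(plan: str, ok: bool, detail: str, alert: Callable[[str], None]) -> str:
    """Log every execution result; on failure also raise an alarm."""
    if ok:
        log.info("plan %s succeeded: %s", plan, detail)
        return "success"
    message = f"plan {plan} failed: {detail}"
    alert(message)      # notify operators
    log.error(message)  # and keep the failure in the log
    return "failed"
```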
In an embodiment of the present application, the method further comprises displaying the synchronization plan and the execution result after the plan has been executed.
In an embodiment of the present application, the synchronization plans displayed in this step may include any synchronization plan stored in the database of the central node.
To solve the above technical problem, the present application further provides a cross-region data synchronization system comprising:
• a synchronization client module configured to submit data synchronization statements according to the syntax structure of the data synchronization language;
• a synchronization engine module configured to parse the data synchronization statements to obtain synchronization plans, each suitable for synchronizing the data of unit nodes in one or more synchronization regions to the central node;
• a synchronization executor module configured to execute the synchronization plans to obtain execution results;
• a memory configured to store instructions executable by a processor;
• a processor configured to execute the instructions to implement the data synchronization method described above.
In an embodiment of the present application, the data synchronization system further comprises a synchronization network service module configured to display the synchronization plans and execution results on a front end.
To solve the above technical problem, the present application further provides a computer-readable medium storing computer program code. When executed by a processor, the code implements the data synchronization method described above.
The technical solution of the present application standardizes the content of data synchronization statements through the syntax structure of a unified data synchronization language, and derives a corresponding synchronization plan from each statement. No synchronization plan requires additional scheduling or development work, so the plans are easy to maintain and cheap to develop. After a synchronization plan is executed, the data of the unit nodes in each synchronization region is synchronized to the central node. Data synchronization is thus achieved simply by deploying the cross-region data synchronization method once, at the central node.
Brief Description of the Drawings
To make the above objects, features and advantages of the present application easier to understand, specific embodiments are described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is an exemplary architecture diagram of a prior-art cross-region data synchronization solution;
FIG. 2 is an exemplary flowchart of a cross-region data synchronization method according to an embodiment of the present application;
FIG. 3 is an exemplary flowchart of a cross-region data synchronization method according to another embodiment of the present application;
FIG. 4 is an exemplary architecture diagram of the Xensql data synchronization service according to an embodiment of the present application;
FIG. 5 is an exemplary architecture diagram of a cross-region data synchronization system according to an embodiment of the present application;
FIG. 6 is an exemplary deployment diagram of a cross-region data synchronization system according to another embodiment of the present application;
FIG. 7 is a system block diagram of a cross-region data synchronization system according to an embodiment of the present application.
Preferred Embodiments of the Invention
To make the above objects, features and advantages of the present application easier to understand, specific embodiments are described in detail below with reference to the accompanying drawings.
Many specific details are set forth in the following description to provide a thorough understanding of the present application. The present application can, however, also be implemented in ways other than those described here, so it is not limited by the specific embodiments disclosed below.
As used in this application and the claims, unless the context clearly indicates otherwise, words such as "a", "an", "one" and/or "the" do not specifically denote the singular and may also include the plural. In general, the terms "comprise" and "include" only indicate that the explicitly identified steps and elements are included. These steps and elements do not form an exclusive list, and a method or device may also include other steps or elements.
Flowcharts are used in the present application to illustrate the operations performed by the system according to its embodiments. The operations are not necessarily performed in exactly the order shown. The steps may instead be processed in reverse order or simultaneously, other operations may be added to these processes, and one or more steps may be removed from them.
The present application proposes a cross-region data synchronization method for synchronizing the data of unit nodes in multiple geographic synchronization regions to a single central node. For example, data from the North America and Europe regions may be synchronized to the China region. With this method, users need not deal with time-zone differences, file-system differences and similar issues that arise in cross-region data synchronization.
FIG. 2 is an exemplary flowchart of a cross-region data synchronization method according to an embodiment of the present application. Referring to FIG. 2, the method of this embodiment comprises the following steps:
Step S110: submit a data synchronization statement according to the syntax structure of the data synchronization language.
Step S120: parse the data synchronization statement to obtain a synchronization plan, the synchronization plan being suitable for synchronizing the data of unit nodes in one or more synchronization regions to the central node.
Step S130: execute the synchronization plan to obtain an execution result.
Steps S110 to S130 are described in detail below.
In step S110, a data synchronization statement is submitted according to the syntax structure of the data synchronization language.
The present application creates a new Data Synchronization Language (DSL) and defines its syntax structure. Data synchronization statements configure the parameters required during data synchronization. The present application also creates a data synchronization service, named Xensql, that parses data synchronization statements and performs the operations of the synchronization process; Xensql refers to this service throughout the following text. The syntax structure of the DSL and example statements are described in detail below.
In some embodiments, the syntax structure includes configuration items. The source data table corresponds to a unit node, and the target data table corresponds to the central node. The configuration items include one or more of the following:
• a scheduling period;
• a scheduling time zone;
• a computing engine;
• whether to use the local region's time zone;
• the source data table partition;
• the target data table partition;
• the list of partitions to be synchronized;
• a view name.
By way of example, the configuration items included in the syntax structure of the DSL of the present application are listed in Table 1 below:
Table 1. Configuration items of the DSL syntax structure of the present application
Figure PCTCN2023070424-appb-000001
Figure PCTCN2023070424-appb-000002
The following examples show data synchronization statements submitted according to the syntax structure of the DSL of the present application.
Example 1: offline synchronization.
For offline synchronization, the source data table is ods.a, the target data table is ods.b, and the task name is test_a_to_b.
Offline synchronization statement:
create sync test_a_to_b
table ods.a(to_id,table_msn)
to
table ods.b(cid,msn)
with(
    schedule.time='0 00 07 ? * *',
    schedule.default.region=ch,
    compute.engine.sql=spark,
    schedule.local.region=true,
    source.partition.field=region,dt,
    sink.partition.field=region,dt,
    sink.partition.list=us,jp
)
Example 2: real-time synchronization.
For real-time synchronization, the source data table is ods.a, the target data table is ods.b, and the task name is test_real_a_to_b.
Real-time synchronization statement:
create realsync test_real_a_to_b
table ods.a(to_id,table_msn)
to
table ods.b(cid,msn)
with(
    schedule.time='0 00 07 ? * *',
    schedule.default.region=ch,
    compute.engine.sql=spark,
    schedule.local.region=true,
    source.partition.field=region,dt,
    sink.partition.field=region,dt,
    sink.partition.list=us,jp,
    realsync.view.name=ods.b_view
)
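To make the grammar concrete, the following sketch parses statements of the form shown in Examples 1 and 2 into a small syntax tree. It is a hypothetical regex-based parser inferred only from these two examples, not the Xensql grammar; `SyncStatement` and `parse` are illustrative names. Commas separate both options and the values of multi-valued options (e.g. `region,dt`), so a comma-separated token without `=` is treated as a continuation of the previous option's value.

```python
import re
from dataclasses import dataclass


@dataclass
class SyncStatement:
    realtime: bool  # "create realsync" vs "create sync"
    task: str
    source_table: str
    source_columns: list[str]
    target_table: str
    target_columns: list[str]
    options: dict[str, list[str]]


STATEMENT = re.compile(
    r"create\s+(?P<kind>sync|realsync)\s+(?P<task>\w+)\s+"
    r"table\s+(?P<src>[\w.]+)\s*\((?P<src_cols>[^)]*)\)\s+to\s+"
    r"table\s+(?P<dst>[\w.]+)\s*\((?P<dst_cols>[^)]*)\)\s+"
    r"with\s*\((?P<opts>.*)\)\s*$",
    re.IGNORECASE | re.DOTALL,
)


def _split(text: str) -> list[str]:
    """Split on commas, dropping blanks (e.g. from a trailing comma)."""
    return [t.strip() for t in text.split(",") if t.strip()]


def parse(statement: str) -> SyncStatement:
    m = STATEMENT.match(statement.strip())
    if m is None:
        raise ValueError("statement does not match the data synchronization grammar")
    options: dict[str, list[str]] = {}
    key = None
    for token in _split(m["opts"]):
        if "=" in token:
            key, value = (part.strip() for part in token.split("=", 1))
            options[key] = [value.strip("'")]  # drop quotes around e.g. the cron string
        elif key is not None:
            options[key].append(token)  # continuation value, e.g. "dt" in region,dt
        else:
            raise ValueError(f"unexpected token {token!r}")
    return SyncStatement(
        realtime=m["kind"].lower() == "realsync",
        task=m["task"],
        source_table=m["src"],
        source_columns=_split(m["src_cols"]),
        target_table=m["dst"],
        target_columns=_split(m["dst_cols"]),
        options=options,
    )
```

Applied to Example 2, this sketch yields `realtime=True` and `options["realsync.view.name"] == ["ods.b_view"]`. A quoted value that itself contained a comma would be split incorrectly, which a real grammar would need to handle.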
FIG. 3 is an exemplary flowchart of a cross-region data synchronization method according to another embodiment of the present application. Referring to FIG. 3, in this embodiment the method is executed mainly by the Xensql data synchronization service, which comprises four modules:
• Xensql synchronization client module 310: the client through which DSL statements are submitted.
• Xensql synchronization engine module 320: parses DSL syntax and obtains and schedules synchronization plans.
• Xensql synchronization executor module 330: executes the synchronization plans issued by the Xensql synchronization engine module 320.
• Xensql synchronization network service module 340: reads the synchronization plans and execution results for front-end query and display.
The steps of the method are described in detail below with reference to FIG. 3 and to the statements of Example 1 (offline synchronization) and Example 2 (real-time synchronization) above.
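The division of labor among the four modules can be sketched as plain Python classes that pass a plan along. The class and method names are assumptions, and so is the in-memory list used as the plan store. In the described system the modules communicate over RPC and the plans live in the central node's database.

```python
class SyncEngine:
    """Stand-in for module 320: parses statements and stores/dispatches plans."""

    def __init__(self) -> None:
        self.plans: list[dict] = []

    def submit(self, statement: str) -> dict:
        is_real = statement.lstrip().lower().startswith("create realsync")
        plan = {"id": len(self.plans) + 1, "kind": "realtime" if is_real else "offline"}
        self.plans.append(plan)
        return plan


class SyncExecutor:
    """Stand-in for module 330: executes a plan issued by the engine."""

    def run(self, plan: dict) -> dict:
        return {"plan_id": plan["id"], "status": "success"}  # real work omitted


class SyncWebService:
    """Stand-in for module 340: reads plans and results for front-end display."""

    def __init__(self, engine: SyncEngine) -> None:
        self.engine = engine
        self.results: list[dict] = []

    def overview(self) -> list[tuple[int, str, str]]:
        status = {r["plan_id"]: r["status"] for r in self.results}
        return [(p["id"], p["kind"], status.get(p["id"], "pending")) for p in self.engine.plans]


class SyncClient:
    """Stand-in for module 310: submits DSL statements to the engine."""

    def __init__(self, engine: SyncEngine) -> None:
        self.engine = engine

    def submit(self, statement: str) -> dict:
        return self.engine.submit(statement)
```

A submitted but not yet executed plan shows as "pending" in the overview, which mirrors module 340 reading plans and results independently of module 330.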
In some embodiments, referring to FIG. 3, the method further comprises a step performed before the data synchronization statement is submitted: configuring the address and directory of the Alluxio data orchestration service. This service mounts underlying file systems so that upper-layer computing frameworks and applications can access them. This step corresponds to step S3103 in FIG. 3.
By way of example, the Alluxio data orchestration service unifies data access through a unified namespace, and provides a unified client application programming interface (API) between the underlying storage systems and the upper-layer computing frameworks. Its features include the following:
• Alluxio provides a multi-tier caching scheme that speeds up data access and avoids reading the same data repeatedly.
• The unified namespace gives multiple storage systems a single access method, reducing the cost of integrating data access.
• Alluxio provides multiple file access interfaces that support a variety of upper-layer applications and computing frameworks, such as Hive and Spark. For example, a schema mapping table based on the Alluxio file system can be created through the Hive metastore. Data in Alluxio can then be accessed and computed on without changing the application logic of the existing data warehouse.
In this embodiment, the Alluxio data orchestration service locally mounts the storage systems of the different geographic regions and uses the region field as the partition identifier of each region. It is also configured with an in-memory cache to speed up file reads and avoid reading the same file multiple times.
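Mounting each region's storage under one Alluxio namespace, with one `region=` directory per region, can be sketched by building the corresponding mount commands. The sketch assumes the `alluxio fs mount <alluxioPath> <ufsURI>` form of the Alluxio 2.x shell; check the command-line reference for the version in use. The mapping-directory layout and UFS URIs are hypothetical. The function only builds argument lists and does not contact an Alluxio cluster.

```python
from urllib.parse import urlparse


def mount_commands(mapping_dir: str, region_ufs: dict[str, str]) -> list[list[str]]:
    """Build one mount command per synchronization region.

    Each region's source-table directory is mounted at <mapping_dir>/region=<name>,
    so the directory name doubles as the region partition value.
    """
    commands = []
    for region, ufs_uri in sorted(region_ufs.items()):
        if not urlparse(ufs_uri).scheme:
            raise ValueError(f"UFS address for region {region!r} lacks a scheme such as s3:// or hdfs://")
        target = f"{mapping_dir.rstrip('/')}/region={region}"
        commands.append(["alluxio", "fs", "mount", target, ufs_uri])
    return commands
```

Under this layout, adding a unit node amounts to one more dictionary entry and one more mount. This is consistent with the point that a new unit node needs only its directory mounted at the central node.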
本申请使用Alluxio作为数据编排服务,对不同地理区域的存储系统提供统一的访问方式。设计实现了离线同步和实时同步的方案,降低了后期开发和维护成本,新增单元节点的同时,只需将对应目录挂载到中心节点的Alluxio目录,无需额外的部署和运维成本。通过Alluxio统一的审计和日志,监控整体的数据同步方案,避免多个区域的监控方案和维护。This application uses Alluxio as a data orchestration service to provide a unified access method for storage systems in different geographical areas. The design implements offline synchronization and real-time synchronization solutions, reducing the cost of later development and maintenance. When adding a new unit node, you only need to mount the corresponding directory to the Alluxio directory of the central node, without additional deployment and operation and maintenance costs. Through Alluxio's unified audit and log, monitor the overall data synchronization solution and avoid monitoring solutions and maintenance in multiple regions.
在一些实施例中,在根据数据同步语言的语法结构提交数据同步语句的步骤之前,还包括:配置元数据服务的地址、一个或多个同步区域的底层文件系统的地址、计算引擎的执行目录、同步引擎的通讯地址、同步引擎的端口、同步执行者的通讯地址、同步执行者的端口、同步网络服务的地址、同步网络服务的端口中的一项或任意项。这些步骤部分对应于图3的步骤S3101、步骤S3102。In some embodiments, before submitting the data synchronization statement according to the grammatical structure of the data synchronization language, it also includes: configuring the address of the metadata service, the address of the underlying file system of one or more synchronization areas, the execution directory of the computing engine, the communication address of the synchronization engine, the port of the synchronization engine, the communication address of the synchronization executor, the port of the synchronization executor, the address of the synchronization network service, and the port of the synchronization network service. These steps partially correspond to step S3101 and step S3102 of Figure 3.
Exemplarily, the metadata service (Under Database, UDB) can provide the data query service of the central node. Referring to FIG. 3, several modules must be deployed before the Xensql data synchronization service executes the cross-region data synchronization method:

- the Xensql synchronization client module 310;
- the Xensql synchronization engine module 320;
- the Xensql synchronization executor module 330;
- the Xensql synchronization network service module 340.

The following must also be configured:

- the address of the UDB;
- the address of the underlying file system (UFS) of each synchronization region;
- the execution directory of the computing engine;
- the remote procedure call (RPC) communication address and port of the Xensql synchronization engine module 320;
- the RPC communication address and port of the Xensql synchronization executor module 330;
- the service address and port of the Xensql synchronization network service module 340.
After the Xensql synchronization client module 310 starts, the client 311 reads the configuration information. This includes the UDB address, the UFS address of each synchronization region, the execution directory of the computing engine, and the RPC communication address of the Xensql synchronization engine module 320.
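The configuration items listed above can be pictured as a typed settings object that the client loads at startup. The following Python sketch is illustrative only. The class, field, and method names (`XensqlConfig`, `Endpoint`, `ufs_for`) are hypothetical and do not come from Xensql. The sketch shows one way to hold the per-region UFS addresses and the RPC endpoints, and to fail early on an unconfigured region or an invalid port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Host/port pair for an RPC or HTTP service (hypothetical helper)."""
    host: str
    port: int

    def __post_init__(self) -> None:
        # Reject ports outside the valid TCP range at startup rather than at first use.
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")


@dataclass(frozen=True)
class XensqlConfig:
    udb_address: str                # metadata service (UDB) address
    ufs_addresses: dict[str, str]   # region code -> UFS root URI
    engine_exec_dir: str            # execution directory of the computing engine
    sync_engine: Endpoint           # RPC address/port of the synchronization engine
    sync_executor: Endpoint         # RPC address/port of the synchronization executor
    web_service: Endpoint           # address/port of the synchronization network service

    def ufs_for(self, region: str) -> str:
        """Return the UFS root of a region, failing loudly if it was never configured."""
        try:
            return self.ufs_addresses[region]
        except KeyError:
            raise KeyError(f"no UFS address configured for region {region!r}") from None
```

Freezing the dataclasses keeps the loaded configuration read-only for the rest of the client's lifetime.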
In step S120, the data synchronization statement is parsed to obtain a synchronization plan. The plan is used to synchronize the data of the unit nodes in one or more synchronization regions to the central node.
In some embodiments, the synchronization plan includes an offline synchronization plan and/or a real-time synchronization plan, and the real-time synchronization plan includes a view name.
Exemplarily, refer to FIG. 3 and to data synchronization statement Example 1 (offline synchronization) and Example 2 (real-time synchronization) described above. The data synchronization statement is submitted through the client 311 of the Xensql synchronization client module 310. The Xensql synchronization engine module 320 parses the statement to obtain the corresponding offline synchronization plan or real-time synchronization plan.

The execution flow of the cross-region data synchronization method is described below using Example 1 and Example 2. For the meaning of the attributes inside "with" in the offline synchronization statement, refer to Table 1 above (the list of configuration items of the DSL grammatical structure of this application).
In some embodiments, the step of parsing the data synchronization statement includes:
Step S1201: parse the data synchronization statement according to the grammatical structure to obtain a parsing result;
Step S1202: if parsing succeeds, obtain a data synchronization syntax tree, which includes the configuration item information, the table name of the source data table, and the table name of the target data table;
Step S1203: connect to the metadata service at its configured address, and determine from the target table name whether the target data table exists in the metadata service;
Step S1204: if the target data table exists, create a mapping directory in the Alluxio data orchestration service for the source data table;
Step S1205: mount the source table directory under the UFS address of each of the one or more synchronization regions to the mapping directory;
Step S1206: create a mapping directory source table whose location is the mapping directory;
Step S1207: determine whether real-time synchronization is to be performed. If not, generate an offline synchronization plan; if so, create a view according to the view name and generate a real-time synchronization plan.
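Steps S1201 to S1207 can be sketched as a single function. This is a minimal sketch with hypothetical names (`SyntaxTree`, `SyncPlan`, `build_plan`, `SyncStatementError`). The DSL parser, the UDB lookup, the Alluxio calls, and view creation are passed in as callables, because the actual Xensql interfaces are not disclosed here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SyntaxTree:
    source_table: str
    target_table: str
    regions: list[str]
    view_name: Optional[str] = None  # set only for real-time synchronization


@dataclass
class SyncPlan:
    kind: str  # "offline" or "realtime"
    source_table: str
    target_table: str
    mapping_dir: str
    view_name: Optional[str] = None


class SyncStatementError(Exception):
    """Raised (and reported back to the client) when a parsing step fails."""


def build_plan(
    statement: str,
    parse: Callable[[str], Optional[SyntaxTree]],
    table_exists: Callable[[str], bool],
    make_mapping_dir: Callable[[str], str],
    mount: Callable[[str, str], None],
    create_mapping_table: Callable[[str, str], None],
    create_view: Callable[[str, str], None],
) -> SyncPlan:
    tree = parse(statement)                                # S1201
    if tree is None:
        raise SyncStatementError("statement does not match the DSL grammar")
    # S1202: the tree now carries the configuration items and both table names.
    if not table_exists(tree.target_table):                # S1203
        raise SyncStatementError(f"target table {tree.target_table} not found")
    mapping_dir = make_mapping_dir(tree.source_table)      # S1204
    for region in tree.regions:                            # S1205
        mount(region, mapping_dir)
    create_mapping_table(tree.source_table, mapping_dir)   # S1206
    if tree.view_name is None:                             # S1207
        kind = "offline"
    else:
        # Hypothetical: the view is assumed to be defined over the mapped data.
        create_view(tree.view_name, mapping_dir)
        kind = "realtime"
    return SyncPlan(kind, tree.source_table, tree.target_table, mapping_dir, tree.view_name)
```

Passing the side effects as parameters keeps the step order explicit and lets the flow be exercised without a live UDB or Alluxio cluster.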
Exemplarily, steps S1201 to S1207 are explained below with reference to FIG. 3 and data synchronization statement Example 1 (offline synchronization) described above.
In step S1201 above, the Xensql synchronization engine module 320 receives, via RPC, the data synchronization statement submitted by the client 311 of the Xensql synchronization client module 310. In step S3201, it parses the statement according to the DSL grammatical structure to obtain a parsing result. If parsing fails, an exception is thrown in step S3203 and the reason for the failure is returned to the client 311.

If parsing succeeds, then in step S1202 above the Xensql synchronization engine module 320 obtains the data synchronization syntax tree, which includes:

- source data table name: ods.a;
- source data table partition fields: region, dt;
- source data table data fields: to_id, table_msn;
- target data table name: ods.b;
- target data table partition fields: region, dt;
- target data table data fields: cid, msn;
- synchronization task name: test_a_to_b;
- scheduling period: 0 00 07 ? * * (07:00 every day);
- scheduling time zone: ch (China);
- computing engine: spark;
- whether to schedule in the local time zone of each region: true;
- list of partitions to be synchronized (synchronization regions): us, jp (United States, Japan).
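The sketch below writes the syntax tree of Example 1 as a plain Python dictionary and pairs the source data fields with the target data fields by position (to_id to cid, table_msn to msn). Step S1209 uses them this way. The dictionary keys and the `field_mapping` helper are hypothetical. The positional pairing and the partition-field check are assumptions about how such a tree might be validated.

```python
# Values from Example 1; the dictionary keys are hypothetical names.
EXAMPLE_1_TREE = {
    "task_name": "test_a_to_b",
    "source_table": "ods.a",
    "source_partitions": ["region", "dt"],
    "source_fields": ["to_id", "table_msn"],
    "target_table": "ods.b",
    "target_partitions": ["region", "dt"],
    "target_fields": ["cid", "msn"],
    "engine": "spark",
    "regions": ["us", "jp"],
}


def field_mapping(tree: dict) -> list[tuple[str, str]]:
    """Pair source and target data fields by position after basic consistency checks."""
    src, dst = tree["source_fields"], tree["target_fields"]
    if len(src) != len(dst):
        raise ValueError("source and target field lists differ in length")
    if tree["source_partitions"] != tree["target_partitions"]:
        raise ValueError("source and target partition fields differ")
    return list(zip(src, dst))


print(field_mapping(EXAMPLE_1_TREE))  # [('to_id', 'cid'), ('table_msn', 'msn')]
```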
In step S1203 above, the metadata service UDB is connected using its configured address in step S3202 shown in FIG. 3. Step S3204 then uses the target table name ods.b to check whether the target data table ods.b exists in the UDB. If it does not, an exception is thrown to the client 311 in step S3203.

In step S1204 above, if the target data table ods.b exists, step S3205 creates a mapping directory of the Alluxio data orchestration service for the source data table ods.a: /alluxio/warehouse/ods.db/a_sync. The path has three parts:

- /alluxio/warehouse is part of the Alluxio data orchestration service configuration and can be defined freely;
- ods.db is the database directory of ods;
- a_sync is the source table name a with the suffix _sync, a directory naming convention defined by the Xensql synchronization engine module 320.
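The directory naming rule described above can be expressed as a small helper: the warehouse root, then `<db>.db`, then the table name with the `_sync` suffix. The function names and the default root are illustrative assumptions. Only the resulting path `/alluxio/warehouse/ods.db/a_sync` and the table name `ods.a_sync` come from the text.

```python
def mapping_dir(table: str, warehouse_root: str = "/alluxio/warehouse") -> str:
    """Map 'db.table' to its Alluxio mapping directory: <root>/<db>.db/<table>_sync."""
    db, sep, name = table.partition(".")
    if not (sep and db and name):
        raise ValueError(f"expected 'db.table', got {table!r}")
    return f"{warehouse_root.rstrip('/')}/{db}.db/{name}_sync"


def mapping_table(table: str) -> str:
    """Name of the mapping directory source table created in step S1206."""
    return f"{table}_sync"


print(mapping_dir("ods.a"))    # /alluxio/warehouse/ods.db/a_sync
print(mapping_table("ods.a"))  # ods.a_sync
```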
In step S1205 above, step S3206 shown in FIG. 3 mounts the source table directory under the UFS address of each synchronization region to the mapping directory. Specifically, the UFS addresses of the parsed synchronization regions us and jp are looked up among the configured per-region UFS addresses. The directory /ods.db/a under each of those UFS addresses is then mounted beneath the Alluxio mapping directory /alluxio/warehouse/ods.db/a_sync.
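One way to carry out the mount in step S3206 is to issue one `alluxio fs mount <alluxioPath> <ufsURI>` command per synchronization region. The sketch below only builds the command strings; `mount_commands` is a hypothetical name. Two details are assumptions not stated in the text:

- each region is mounted into its own `region=<code>` subdirectory of the mapping directory, a Hive-style partition layout consistent with `region` being the partition field;
- the UFS roots in the usage are placeholders.

```python
import shlex


def mount_commands(table: str, regions: list[str], ufs_roots: dict[str, str],
                   target_dir: str) -> list[str]:
    """Build one `alluxio fs mount` command per region for the source table directory."""
    db, _, name = table.partition(".")
    commands = []
    for region in regions:
        ufs_root = ufs_roots[region].rstrip("/")        # KeyError if region is unconfigured
        ufs_path = f"{ufs_root}/{db}.db/{name}"         # e.g. <UFS of us>/ods.db/a
        alluxio_path = f"{target_dir}/region={region}"  # assumed per-region subdirectory
        commands.append(shlex.join(["alluxio", "fs", "mount", alluxio_path, ufs_path]))
    return commands


for cmd in mount_commands("ods.a", ["us", "jp"],
                          {"us": "hdfs://us-namenode:8020/", "jp": "hdfs://jp-namenode:8020"},
                          "/alluxio/warehouse/ods.db/a_sync"):
    print(cmd)
```

`shlex.join` quotes any path containing shell-special characters, so the generated commands stay safe to pass to a shell.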
In step S1206 above, step S3207 shown in FIG. 3 creates the Alluxio mapping directory source table. Specifically, the UDB is connected using its configured address, and the mapping directory source table ods.a_sync is created. Its location is set to the Alluxio mapping directory /alluxio/warehouse/ods.db/a_sync.
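The mapping directory source table of step S3207 could be registered with a Hive-compatible metastore using DDL like the one generated below. This is a sketch under stated assumptions: `mapping_table_ddl` is a hypothetical name, and several details are not given in the text and would need to match the real source data:

- the column types (`STRING`);
- the storage format (`PARQUET`);
- the external-table form;
- the Alluxio master address in the usage.

```python
def mapping_table_ddl(table: str, fields: list[str], partitions: list[str],
                      location: str) -> str:
    """Build CREATE TABLE DDL for the <table>_sync mapping table located in Alluxio."""
    db, _, name = table.partition(".")
    cols = ",\n  ".join(f"`{f}` STRING" for f in fields)       # column types assumed
    parts = ", ".join(f"`{p}` STRING" for p in partitions)     # partition types assumed
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{name}_sync (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        f"STORED AS PARQUET\n"                                  # storage format assumed
        f"LOCATION '{location}'"
    )


print(mapping_table_ddl("ods.a", ["to_id", "table_msn"], ["region", "dt"],
                        "alluxio://alluxio-master:19998/alluxio/warehouse/ods.db/a_sync"))
```

Because the table is partitioned by `region` and `dt`, its partitions still have to be registered afterwards, for example with `ALTER TABLE ... ADD PARTITION` or `MSCK REPAIR TABLE`. This fits the role of the partition metadata synchronization plan in step S1208.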
In step S1207 above, step S3208 shown in FIG. 3 determines whether real-time synchronization is to be performed. If not, an offline synchronization plan is generated in step S3210. If so, a view is created according to the view name in step S3209, and a real-time synchronization plan is generated in step S3210. Specifically, the UDB is connected and the view is created using the realsync.view.name attribute in the DSL data synchronization statement.
In some embodiments, the step of parsing the data synchronization statement further includes:
Step S1208: generate a file metadata synchronization plan for the Alluxio data orchestration service and a partition metadata synchronization plan for the metadata service. Both are generated from the configuration item information, the address of the Alluxio data orchestration service, and the address of the metadata service;
Step S1209: the computing engine reads the data fields in the mapping directory source table and writes them into the target data table;
Step S1210: store one or more of the following in the database of the central node: the offline synchronization plan, the real-time synchronization plan, the file metadata synchronization plan of the Alluxio data orchestration service, and the partition metadata synchronization plan of the metadata service.
Exemplarily, steps S1208 to S1210 are explained below with reference to FIG. 3 and data synchronization statement Example 1 (offline synchronization) described above.
In step S1208 above, two plans are generated: the file metadata synchronization plan of the Alluxio data orchestration service and the partition metadata synchronization plan of the metadata service UDB. They are generated from the parsed scheduling period, scheduling time zone, partition fields, synchronization regions, computing engine, and synchronized data fields, together with the configured addresses of the UDB and the Alluxio data orchestration service.

In step S1209 above, the parsed computing engine configuration is spark, so Spark SQL is generated. This Spark SQL reads the data fields to_id and table_msn from the Alluxio mapping directory source table ods.a_sync and writes them into the data fields cid and msn of the target data table ods.b.
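The Spark SQL generated in step S1209 might resemble the statement produced below. It selects the source fields under the target field names and writes them with dynamic partitions. The function name `spark_sync_sql`, the `region IN (...)` filter, and the use of `INSERT OVERWRITE` are assumptions for illustration; the text does not show the generated SQL.

```python
def spark_sync_sql(mapping_table: str, target_table: str,
                   field_pairs: list[tuple[str, str]], partitions: list[str],
                   regions: list[str], region_column: str = "region") -> str:
    """Build an INSERT ... SELECT that copies mapped fields using dynamic partitions."""
    select_cols = ", ".join(f"{src} AS {dst}" for src, dst in field_pairs)
    part_cols = ", ".join(partitions)  # partition columns must come last in the SELECT
    region_list = ", ".join(f"'{r}'" for r in regions)
    return (
        f"INSERT OVERWRITE TABLE {target_table} PARTITION ({part_cols})\n"
        f"SELECT {select_cols}, {part_cols}\n"
        f"FROM {mapping_table}\n"
        f"WHERE {region_column} IN ({region_list})"
    )


print(spark_sync_sql("ods.a_sync", "ods.b", [("to_id", "cid"), ("table_msn", "msn")],
                     ["region", "dt"], ["us", "jp"]))
```

Running such a statement against a Hive table with fully dynamic partitions typically requires `hive.exec.dynamic.partition.mode=nonstrict`. Whether only the touched partitions are overwritten may depend on the table type and on settings such as `spark.sql.sources.partitionOverwriteMode`.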
In step S1210 above, one or more of the four plans are stored in the database 321 of the central node, which is a MySQL database. The four plans are the offline synchronization plan, the real-time synchronization plan, the file metadata synchronization plan of the Alluxio data orchestration service, and the partition metadata synchronization plan of the metadata service. In step S3211, a message indicating that the DSL statement executed successfully is returned to the client 311.
In some embodiments, after the data synchronization statement is parsed to obtain the synchronization plan, the method further includes:
Step S1211: periodically read the synchronization plan and determine whether it meets the running conditions, which include the planned execution time zone and the planned execution time;
Step S1212: if the running conditions are met, execute the synchronization plan.
Exemplarily, referring to FIG. 3, the Xensql synchronization engine module 320 also schedules synchronization plans. In step S1211 above, the scheduling engine 322 of the Xensql synchronization engine module 320 reads the synchronization plans written to the database 321 of the central node every 30 seconds (step S3212).

In step S1212 above, step S3213 in FIG. 3 determines whether a synchronization plan meets the running conditions, namely the planned execution time zone and the planned execution time. If it does not, the task ends in step S3215. If it does, step S3214 sends the synchronization plan to the Xensql synchronization executor module 330 via RPC.
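The 30-second polling check of steps S3212 and S3213 can be sketched as follows for a daily plan. A plan is due when its most recent scheduled instant, computed in the plan's own time zone, falls between the previous poll and the current one. `DailyPlan` and `due_plans` are hypothetical names, and the sketch uses fixed UTC offsets to stay deterministic. A real scheduler would evaluate the full cron expression (such as the scheduling period of Example 1) and use a DST-aware time zone database.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo


@dataclass
class DailyPlan:
    name: str
    run_at: time  # planned execution time, in the plan's own time zone
    tz: tzinfo    # planned execution time zone

    def last_due(self, now: datetime) -> datetime:
        """Most recent scheduled instant at or before `now` (an aware datetime)."""
        local_now = now.astimezone(self.tz)
        candidate = datetime.combine(local_now.date(), self.run_at, tzinfo=self.tz)
        if candidate > local_now:
            candidate -= timedelta(days=1)
        return candidate


def due_plans(plans: list[DailyPlan], last_poll: datetime, now: datetime) -> list[DailyPlan]:
    """Plans whose scheduled instant fell in the half-open window (last_poll, now]."""
    return [p for p in plans if last_poll < p.last_due(now) <= now]


CST = timezone(timedelta(hours=8))  # assumed mapping of the "ch" scheduling time zone
plan = DailyPlan("test_a_to_b", time(7, 0), CST)
now = datetime(2022, 7, 20, 23, 0, 10, tzinfo=timezone.utc)  # 07:00:10 in UTC+8
print([p.name for p in due_plans([plan], now - timedelta(seconds=30), now)])  # ['test_a_to_b']
```

The half-open window (previous poll, now] makes each plan fire once per day even though the poller runs every 30 seconds. A poll that lands a few seconds late still picks the plan up.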
In step S130, the synchronization plan is executed to obtain an execution result.
In some embodiments, after the synchronization plan is executed and the execution result is obtained, the method further includes determining whether the execution succeeded. If it succeeded, the successful execution result is written to the log. If it failed, an alarm is raised and the failed execution result is written to the log.
Exemplarily, referring to FIG. 3, the Xensql synchronization executor module 330 executes the synchronization plan and obtains the execution result. Specifically, the plan is executed in step S3301, and step S3302 determines whether execution succeeded. If it failed, an alarm is raised in step S3304 and the failed result is written to the log. If it succeeded, the successful result is written to the log in step S3303.
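The success/failure branch of steps S3301 to S3304 amounts to a try/except around plan execution. In this sketch, `run_plan`, the logger name, and the `alarm` callback are hypothetical. The alarm channel (for example mail or instant messaging) is not specified in the text.

```python
import logging
from typing import Callable

log = logging.getLogger("xensql.executor")  # hypothetical logger name


def run_plan(plan_name: str, execute: Callable[[], None],
             alarm: Callable[[str, Exception], None]) -> bool:
    """Execute one plan (S3301), then log success (S3303) or alarm and log failure (S3304)."""
    try:
        execute()
    except Exception as exc:  # any exception counts as an unsuccessful result (S3302)
        alarm(plan_name, exc)
        log.error("sync plan %s failed: %s", plan_name, exc)
        return False
    log.info("sync plan %s succeeded", plan_name)
    return True
```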
In some embodiments, after the synchronization plan is executed and the execution result is obtained, the method further includes displaying the synchronization plan and the execution result.
In some embodiments, the synchronization plan displayed in this step may be any synchronization plan stored in the database of the central node.
Exemplarily, referring to FIG. 3, the Xensql synchronization network service module 340 works as follows. In step S3401, the user logs in. In step S3402, the synchronization plans are queried, and the corresponding plans are obtained from the database 321 of the central node. In step S3403, the query result, consisting of the synchronization plans and execution-result logs from the database 321, is displayed on the front end.

Through the Xensql synchronization network service module 340, the user can view any synchronization plan stored in the database 321 of the central node. This includes historically retired plans, failed plans, successful plans, and plans currently being executed.
In global data synchronization scenarios, the cross-region data synchronization method and the Xensql data synchronization service of this application spare users from handling time zone differences, file system differences, and similar issues when synchronizing across geographical regions. Users only need to write data synchronization statements that conform to the grammatical structure of the DSL. The Xensql data synchronization service then generates the corresponding cross-region offline and real-time synchronization plans.

This application uses the Alluxio data orchestration service to provide unified access to storage systems in different geographical regions. Combined with the metadata service UDB, the data of each region is aggregated, computed, and stored in the central node, which enables a data query service there. The original data remains stored locally. This saves time and cost on each query, accumulates data assets, and reduces later development and maintenance costs.
FIG. 4 is an exemplary architecture diagram of the Xensql data synchronization service according to an embodiment of this application. Referring to FIG. 4, the core of the Xensql data synchronization service consists of four parts: the Xensql synchronization client 410, the Xensql synchronization engine 430, the Xensql synchronization executor 450, and the Xensql synchronization network service 460.
Referring to FIG. 4, the user submits a data synchronization statement 420 through the Xensql synchronization client 410. After receiving the statement 420, the Xensql synchronization engine 430 parses its syntax using the relevant configuration information of the metadata service 440. Step S431 then checks whether the syntax is valid.

If it is not, an exception is thrown in step S433 and the relevant information is written to the log in step S434. If it is, the statement is parsed and the results are written to configuration storage in step S432. Parsing yields the offline or real-time synchronization plan, the regions and metadata to be synchronized, the data tables to be synchronized, and other information, and the corresponding synchronization plan is stored in the MySQL configuration database.

The Xensql synchronization executor 450 executes the synchronization plan in step S435 and writes the execution result to the log in step S436. The Xensql synchronization network service 460 reads the relevant data from the MySQL configuration database and the logs for front-end display.
FIG. 5 is an exemplary architecture diagram of a cross-region data synchronization system according to an embodiment of this application. Referring to FIG. 5, the architecture of the cross-region data synchronization system 500 includes a metadata service 510, an underlying file system 520, an Alluxio data orchestration service 530, a computing engine 540, and a Xensql data synchronization service 550. Based on this architecture, this application develops Xensql, a data synchronization service that integrates operations of the Alluxio Software Development Kit (SDK). By integrating Xensql with Alluxio, this application uses the DSL to synchronize data across geographical regions both in real time and offline.
FIG. 6 is an exemplary deployment diagram of a cross-region data synchronization system according to another embodiment of this application. Referring to FIG. 6, the data of the unit nodes North America 630 and Europe 620 needs to be synchronized to the central node China 610. Exemplarily, the deployment proceeds as follows:

- an Alluxio data orchestration service cluster is deployed and started in the central node China 610;
- the data directories to be synchronized in China 610, Europe 620, and North America 630 are mounted to the Alluxio data orchestration service 640;
- the metadata service 650 creates synchronization plans and adds partitions according to the relevant configuration information;
- the client 660 reads the corresponding synchronization plans and the Alluxio data, and writes the relevant information into the mapping directory source table of the synchronization plan in China 610;
- the scheduling service 670 periodically reads the synchronization plans to schedule their execution.
This application integrates Alluxio and Xensql into a cross-region synchronization scheme that also supports synchronization plans for data in different time zones. Users only need to mount the data of each region into the directory, without handling time zone differences between regions.

Exemplarily, the Xensql data synchronization service is set to the Beijing time zone, and the China and US regions are mounted in the relevant table of the central node. Xensql generates two synchronization plans accordingly:
1. synchronize the China region data at 01:00 Beijing time every day;
2. synchronize the North America region data at 13:00 Beijing time every day.

The relevant synchronization statements are as follows:
# Time zone difference:
Beijing time 07.21 00:00 corresponds to US time 07.21 00:00 - 12:00 = 07.20 12:00
# Alluxio mount:
[Figure PCTCN2023070424-appb-000003: synchronization statements (image not reproduced)]
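The time zone arithmetic in this example can be checked with Python's `datetime`. The sketch makes two assumptions. First, "US time" is taken to mean US Eastern Daylight Time (UTC-04:00), the offset that yields the stated 12-hour difference in July. Second, the year 2022 is used because the statements do not give one; `beijing_time_for_local` is a hypothetical helper. The sketch shows that 00:00 on 07.21 in Beijing is 12:00 on 07.20 in that zone. It also shows that the 13:00 Beijing run of the North America plan falls at 01:00 local time there, the same local hour as the 01:00 China plan.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the example deterministic (no tz database needed).
BEIJING = timezone(timedelta(hours=8), "UTC+08:00")
US_EASTERN_SUMMER = timezone(timedelta(hours=-4), "UTC-04:00")  # assumed: EDT in July


def beijing_time_for_local(hour: int, year: int, month: int, day: int,
                           zone: timezone) -> datetime:
    """Return the Beijing time at which `zone` reaches hour:00 on the given date."""
    local = datetime(year, month, day, hour, tzinfo=zone)
    return local.astimezone(BEIJING)


# Beijing 07.21 00:00 expressed in the assumed US zone.
us_time = datetime(2022, 7, 21, 0, 0, tzinfo=BEIJING).astimezone(US_EASTERN_SUMMER)
print(us_time.strftime("%m.%d %H:%M"))  # 07.20 12:00

# Each region's plan runs at 01:00 local time; show that moment in Beijing time.
print(beijing_time_for_local(1, 2022, 7, 21, BEIJING).strftime("%H:%M"))            # 01:00
print(beijing_time_for_local(1, 2022, 7, 21, US_EASTERN_SUMMER).strftime("%H:%M"))  # 13:00
```

A DST-aware zone such as `zoneinfo.ZoneInfo("America/New_York")` would move the same 01:00 local run to 14:00 Beijing time in winter. A fixed 13:00 Beijing schedule therefore matches 01:00 US Eastern time only during daylight saving time.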
This application further provides a cross-region data synchronization system, including:

- a synchronization client module for submitting data synchronization statements according to the grammatical structure of the data synchronization language;
- a synchronization engine module for parsing the data synchronization statements to obtain a synchronization plan, which is used to synchronize the data of the unit nodes of one or more synchronization regions to the central node;
- a synchronization executor module for executing the synchronization plan to obtain an execution result;
- a memory and a processor. The memory stores instructions executable by the processor, and the processor executes them to implement the cross-region data synchronization method described above.
In some embodiments, the data synchronization system further includes a synchronization network service module for displaying the synchronization plan and the execution result on the front end.
Everything described above can be used to explain this cross-region data synchronization system. Referring to FIG. 3, the Xensql modules are specific implementations of the system's modules:

- the Xensql synchronization client module 310 implements the synchronization client module;
- the Xensql synchronization engine module 320 implements the synchronization engine module;
- the Xensql synchronization executor module 330 implements the synchronization executor module;
- the Xensql synchronization network service module 340 implements the synchronization network service module.

The details are not repeated here.
FIG. 7 is a system block diagram of a cross-region data synchronization system according to an embodiment of this application. Referring to FIG. 7, the cross-region data synchronization system 700 may include an internal communication bus 701, a processor 702, a read-only memory (ROM) 703, a random access memory (RAM) 704, and a communication port 705. When implemented on a personal computer, the system 700 may also include a hard disk 706.

The internal communication bus 701 enables data communication among the components of the system 700. The processor 702 can make judgments and issue prompts; in some embodiments, it may consist of one or more processors. The communication port 705 enables data communication between the system 700 and external devices; in some embodiments, the system 700 sends and receives information and data over a network through this port.

The system 700 may also include program storage units and data storage units of different forms, such as the hard disk 706, the ROM 703, and the RAM 704. These can store various data files used for computer processing and/or communication, as well as program instructions executed by the processor 702. The processor executes these instructions to implement the main part of the method, and the results are transmitted to the user device through the communication port and displayed on the user interface.
The cross-region data synchronization method described above can be implemented as a computer program that is stored on the hard disk 706 and loaded into the processor 702 for execution.
This application also provides a computer-readable medium storing computer program code which, when executed by a processor, implements the cross-region data synchronization method described above.
When the cross-region data synchronization method is implemented as a computer program, it can also be stored in a computer-readable storage medium as an article of manufacture. For example, computer-readable storage media may include, but are not limited to:

- magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips);
- optical disks (e.g., compact disks (CDs), digital versatile disks (DVDs));
- smart cards;
- flash memory devices (e.g., electrically erasable programmable read-only memory (EEPROM), cards, sticks, key drives).

In addition, the various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media (and/or storage media) capable of storing, containing, and/or carrying code and/or instructions and/or data.
It should be understood that the embodiments described above are merely illustrative. The embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processor may be implemented in one or more of the following, or in combinations thereof:

- application-specific integrated circuits (ASICs);
- digital signal processors (DSPs);
- digital signal processing devices (DSPDs);
- programmable logic devices (PLDs);
- field-programmable gate arrays (FPGAs);
- processors, controllers, microcontrollers, or microprocessors;
- other electronic units designed to perform the functions described herein.
Some aspects of the present application may be performed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software. Such hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". The processor may be one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or a combination thereof. In addition, aspects of the present application may be embodied as a computer product located in one or more computer-readable media, the product including computer-readable program code. For example, computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic tapes, etc.), optical disks (e.g., compact disks (CDs), digital versatile disks (DVDs), etc.), smart cards, and flash memory devices (e.g., cards, sticks, key drives, etc.).
A computer-readable medium may include a propagated data signal containing computer program code, for example in baseband or as part of a carrier wave. The propagated signal may take a variety of forms, including electromagnetic or optical forms, or any suitable combination thereof. Such a signal medium may be any computer-readable medium other than a computer-readable storage medium that can communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code located on the computer-readable medium may be transmitted over any suitable medium, including radio, cable, fiber optic cable, radio frequency signals, or similar media, or any combination thereof.
The basic concepts have been described above. For those skilled in the art, the foregoing disclosure is merely an example and does not limit the present application. Although not explicitly stated herein, those skilled in the art may make various modifications, improvements, and corrections to the present application. Such modifications, improvements, and corrections are suggested by the present application and therefore still fall within the spirit and scope of its exemplary embodiments.
Meanwhile, the present application uses specific terms to describe its embodiments. For example, "one embodiment", "an embodiment", and/or "some embodiments" refer to a feature, structure, or characteristic related to at least one embodiment of the present application. Therefore, it should be emphasized and noted that two or more references to "an embodiment", "one embodiment", or "an alternative embodiment" in different places in this specification do not necessarily refer to the same embodiment. In addition, certain features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Some embodiments use numbers to describe quantities of components and attributes. It should be understood that, in some examples, such numbers are modified by "about", "approximately", or "substantially". Unless otherwise stated, these terms indicate that the stated number may vary by ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may change depending on the features required by individual embodiments. In some embodiments, numerical parameters should take the specified significant digits into account and use ordinary rounding. Although the numerical ranges and parameters used to define the broad scope of some embodiments are approximations, in specific embodiments such values are set as precisely as is practicable.

Claims (14)

  1. A cross-region data synchronization method, characterized by comprising:
    submitting a data synchronization statement according to the grammatical structure of a data synchronization language;
    parsing the data synchronization statement to obtain a synchronization plan, the synchronization plan being adapted to synchronize data of unit nodes in one or more synchronization regions to a central node;
    executing the synchronization plan to obtain an execution result.
  2. The data synchronization method according to claim 1, characterized in that, before the step of submitting the data synchronization statement according to the grammatical structure of the data synchronization language, the method further comprises: configuring the address and directory of an Alluxio data orchestration service, the Alluxio data orchestration service being used to mount underlying file systems for access by upper-layer computing frameworks and applications.
  3. The data synchronization method according to claim 2, characterized in that, before the step of submitting the data synchronization statement according to the grammatical structure of the data synchronization language, the method further comprises configuring one or more of: the address of a metadata service, the addresses of the underlying file systems of the one or more synchronization regions, the execution directory of a computing engine, the communication address of a synchronization engine, the port of the synchronization engine, the communication address of a synchronization executor, the port of the synchronization executor, the address of a synchronization web service, and the port of the synchronization web service.
  4. The data synchronization method according to claim 3, characterized in that the grammatical structure includes configuration items, the configuration items including one or more of: a scheduling period, a scheduling time zone, a computing engine, whether to use the local time zone of the region, a source data table partition, a target data table partition, a list of partitions to be synchronized, and a view name, wherein the source data table corresponds to the unit node and the target data table corresponds to the central node.
  5. The data synchronization method according to claim 4, characterized in that the synchronization plan includes an offline synchronization plan and/or a real-time synchronization plan, the real-time synchronization plan including the view name.
  6. The data synchronization method according to claim 4, characterized in that the step of parsing the data synchronization statement comprises:
    parsing the data synchronization statement according to the grammatical structure to obtain a parsing result;
    when the parsing succeeds, obtaining a data synchronization syntax tree, the data synchronization syntax tree including the information of the configuration items, the table name of the source data table, and the table name of the target data table;
    connecting to the metadata service according to the address of the metadata service, and determining, according to the table name of the target data table, whether the target data table exists in the metadata service;
    if the target data table exists, creating a mapping directory of the Alluxio data orchestration service corresponding to the source data table;
    mounting the source table directory under the addresses of the underlying file systems of the one or more synchronization regions to the mapping directory;
    creating a mapping-directory source table whose location is the mapping directory;
    determining whether to perform real-time synchronization: if not, generating an offline synchronization plan; if so, creating a view according to the view name and generating a real-time synchronization plan.
  7. The data synchronization method according to claim 6, characterized in that the step of parsing the data synchronization statement further comprises:
    generating, according to the information of the configuration items, the address of the Alluxio data orchestration service, and the address of the metadata service, a file metadata synchronization plan for the Alluxio data orchestration service and a partition metadata synchronization plan for the metadata service;
    reading, by the computing engine, the data fields in the mapping-directory source table and writing the data fields into the target data table;
    storing one or more of the offline synchronization plan, the real-time synchronization plan, the file metadata synchronization plan of the Alluxio data orchestration service, and the partition metadata synchronization plan of the metadata service in the database of the central node.
  8. The data synchronization method according to claim 1, characterized in that, after the step of parsing the data synchronization statement to obtain the synchronization plan, the method further comprises:
    periodically reading the synchronization plan and determining whether the synchronization plan meets running conditions, the running conditions including a planned execution time zone and a planned execution time;
    executing the synchronization plan if the running conditions are met.
  9. The data synchronization method according to claim 1, characterized in that, after the step of executing the synchronization plan to obtain the execution result, the method further comprises: determining whether the execution result indicates success; if the execution succeeded, writing the successful execution result to a log; if the execution failed, raising an alarm and writing the failed execution result to the log.
  10. The data synchronization method according to claim 1, characterized in that, after the step of executing the synchronization plan to obtain the execution result, the method further comprises displaying the synchronization plan and the execution result.
  11. The data synchronization method according to claim 10, characterized in that, in the step of displaying the synchronization plan and the execution result, the synchronization plan includes any synchronization plan stored in the database of the central node.
  12. A cross-region data synchronization system, characterized by comprising:
    a synchronization client module, configured to submit a data synchronization statement according to the grammatical structure of a data synchronization language;
    a synchronization engine module, configured to parse the data synchronization statement to obtain a synchronization plan, the synchronization plan being adapted to synchronize data of unit nodes in one or more synchronization regions to a central node;
    a synchronization executor module, configured to execute the synchronization plan to obtain an execution result;
    a memory, configured to store instructions executable by a processor;
    a processor, configured to execute the instructions to implement the data synchronization method according to any one of claims 1 to 11.
  13. The data synchronization system according to claim 12, characterized by further comprising a synchronization web service module, configured to display the synchronization plan and the execution result on a front end.
  14. A computer-readable medium storing computer program code, characterized in that the computer program code, when executed by a processor, implements the data synchronization method according to any one of claims 1 to 11.
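The parsing steps of claims 4 to 6 can be illustrated with a minimal Python sketch. The patent does not publish the grammar of its data synchronization language, so several parts of this sketch are hypothetical:

- the SYNC TABLE ... TO ... WITH (...) syntax and its option keys;
- the /sync/<region>/<db>/<table> mapping-directory layout;
- the Hive-style <db>.db/<table> source directory.

The metadata service and the Alluxio mount are replaced by in-memory stand-ins that only record the actions a real implementation would perform. Treating a configured view name as the trigger for real-time synchronization is an assumption drawn from claim 5.

```python
import re
from dataclasses import dataclass, field

# Hypothetical statement syntax (the patent does not publish its grammar):
#   SYNC TABLE <src_db.table> TO <dst_db.table> [WITH (key='value', ...)]
STMT_RE = re.compile(
    r"^\s*SYNC\s+TABLE\s+(?P<src>\w+\.\w+)\s+TO\s+(?P<dst>\w+\.\w+)"
    r"(?:\s+WITH\s*\((?P<opts>.*)\))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
OPT_RE = re.compile(r"(\w+)\s*=\s*'([^']*)'")


@dataclass
class SyncSyntaxTree:
    """Parse result: configuration items plus source/target table names."""
    source_table: str
    target_table: str
    options: dict = field(default_factory=dict)


@dataclass
class SyncPlan:
    kind: str                                   # "offline" or "realtime"
    steps: list = field(default_factory=list)   # ordered (action, *args) tuples


def parse_statement(stmt: str) -> SyncSyntaxTree:
    """Parse a statement; raise ValueError if it does not match the grammar."""
    m = STMT_RE.match(stmt)
    if m is None:
        raise ValueError(f"not a valid sync statement: {stmt!r}")
    options = dict(OPT_RE.findall(m.group("opts") or ""))
    return SyncSyntaxTree(m.group("src"), m.group("dst"), options)


def build_plan(tree: SyncSyntaxTree, metastore_tables: set,
               region_ufs: dict, mapping_root: str = "/sync") -> SyncPlan:
    """Mirror the claim-6 steps with in-memory stand-ins for the services.

    metastore_tables: table names known to the metadata service (stand-in).
    region_ufs: region name -> root of that region's underlying file system.
    """
    # Proceed only if the target table already exists in the metadata service.
    if tree.target_table not in metastore_tables:
        raise LookupError(f"target table {tree.target_table!r} does not exist")
    db, table = tree.source_table.split(".")
    steps = []
    for region, ufs_root in region_ufs.items():
        # One mapping directory per region for the source table.
        mapping_dir = f"{mapping_root}/{region}/{db}/{table}"
        steps.append(("create_dir", mapping_dir))
        # Mount the region's source-table directory onto the mapping directory.
        steps.append(("mount", f"{ufs_root}/{db}.db/{table}", mapping_dir))
        # Register a mapping-directory source table located at that directory.
        steps.append(("create_table", f"{db}.{table}_{region}", mapping_dir))
    # Assumption: a configured view name is what selects real-time sync.
    view_name = tree.options.get("view_name")
    if view_name:
        steps.append(("create_view", view_name))
        return SyncPlan("realtime", steps)
    return SyncPlan("offline", steps)
```

For example, suppose a statement with view_name='orders_rt' is parsed against a metadata service that already knows the target table. The result is a real-time plan whose final step creates the view orders_rt. Without a view name, the same statement produces an offline plan.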
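Claim 7 has the computing engine copy the data fields of the mapping-directory source table into the target table. It then stores the resulting plans in the central node's database. The sketch below rests on these assumptions:

- the engine is Hive- or Spark SQL-style and accepts INSERT OVERWRITE TABLE ... PARTITION (...) SELECT ...;
- the table, column, and partition names are hypothetical;
- the plan "kind" labels are illustrative;
- sqlite3 merely stands in for whatever database the central node actually uses.

The source and target partitions of claim 4 are kept separate here. This lets the target partition carry the region, so one region's overwrite cannot clobber another's.

```python
import json
import sqlite3


def _spec(parts: dict, sep: str) -> str:
    """Render {"dt": "2022-10-01"} as dt='2022-10-01', joined by sep."""
    return sep.join(f"{k}='{v}'" for k, v in parts.items())


def build_copy_sql(mapping_table: str, target_table: str, columns: list,
                   source_partition: dict, target_partition: dict) -> str:
    """Hive/Spark SQL-style statement copying one partition's data fields
    from a mapping-directory source table into the target table.

    Values are interpolated without escaping, so inputs must be trusted
    (for example, already validated by the statement parser).
    """
    return (
        f"INSERT OVERWRITE TABLE {target_table} "
        f"PARTITION ({_spec(target_partition, ', ')}) "
        f"SELECT {', '.join(columns)} FROM {mapping_table} "
        f"WHERE {_spec(source_partition, ' AND ')}"
    )


def store_plans(conn: sqlite3.Connection, plans: list) -> None:
    """Persist (kind, payload) plan records in the central node's database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_plan ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "kind TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO sync_plan (kind, payload) VALUES (?, ?)",
        [(kind, json.dumps(payload)) for kind, payload in plans],
    )
    conn.commit()
```

Storing each plan as a kind plus a JSON payload keeps offline, real-time, file-metadata, and partition-metadata plans in one table. A display component such as the web service of claims 10 and 13 could then list them uniformly.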
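Claims 8 and 9 describe a polling executor. Plans are read periodically and run only when their planned time zone and time are reached. Each result is logged, and a failure also raises an alarm. The following sketch is one possible reading and makes these assumptions:

- ScheduledPlan, the once-per-local-day rule, and the execute and alarm callbacks are hypothetical;
- a fixed UTC+08:00 offset is used so the example does not depend on an installed time-zone database.

```python
import logging
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import Callable, Optional

log = logging.getLogger("sync.executor")

# Example fixed-offset zone; zoneinfo.ZoneInfo("Asia/Shanghai") also works
# where the IANA time-zone database is installed.
UTC_PLUS_8 = timezone(timedelta(hours=8), "UTC+08:00")


@dataclass
class ScheduledPlan:
    name: str
    tz: tzinfo                        # planned execution time zone
    run_at: time                      # planned local execution time
    last_run: Optional[date] = None   # local date of the most recent run


def is_due(plan: ScheduledPlan, now_utc: datetime) -> bool:
    """Run condition: local time has reached run_at and no run happened today."""
    local_now = now_utc.astimezone(plan.tz)
    return local_now.time() >= plan.run_at and plan.last_run != local_now.date()


def run_due_plans(plans, execute: Callable[[ScheduledPlan], bool],
                  alarm: Callable[[ScheduledPlan], None], now_utc: datetime) -> None:
    """One polling pass: execute due plans, log every result, alarm on failure."""
    for plan in plans:
        if not is_due(plan, now_utc):
            continue
        try:
            ok = execute(plan)
        except Exception:
            log.exception("plan %s raised", plan.name)
            ok = False
        plan.last_run = now_utc.astimezone(plan.tz).date()
        if ok:
            log.info("plan %s succeeded", plan.name)
        else:
            alarm(plan)
            log.error("plan %s failed", plan.name)
```

A scheduler would call run_due_plans on a fixed interval, for example once a minute, passing datetime.now(timezone.utc). Each plan is compared in its own time zone, so plans for different regions fire at their local times. This corresponds to the per-region scheduling time zone listed in claim 4.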
PCT/CN2023/070424 2022-10-10 2023-01-04 Cross-region data synchronization method and system, and computer readable medium WO2024077802A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211235647 2022-10-10
CN202211235647.3 2022-10-10

Publications (1)

Publication Number Publication Date
WO2024077802A1 (en) 2024-04-18

Family

ID=85434953

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/070424 WO2024077802A1 (en) 2022-10-10 2023-01-04 Cross-region data synchronization method and system, and computer readable medium

Country Status (2)

Country Link
CN (1) CN115794941A (en)
WO (1) WO2024077802A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794941A (en) * 2022-10-10 2023-03-14 上海商米科技集团股份有限公司 Cross-region data synchronization method, system and computer readable medium
CN116881370A (en) * 2023-09-06 2023-10-13 北京久其金建科技有限公司 Data synchronization method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250514A (en) * 2016-08-04 2016-12-21 摩贝(上海)生物科技有限公司 Based on Mysql data base and the transnational method of data synchronization of SQL daily record
US20180107560A1 (en) * 2016-10-13 2018-04-19 Adobe Systems Incorporated Extensible file synchronization
CN109726250A (en) * 2018-12-27 2019-05-07 星环信息科技(上海)有限公司 Data-storage system, metadatabase synchronization and data cross-domain calculation method
CN109815254A (en) * 2018-12-28 2019-05-28 北京东方国信科技股份有限公司 Cross-region method for scheduling task and system based on big data
CN110851534A (en) * 2019-11-15 2020-02-28 上海达梦数据库有限公司 Data processing method, system and storage medium
CN113934745A (en) * 2020-06-29 2022-01-14 中兴通讯股份有限公司 Data synchronization processing method, electronic device and storage medium
CN115098587A (en) * 2022-06-13 2022-09-23 中国科学院空天信息创新研究院 Method, device, equipment and medium for synchronizing remote database
CN115129468A (en) * 2022-06-13 2022-09-30 中国核电工程有限公司 Database synchronization method, device and medium between PDMS/E3D servers
CN115794941A (en) * 2022-10-10 2023-03-14 上海商米科技集团股份有限公司 Cross-region data synchronization method, system and computer readable medium

Also Published As

Publication number Publication date
CN115794941A (en) 2023-03-14
