WO2024077802A1

WO2024077802A1 - Cross-region data synchronization method and system, and computer readable medium

Info

Publication number: WO2024077802A1
Application number: PCT/CN2023/070424
Authority: WO
Inventors: 孔祥强 (Kong Xiangqiang); 林喆 (Lin Zhe); 于汉岭 (Yu Hanling)
Original assignee: 上海商米科技集团股份有限公司 (Shanghai Sunmi Technology Group Co., Ltd.)
Priority date: 2022-10-10
Filing date: 2023-01-04
Publication date: 2024-04-18
Also published as: CN115794941A

Abstract

The present invention relates to a cross-region data synchronization method and system, and a computer readable medium. The cross-region data synchronization method comprises: submitting a data synchronization statement according to a syntax structure of a data synchronization language; parsing the data synchronization statement to obtain a synchronization plan, wherein the synchronization plan is suitable for synchronizing data of unit nodes of one or more synchronization regions to a central node; and executing the synchronization plan to obtain an execution result. According to the technical solution of the present invention, the content of data synchronization statements is standardized by a unified syntax structure of the data synchronization language, and a corresponding synchronization plan is obtained from each data synchronization statement, so no synchronization plan requires additional scheduling or development, and the synchronization plans are easy to maintain and cheap to develop. After the synchronization plan is executed, the data of the unit nodes of the synchronization regions is synchronized to the central node, so data synchronization can be implemented by deploying the cross-region data synchronization method once, at the central node.

Description

Cross-region data synchronization method, system and computer-readable medium

Technical Field

The present invention mainly relates to the field of computer information technology, and in particular to a cross-regional data synchronization method, system and computer-readable medium.

Background Art

Under the General Data Protection Regulation (GDPR), which strictly regulates how data is processed, stored, and managed, global businesses increasingly need to store data locally in each of several geographic regions. However, some data still needs to be aggregated globally for statistics. Traditional data transmission solutions can no longer meet the needs of multi-location deployment. Moreover, as business needs grow, data stored in different geographic regions needs to be synchronized and aggregated to a unified node for further processing and computation.

FIG. 1 is an exemplary architecture diagram of a cross-regional data synchronization solution in the prior art. Referring to FIG. 1, data storage locations across geographical regions are divided into unit nodes and a central node. Each node has a corresponding data integration service, which receives, processes, and stores data in the local area. Due to business or technical requirements, the data of each unit node must be synchronized to a unified central node. Existing data synchronization solutions fall mainly into offline synchronization and real-time synchronization: offline synchronization is mainly based on batch file transmission, and real-time synchronization is mainly based on interface services. The solution in FIG. 1 includes a real-time data transmission client, a real-time synchronization receiving service, and a file offline synchronization solution. The real-time data transmission client obtains real-time data from each unit node and transmits it to the central node through an interface service; the real-time synchronization receiving service receives and processes the data of each unit node and distributes it to the corresponding data channel of the central node; the file offline synchronization solution compares the files stored locally in each unit node through a file interface, pulls them, and writes them to the central node, recording the time and status of each file transmission so that a failed task can be restarted without transmitting files repeatedly.

As shown in FIG. 1, the prior-art cross-regional data synchronization solution requires maintaining at least the file offline synchronization solution, the real-time synchronization receiving service, and its real-time data transmission client, that is, at least two sets of components and the corresponding code; moreover, the offline transmission solution depends on other components and is complex. The real-time transmission solution has high R&D costs, makes data consistency difficult to ensure, and requires many deployment resources. In addition, unit nodes in different geographical regions may use different local storage systems, so different file interfaces must be integrated when developing the synchronization solution. As the number of unit nodes grows, deployment resources and maintenance costs grow accordingly. Therefore, there is an urgent need for a cross-regional data synchronization solution that is deployed uniformly, easy to maintain, and cheap to develop.

Summary of the invention

The technical problem to be solved by the present application is to provide a cross-regional data synchronization method, system and computer-readable medium, which can achieve unified deployment, easy maintenance and low development cost.

The technical solution adopted by the present application to solve the above-mentioned technical problems is a cross-regional data synchronization method, including: submitting data synchronization statements according to the grammatical structure of the data synchronization language; parsing the data synchronization statements to obtain a synchronization plan, which is suitable for synchronizing the data of unit nodes in one or more synchronization areas to a central node; executing the synchronization plan to obtain an execution result.

In one embodiment of the present application, before the step of submitting a data synchronization statement according to the grammatical structure of the data synchronization language, the method further includes: configuring the address and directory of the Alluxio data orchestration service, where the Alluxio data orchestration service is used to mount underlying file systems for access by upper-layer computing frameworks and applications.

In one embodiment of the present application, before the step of submitting a data synchronization statement according to the grammatical structure of the data synchronization language, the method further includes configuring one or more of: the address of the metadata service, the address of the underlying file system of each of the one or more synchronization areas, the execution directory of the computing engine, the communication address of the synchronization engine, the port of the synchronization engine, the communication address of the synchronization executor, the port of the synchronization executor, the address of the synchronization network service, and the port of the synchronization network service.

In one embodiment of the present application, the grammatical structure includes configuration items, which include: a scheduling period, a scheduling time zone, a computing engine, whether to select a local area time zone, a source data table partition, a target data table partition, a list of partitions to be synchronized, and a view name, wherein the source data table corresponds to a unit node and the target data table corresponds to a central node.

In an embodiment of the present application, the synchronization plan includes an offline synchronization plan and/or a real-time synchronization plan, and the real-time synchronization plan includes a view name.

In one embodiment of the present application, the step of parsing a data synchronization statement includes: parsing the data synchronization statement according to the grammatical structure to obtain a parsing result; if parsing succeeds, obtaining a data synchronization syntax tree that includes the configuration item information, the table name of the source data table, and the table name of the target data table; connecting to the metadata service at its configured address and determining, from the table name of the target data table, whether the target data table exists in the metadata service; if it exists, creating a mapping directory of the Alluxio data orchestration service corresponding to the source data table; mounting the source table directory under the underlying file system address of each of the one or more synchronization areas to the mapping directory; creating a mapping directory source table whose location is the mapping directory; and determining whether to perform real-time synchronization: if not, generating an offline synchronization plan; if so, creating a view with the configured view name and generating a real-time synchronization plan.

In one embodiment of the present application, the step of parsing the data synchronization statement further includes: generating, according to the configuration item information, the address of the Alluxio data orchestration service, and the address of the metadata service, a file metadata synchronization plan for the Alluxio data orchestration service and a partition metadata synchronization plan for the metadata service; the computing engine reading the data fields in the mapping directory source table and writing them into the target data table; and storing one or more of the offline synchronization plan, the real-time synchronization plan, the file metadata synchronization plan for the Alluxio data orchestration service, and the partition metadata synchronization plan for the metadata service in the database of the central node.

In one embodiment of the present application, after the step of parsing the data synchronization statement to obtain the synchronization plan, the method further includes: periodically reading the synchronization plan to determine whether it meets its operating conditions, which include the planned execution time zone and the planned execution time; and executing the synchronization plan if the operating conditions are met.
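The periodic operating-condition check can be sketched as follows. This is a minimal illustration, not the Xensql scheduler: it assumes a daily plan such as the `schedule.time = '0 00 07 ? * *'` used in the examples of the detailed description, read here as "run at 07:00 in each area's local time" when `schedule.local.region=true`. The table `AREA_OFFSETS` and the function `is_due` are hypothetical, and the fixed UTC offsets ignore daylight saving time.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional

# Hypothetical area code -> UTC offset table. Fixed offsets ignore daylight
# saving time; a production scheduler would look zones up with zoneinfo.
AREA_OFFSETS = {
    "ch": timezone(timedelta(hours=8)),   # China Standard Time
    "jp": timezone(timedelta(hours=9)),   # Japan Standard Time
    "us": timezone(timedelta(hours=-5)),  # US Eastern, standard time only
}


def is_due(hour: int, minute: int, area: str, now_utc: datetime,
           last_run_local_date: Optional[date] = None) -> bool:
    """Check the operating conditions of a daily plan in one area's time zone.

    The plan is due once the local clock has passed hour:minute, and it runs
    at most once per local calendar day.
    """
    local_now = now_utc.astimezone(AREA_OFFSETS[area])
    scheduled = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return local_now >= scheduled and last_run_local_date != local_now.date()
```

At 22:30 UTC on 9 October 2022 it is already 07:30 on 10 October in Japan but only 06:30 in China, so the same 07:00 plan is due for `jp` and not yet for `ch`; a scheduler loop would call such a check for every stored plan on each tick.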

In one embodiment of the present application, after executing the synchronization plan and obtaining the execution result, the method further includes: determining whether the execution result is successful; if it is, writing the successful execution result into a log; if it is a failure, issuing an alarm and writing the failed execution result into a log.
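The result handling just described can be sketched with Python's standard `logging` module. The names `ExecutionResult`, `record_result`, and the logger name are hypothetical, and since the source does not say how alarms are delivered, the sketch accepts any callable as the alarm channel.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("xensql.executor")  # hypothetical logger name


@dataclass
class ExecutionResult:
    plan_name: str
    success: bool
    message: str = ""


def record_result(result: ExecutionResult, alert: Callable[[str], None]) -> None:
    """Write every result to the log and raise an alarm for failures."""
    if result.success:
        logger.info("sync plan %s succeeded %s", result.plan_name, result.message)
    else:
        alert(f"sync plan {result.plan_name} failed: {result.message}")
        logger.error("sync plan %s failed: %s", result.plan_name, result.message)
```

Passing the alarm channel in as a callable keeps the sketch independent of any particular alerting system (mail, chat webhook, pager).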

In one embodiment of the present application, after the steps of executing the synchronization plan and obtaining the execution results, the synchronization plan and the execution results are also displayed.

In one embodiment of the present application, in the step of displaying the synchronization plan and the execution result, the synchronization plan includes any synchronization plan stored in the database of the central node.

In order to solve the above-mentioned technical problems, the present application also proposes a cross-regional data synchronization system, including: a synchronization client module, used to submit data synchronization statements according to the grammatical structure of the data synchronization language; a synchronization engine module, used to parse the data synchronization statements and obtain a synchronization plan, and the synchronization plan is suitable for synchronizing the data of the unit nodes of one or more synchronization areas to the central node; a synchronization executor module, used to execute the synchronization plan and obtain the execution result; a memory, used to store instructions executable by a processor; a processor, used to execute instructions to implement the above data synchronization method.

In one embodiment of the present application, the data synchronization system also includes a synchronization network service module for displaying the synchronization plan and execution results on the front end.

In order to solve the above technical problems, the present application also proposes a computer-readable medium storing computer program code, and the computer program code implements the above data synchronization method when executed by a processor.

The technical solution of the present application standardizes the content of data synchronization statements through the grammatical structure of a unified data synchronization language, and obtains corresponding synchronization plans based on the data synchronization statements. All synchronization plans do not require additional scheduling and development, and the synchronization plans are easy to maintain and have low development costs. After the synchronization plan is executed, the data of the unit nodes in each synchronization area are synchronized to the central node, and data synchronization can be achieved by uniformly deploying a cross-regional data synchronization method at the central node.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to make the above-mentioned objects, features and advantages of the present application more obvious and easy to understand, the specific implementation methods of the present application are described in detail below with reference to the accompanying drawings, wherein:

FIG. 1 is an exemplary architecture diagram of a cross-region data synchronization solution in the prior art;

FIG. 2 is an exemplary flow chart of a cross-region data synchronization method according to an embodiment of the present application;

FIG. 3 is an exemplary flow chart of a cross-region data synchronization method according to another embodiment of the present application;

FIG. 4 is an exemplary architecture diagram of a Xensql data synchronization service according to an embodiment of the present application;

FIG. 5 is an exemplary architecture diagram of a cross-regional data synchronization system according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an exemplary deployment of a cross-regional data synchronization system according to another embodiment of the present application;

FIG. 7 is a system block diagram of a cross-region data synchronization system according to an embodiment of the present application.

Preferred embodiments of the present invention

In order to make the above-mentioned objects, features and advantages of the present application more obvious and easy to understand, the specific implementation methods of the present application are described in detail below with reference to the accompanying drawings.

In the following description, many specific details are set forth to facilitate a full understanding of the present application, but the present application may also be implemented in other ways different from those described herein, and therefore the present application is not limited to the specific embodiments disclosed below.

As used in the present application and claims, unless the context clearly indicates otherwise, the words "a", "an", "one", and/or "the" do not specifically denote the singular and may also include the plural. Generally speaking, the terms "comprises" and "includes" only indicate the inclusion of explicitly identified steps and elements; these steps and elements do not constitute an exclusive list, and the method or device may also include other steps or elements.

Flowcharts are used in the present application to illustrate the operations performed by the system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Instead, steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more operations may be removed from them.

The present application proposes a cross-regional data synchronization method suitable for synchronizing the data of unit nodes in various geographical synchronization areas to a unified central node, for example, synchronizing data from the North American and European regions to the China region. With this method, users do not need to deal with time zone differences, file system differences, and similar issues in cross-regional data synchronization.

FIG. 2 is an exemplary flow chart of a cross-region data synchronization method according to an embodiment of the present application. Referring to FIG. 2, the cross-region data synchronization method according to the embodiment includes the following steps:

Step S110: Submit a data synchronization statement according to the grammatical structure of the data synchronization language.

Step S120: Parse the data synchronization statement to obtain a synchronization plan, where the synchronization plan is suitable for synchronizing the data of the unit nodes of one or more synchronization areas to the central node.

Step S130: Execute the synchronization plan and obtain the execution result.

The above steps S110 to S130 are described in detail below:

In step S110, a data synchronization statement is submitted according to the grammatical structure of the data synchronization language.

This application creates a new data synchronization language (Data Synchronization Language, DSL) and standardizes its grammatical structure. Data synchronization statements are used to configure the parameters required in the data synchronization process. This application correspondingly creates a data synchronization service that parses data synchronization statements and performs the operations of the data synchronization process. This data synchronization service is named Xensql, and Xensql is used to refer to it in the following text. The grammatical structure of the DSL and example statements are described in detail below.

In some embodiments, the grammatical structure includes configuration items, which include any one or more of: a scheduling period, a scheduling time zone, a computing engine, whether to schedule in the local area time zone, the source data table partition, the target data table partition, the list of partitions to be synchronized, and the view name, where the source data table corresponds to a unit node and the target data table corresponds to the central node.
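As an illustration, the configuration items can be held in a typed record. The DSL keys below are the ones that appear in Examples 1 and 2, and their meanings follow the syntax tree described for Example 1; the class name `SyncConfig`, its field names, and the choice of which items are mandatory (missing ones raise `KeyError`) are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SyncConfig:
    """Configuration items of a with(...) clause, keyed as in Examples 1 and 2."""
    schedule_time: str               # schedule.time: scheduling period (cron)
    default_region: str              # schedule.default.region: scheduling time zone
    engine: str                      # compute.engine.sql: computing engine
    local_region: bool               # schedule.local.region: use each area's time zone
    source_partitions: List[str]     # source.partition.field
    sink_partitions: List[str]       # sink.partition.field
    partition_list: List[str]        # sink.partition.list: areas to synchronize
    view_name: Optional[str] = None  # realsync.view.name (real-time sync only)

    @classmethod
    def from_items(cls, items: Dict[str, str]) -> "SyncConfig":
        def split(key: str) -> List[str]:
            # Multi-valued items such as "region,dt" are comma-separated.
            return [v.strip() for v in items[key].split(",") if v.strip()]

        return cls(
            schedule_time=items["schedule.time"].strip("'\""),
            default_region=items["schedule.default.region"],
            engine=items["compute.engine.sql"],
            local_region=items["schedule.local.region"].lower() == "true",
            source_partitions=split("source.partition.field"),
            sink_partitions=split("sink.partition.field"),
            partition_list=split("sink.partition.list"),
            view_name=items.get("realsync.view.name"),
        )
```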

Exemplarily, the configuration item information included in the syntax structure of the DSL of the present application is shown in Table 1 below:

Table 1 List of configuration items of the grammatical structure of the DSL of this application

The following example illustrates a data synchronization statement submitted according to the grammatical structure of the DSL of this application:

Example 1: Offline synchronization.

The source data table for offline synchronization is ods.a, the target data table is ods.b, and the task name is test_a_to_b.

Data synchronization statement for offline synchronization:

create sync test_a_to_b
table ods.a(to_id,table_msn)
to
table ods.b(cid,msn)
with(
  schedule.time = '0 00 07 ? * *',
  schedule.default.region=ch,
  compute.engine.sql=spark,
  schedule.local.region=true,
  source.partition.field=region,dt,
  sink.partition.field=region,dt,
  sink.partition.list = us,jp
)

Example 2: Real-time synchronization.

The source data table for real-time synchronization is ods.a, the target data table is ods.b, and the task name is test_real_a_to_b.

Data synchronization statement for real-time synchronization:

create realsync test_real_a_to_b
table ods.a(to_id,table_msn)
to
table ods.b(cid,msn)
with(
  schedule.time = '0 00 07 ? * *',
  schedule.default.region=ch,
  compute.engine.sql=spark,
  schedule.local.region=true,
  source.partition.field=region,dt,
  sink.partition.field=region,dt,
  sink.partition.list = us,jp,
  realsync.view.name=ods.b_view
)
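A minimal regular-expression parser for statements shaped like Examples 1 and 2 is sketched below. It is a hypothetical reconstruction from these two examples only, not the Xensql grammar, and it returns a plain dictionary standing in for the data synchronization syntax tree. The one subtle point is the `with(...)` body: several values (such as `region,dt`) contain commas, so the body is split only at commas that are followed by a new `key=`.

```python
import re
from typing import Dict, List

# Hypothetical grammar reconstructed from Examples 1 and 2.
STATEMENT = re.compile(
    r"create\s+(?P<kind>sync|realsync)\s+(?P<name>\w+)\s+"
    r"table\s+(?P<src>[\w.]+)\s*\((?P<src_cols>[^)]*)\)\s+to\s+"
    r"table\s+(?P<dst>[\w.]+)\s*\((?P<dst_cols>[^)]*)\)\s+"
    r"with\s*\((?P<with>.*)\)\s*$",
    re.IGNORECASE | re.DOTALL,
)
# Split only at commas that start a new "key=" item, keeping "region,dt" whole.
ITEM_SPLIT = re.compile(r",\s*(?=[\w.]+\s*=)")


def split_fields(text: str) -> List[str]:
    return [c.strip() for c in text.split(",") if c.strip()]


def parse_sync_statement(text: str) -> Dict[str, object]:
    """Parse a data synchronization statement into a simple syntax-tree dict.

    Raises ValueError when the statement does not match, which plays the role
    of the parse failure reported back to the client.
    """
    m = STATEMENT.match(text.strip())
    if m is None:
        raise ValueError("statement does not match the data synchronization grammar")
    options: Dict[str, str] = {}
    for item in ITEM_SPLIT.split(m.group("with").strip()):
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip().rstrip(",").strip().strip("'")
    return {
        "realtime": m.group("kind").lower() == "realsync",
        "task": m.group("name"),
        "source": m.group("src"),
        "source_fields": split_fields(m.group("src_cols")),
        "target": m.group("dst"),
        "target_fields": split_fields(m.group("dst_cols")),
        "options": options,
    }
```

For Example 2 this yields `realtime=True`, task `test_real_a_to_b`, and an `options` entry `realsync.view.name: ods.b_view`. A production parser would more likely use a grammar tool so that error messages can point at the offending token.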

FIG. 3 is an exemplary flow chart of a cross-regional data synchronization method according to another embodiment of the present application. Referring to FIG. 3, in this embodiment the method is mainly executed by the Xensql data synchronization service, which includes four modules: the Xensql synchronization client module 310, a client for submitting DSL statements; the Xensql synchronization engine module 320, which parses DSL syntax and generates and schedules synchronization plans; the Xensql synchronization executor module 330, which executes the synchronization plans issued by the Xensql synchronization engine module 320; and the Xensql synchronization network service module 340, which reads the synchronization plans and execution results for front-end query and display. The steps of the cross-regional data synchronization method are described in detail below with reference to FIG. 3 and the data synchronization statement Examples 1 (offline synchronization) and 2 (real-time synchronization) above.

In some embodiments, as shown in FIG. 3, before the data synchronization statement is submitted according to the grammatical structure of the data synchronization language, the method further includes: configuring the address and directory of the Alluxio data orchestration service, where the Alluxio data orchestration service is used to mount underlying file systems for access by upper-layer computing frameworks and applications. This step corresponds to step S3103 in FIG. 3.

For example, the Alluxio data orchestration service unifies data access through a unified namespace and provides a unified client application programming interface (API) between the underlying storage systems and the upper-layer computing frameworks. Alluxio provides a multi-tier caching solution that accelerates data access and avoids repeatedly reading the same data. Through the unified namespace of Alluxio, multiple storage systems are accessed in the same way, which reduces the cost of integrating data access, and multiple file access interfaces are provided to support various upper-layer applications and computing frameworks, such as Hive and Spark. For example, a schema mapping table based on the Alluxio file system can be created through the Hive metastore, supporting data access and computation on Alluxio without changing the application logic of the existing data warehouse. The Alluxio data orchestration service mounts the storage systems of different geographical regions locally, uses the region field as the partition identifier of each region, and sets up a memory cache to accelerate file reads and avoid reading the same file multiple times.

This application uses Alluxio as the data orchestration service to provide a unified access method for storage systems in different geographical areas, and implements both offline and real-time synchronization on top of it, reducing later development and maintenance costs. When a new unit node is added, only its corresponding directory needs to be mounted to the Alluxio directory of the central node, with no additional deployment or operation and maintenance costs. Alluxio's unified audit logs are used to monitor the overall data synchronization solution, avoiding separate monitoring solutions and maintenance in multiple regions.

In some embodiments, before the data synchronization statement is submitted according to the grammatical structure of the data synchronization language, the method further includes configuring: the address of the metadata service, the address of the underlying file system of each of the one or more synchronization areas, the execution directory of the computing engine, the communication address and port of the synchronization engine, the communication address and port of the synchronization executor, and the address and port of the synchronization network service. These steps correspond in part to steps S3101 and S3102 of FIG. 3.

Exemplarily, the metadata service (Under Database, UDB) can provide the data query service of the central node. Referring to FIG. 3, before the Xensql data synchronization service executes the cross-region data synchronization method, the Xensql synchronization client module 310, the Xensql synchronization engine module 320, the Xensql synchronization executor module 330, and the Xensql synchronization network service module 340 need to be deployed, and the following need to be configured: the address of the UDB, the address of the underlying file system (Underlying File System, UFS) of each synchronization area, the execution directory of the computing engine, the remote procedure call (Remote Procedure Call, RPC) communication address and port of the Xensql synchronization engine module 320, the RPC communication address and port of the Xensql synchronization executor module 330, and the service address and port of the Xensql synchronization network service module 340.

After the Xensql synchronization client module 310 is started, the client 311 reads various configuration information, including the address of the UDB used, the UFS address of each synchronization area, the execution directory of the computing engine, the RPC communication address of the Xensql synchronization engine module 320, etc.
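The configuration that the client reads at start-up could, for example, live in an INI file parsed with Python's `configparser`. The section names, key names, host names, and ports below are hypothetical placeholders, not Xensql's actual configuration format.

```python
import configparser
from dataclasses import dataclass
from typing import Dict

# Hypothetical configuration file; section names, keys and hosts are placeholders.
SAMPLE_CONFIG = """
[metadata]
udb.address = thrift://udb.central.example.com:9083

[ufs]
us = hdfs://us-nn.example.com:8020/warehouse
jp = hdfs://jp-nn.example.com:8020/warehouse

[engine]
exec.dir = /opt/xensql/engine

[sync.engine]
host = xensql-engine.example.com
port = 9090
"""


@dataclass
class ClientSettings:
    udb_address: str
    ufs_by_area: Dict[str, str]   # synchronization area -> UFS address
    engine_exec_dir: str
    sync_engine_rpc: str          # host:port of the synchronization engine


def load_settings(text: str) -> ClientSettings:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    engine = parser["sync.engine"]
    return ClientSettings(
        udb_address=parser["metadata"]["udb.address"],
        ufs_by_area=dict(parser["ufs"]),
        engine_exec_dir=parser["engine"]["exec.dir"],
        sync_engine_rpc=f"{engine['host']}:{engine.getint('port')}",
    )
```

Keeping one `[ufs]` entry per area means that adding a unit node is a one-line configuration change, in line with the mount-only onboarding described above.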

In step S120, the data synchronization statement is parsed to obtain a synchronization plan, where the synchronization plan is suitable for synchronizing the data of the unit nodes of one or more synchronization areas to the central node.

In some embodiments, the synchronization plan includes an offline synchronization plan and/or a real-time synchronization plan, and the real-time synchronization plan includes a view name.

Exemplarily, referring to FIG. 3 and the data synchronization statement Examples 1 (offline synchronization) and 2 (real-time synchronization) above, a data synchronization statement is submitted through the client 311 of the Xensql synchronization client module 310, and the Xensql synchronization engine module 320 parses it to obtain the corresponding offline synchronization plan or real-time synchronization plan. The execution process of the cross-regional data synchronization method of the present application is described below on the basis of these two examples. For the meaning of the attributes in the "with" clause of the data synchronization statements, refer to Table 1 above (the list of configuration items of the grammatical structure of the DSL of the present application).

In some embodiments, the step of parsing the data synchronization statement includes:

Step S1201: Parse the data synchronization statement according to the grammatical structure to obtain the parsing result;

Step S1202: If the parsing succeeds, obtain a data synchronization syntax tree, where the data synchronization syntax tree includes the configuration item information, the table name of the source data table, and the table name of the target data table;

Step S1203: Connect to the metadata service at the address of the metadata service, and determine, according to the table name of the target data table, whether the target data table exists in the metadata service;

Step S1204: If the target data table exists, create a mapping directory of the Alluxio data orchestration service corresponding to the source data table;

Step S1205: Mount the source table directory in the address of the underlying file system of one or more synchronization areas to the mapping directory;

Step S1206: Create a mapping directory source table, the location of the mapping directory source table is the mapping directory;

Step S1207: Determine whether to perform real-time synchronization. If not, generate an offline synchronization plan. If real-time synchronization is performed, create a view according to the view name, and generate a real-time synchronization plan.
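Steps S1203 to S1207 can be traced end to end with in-memory stand-ins for the metadata service and Alluxio. The sketch only records what each step would do; `FakeMetadataService`, `FakeAlluxio`, `build_plan`, and the dictionary shape of the syntax tree are hypothetical, and the real service talks to the UDB and Alluxio over the network.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeMetadataService:
    """Stand-in for the UDB: only table and view names are tracked."""
    tables: Set[str] = field(default_factory=set)
    views: Set[str] = field(default_factory=set)


@dataclass
class FakeAlluxio:
    """Stand-in for Alluxio: mapping directory -> mounted UFS directories."""
    mounts: Dict[str, List[str]] = field(default_factory=dict)


def build_plan(tree: dict, udb: FakeMetadataService, alluxio: FakeAlluxio,
               ufs_by_area: Dict[str, str],
               alluxio_root: str = "/alluxio/warehouse") -> dict:
    # S1203: the target table must already exist in the metadata service.
    if tree["target"] not in udb.tables:
        raise LookupError(f"target table {tree['target']} does not exist")
    db, table = tree["source"].split(".", 1)
    # S1204: mapping directory = <root>/<db>.db/<table>_sync
    mapping_dir = f"{alluxio_root}/{db}.db/{table}_sync"
    # S1205: mount the source table directory of every synchronization area.
    for area in tree["areas"]:
        ufs_dir = f"{ufs_by_area[area]}/{db}.db/{table}"
        alluxio.mounts.setdefault(mapping_dir, []).append(ufs_dir)
    # S1206: register the mapping directory source table in the UDB.
    mapping_table = f"{db}.{table}_sync"
    udb.tables.add(mapping_table)
    # S1207: a view name means real-time sync; otherwise offline sync.
    view = tree.get("view_name")
    if view:
        udb.views.add(view)
    return {
        "kind": "realtime" if view else "offline",
        "task": tree["task"],
        "mapping_dir": mapping_dir,
        "mapping_table": mapping_table,
        "view": view,
    }
```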

Exemplarily, the above steps S1201 to S1207 are explained with reference to FIG. 3 and the data synchronization statement example 1 (offline synchronization) described above.

In step S1201 above, after the Xensql synchronization engine module 320 receives, over RPC, the data synchronization statement submitted by the client 311 of the Xensql synchronization client module 310, it parses the statement according to the grammatical structure of the DSL in step S3201 to obtain the parsing result. If parsing fails, an exception is thrown in step S3203 and the reason for the failure is returned to the client 311. If parsing succeeds, in step S1202 above, the Xensql synchronization engine module 320 obtains the data synchronization syntax tree, which includes: the table name of the source data table: ods.a; the partition fields of the source data table: region, dt; the data fields of the source data table: to_id, table_msn; the table name of the target data table: ods.b; the partition fields of the target data table: region, dt; the data fields of the target data table: cid, msn; the synchronization task name: test_a_to_b; the scheduling period: 0 00 07 ? * * (7:00 every day); the scheduling time zone: ch (China); the computing engine: spark; whether to schedule in the local area time zone: true; and the list of partitions to be synchronized (synchronization areas): us, jp (United States, Japan).
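Two consistency checks on a syntax tree like this one can be sketched as follows. They rest on assumptions not stated in the source: that data fields are mapped by position (as to_id, table_msn to cid, msn suggests), and that `schedule.time` uses the six-field Quartz cron layout (seconds, minutes, hours, day-of-month, month, day-of-week), which is consistent with reading `0 00 07 ? * *` as 7:00 every day. Both function names are hypothetical.

```python
from typing import List, Tuple


def field_pairs(source_fields: List[str], target_fields: List[str]) -> List[Tuple[str, str]]:
    """Pair source and target data fields by position (to_id -> cid, ...)."""
    if len(source_fields) != len(target_fields):
        raise ValueError(
            f"{len(source_fields)} source fields but {len(target_fields)} target fields")
    return list(zip(source_fields, target_fields))


def daily_run_time(cron: str) -> Tuple[int, int]:
    """Return (hour, minute) for a Quartz-style daily cron such as '0 00 07 ? * *'.

    Only the 'every day' shape (day-of-month '?', month '*', day-of-week '*')
    is handled; anything else is rejected.
    """
    parts = cron.split()
    if len(parts) != 6 or parts[3:] != ["?", "*", "*"]:
        raise ValueError(f"not a simple daily schedule: {cron!r}")
    return int(parts[2]), int(parts[1])
```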

For step S1203 above, in step S3202 shown in FIG. 3, a connection to the metadata service UDB is established using the address of the metadata service. In step S3204, whether the target data table ods.b exists in the metadata service UDB is determined from its table name ods.b; if it does not exist, an exception is thrown to the client 311 in step S3203. For step S1204 above, if the target data table ods.b exists, a mapping directory of the Alluxio data orchestration service corresponding to the source data table ods.a is created in step S3205: /alluxio/warehouse/ods.db/a_sync. Here /alluxio/warehouse is a freely configurable root of the Alluxio data orchestration service, ods.db is the database directory of ods, and a_sync is the source data table name a plus the suffix _sync, a directory naming convention defined by the Xensql synchronization engine module 320.
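The naming convention just described is simple enough to state as a function. The root `/alluxio/warehouse` is the example value above and is configurable; `mapping_names` is a hypothetical helper, and it also returns the mapping table name ods.a_sync used in step S1206.

```python
import posixpath
from typing import Tuple


def mapping_names(alluxio_root: str, source_table: str,
                  suffix: str = "_sync") -> Tuple[str, str]:
    """Return (mapping directory, mapping table name) for a 'db.table' source."""
    db, table = source_table.split(".", 1)
    mapping_dir = posixpath.join(alluxio_root, f"{db}.db", f"{table}{suffix}")
    return mapping_dir, f"{db}.{table}{suffix}"
```

`posixpath.join` is used instead of `os.path.join` so that the result is always a forward-slash Alluxio path, even when the engine runs on Windows.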

For step S1205 above, in step S3206 shown in FIG. 3, the source table directory under the underlying file system address of each of the one or more synchronization areas is mounted to the mapping directory. Specifically, the UFS addresses of the parsed synchronization areas us and jp are looked up among the configured UFS addresses, and the /ods.db/a directory under each of these UFS addresses is mounted to the Alluxio mapping directory /alluxio/warehouse/ods.db/a_sync.
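A sketch of the mount step as Alluxio shell commands follows. It assumes the Alluxio 2.x `alluxio fs mount [--readonly] <alluxioPath> <ufsURI>` form, read-only because the central node only reads unit-node data. It also assumes, beyond what the source states, that each area is mounted under its own `region=<area>` subdirectory of the mapping directory, so that the region partition field identifies the area and no Alluxio path serves as a mount point twice; each unit node's table directory is assumed to hold only that area's `dt=...` partitions. `mount_commands` is a hypothetical helper.

```python
import shlex
from typing import Dict, List


def mount_commands(mapping_dir: str, source_table: str,
                   ufs_by_area: Dict[str, str], areas: List[str]) -> List[str]:
    """Build one `alluxio fs mount` command per synchronization area."""
    db, table = source_table.split(".", 1)
    commands = []
    for area in areas:
        if area not in ufs_by_area:
            raise KeyError(f"no UFS address configured for area {area!r}")
        # Source table directory inside this area's underlying file system.
        ufs_dir = f"{ufs_by_area[area].rstrip('/')}/{db}.db/{table}"
        # Assumed per-area subdirectory under the shared mapping directory.
        alluxio_dir = f"{mapping_dir.rstrip('/')}/region={area}"
        commands.append(" ".join([
            "alluxio", "fs", "mount", "--readonly",
            shlex.quote(alluxio_dir), shlex.quote(ufs_dir),
        ]))
    return commands
```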

In step S1206 above (step S3207 in FIG. 3), the Alluxio mapping directory source table is created. Specifically, the UDB is connected according to the configured UDB address, and the mapping directory source table ods.a_sync is created with its location set to the Alluxio mapping directory /alluxio/warehouse/ods.db/a_sync.
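The mapping directory source table could be registered with Hive-compatible DDL along these lines. The column types, the Parquet storage format and the Alluxio master address `alluxio-master:19998` are assumptions for this sketch (19998 is Alluxio's default master RPC port); the table name, partition fields and mapping directory come from the example above.

```python
from typing import Dict


def mapping_table_ddl(table: str, columns: Dict[str, str], partitions: Dict[str, str],
                      location: str, file_format: str = "PARQUET") -> str:
    """Render a Hive-style CREATE EXTERNAL TABLE statement for the mapping directory source table."""
    cols = ", ".join(f"{name} {col_type}" for name, col_type in columns.items())
    parts = ", ".join(f"{name} {col_type}" for name, col_type in partitions.items())
    return (f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
            f"PARTITIONED BY ({parts}) "
            f"STORED AS {file_format} "
            f"LOCATION '{location}'")


ddl = mapping_table_ddl(
    "ods.a_sync",
    columns={"to_id": "STRING", "table_msn": "STRING"},  # column types assumed
    partitions={"region": "STRING", "dt": "STRING"},
    location="alluxio://alluxio-master:19998/alluxio/warehouse/ods.db/a_sync",  # host assumed
)
```

Declaring the table EXTERNAL means that dropping the mapping table would leave the regional source files untouched, which seems the safer choice when the original data stays in each area.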

In step S1207 above (step S3208 in FIG. 3), it is determined whether to perform real-time synchronization. If not, an offline synchronization plan is generated in step S3210. If so, a view is created according to the view name in step S3209, and a real-time synchronization plan is generated in step S3210. Specifically, the UDB is connected and the view is created using the realsync.view.name attribute in the DSL data synchronization statement.
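The branch in step S3208 might look like the sketch below. It assumes, as a simplification, that real-time synchronization is requested exactly when the statement carries a `realsync.view.name` attribute, and that the view simply selects every column of the mapping directory source table; neither rule is spelled out in the text.

```python
from typing import Dict, Optional, Tuple


def choose_plan(statement_config: Dict[str, str], mapping_table: str) -> Tuple[str, Optional[str]]:
    """Step S3208: pick the plan type and, for real-time sync, the view DDL (step S3209)."""
    view_name = statement_config.get("realsync.view.name")
    if not view_name:
        return "offline", None
    # Assumption: the view exposes the mapping directory source table as-is
    view_sql = f"CREATE VIEW IF NOT EXISTS {view_name} AS SELECT * FROM {mapping_table}"
    return "realtime", view_sql
```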

In some embodiments, the step of parsing the data synchronization statement further includes:

Step S1208: Generate a file metadata synchronization plan for the Alluxio data orchestration service and a partition metadata synchronization plan for the metadata service based on the configuration item information, the address of the Alluxio data orchestration service, and the address of the metadata service.

Step S1209: The computing engine reads the data fields in the mapping directory source table and writes them into the target data table;

Step S1210: Store one or more of the offline synchronization plan, the real-time synchronization plan, the file metadata synchronization plan of the Alluxio data orchestration service, and the partition metadata synchronization plan of the metadata service into the database of the central node.

Exemplarily, the above steps S1208 to S1210 are explained with reference to FIG. 3 and the data synchronization statement example 1 (offline synchronization) described above.

In step S1208 above, the file metadata synchronization plan of the Alluxio data orchestration service and the partition metadata synchronization plan of the metadata service UDB are generated based on the parsed scheduling period, scheduling time zone, partition fields, synchronization areas, computing engine and synchronization data field configuration, together with the configured metadata service UDB address and Alluxio data orchestration service address. In step S1209 above, according to the parsed computing engine configuration (spark), Spark SQL is generated that reads the data fields to_id and table_msn from the Alluxio mapping directory source table ods.a_sync and writes them into the data fields cid and msn of the target data table ods.b.
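For example 1, the Spark SQL produced in step S1209 could plausibly take the shape rendered below. The helper name and the exact statement are assumptions: the sketch uses a dynamic-partition `INSERT OVERWRITE` keyed on region and dt and restricts the read to the synchronization areas us and jp.

```python
from typing import List


def spark_sync_sql(source: str, target: str, source_fields: List[str], target_fields: List[str],
                   partitions: List[str], areas: List[str], area_field: str = "region") -> str:
    """Render Spark SQL that copies mapped fields from the mapping table into the target table."""
    if len(source_fields) != len(target_fields):
        raise ValueError("source and target fields must pair up one-to-one")
    select_list = [f"{src} AS {dst}" for src, dst in zip(source_fields, target_fields)]
    # Dynamic partition columns go last, in the same order as PARTITION (...)
    select_list += partitions
    area_list = ", ".join(f"'{area}'" for area in areas)
    return (f"INSERT OVERWRITE TABLE {target} PARTITION ({', '.join(partitions)}) "
            f"SELECT {', '.join(select_list)} "
            f"FROM {source} "
            f"WHERE {area_field} IN ({area_list})")


sql = spark_sync_sql("ods.a_sync", "ods.b", ["to_id", "table_msn"], ["cid", "msn"],
                     ["region", "dt"], ["us", "jp"])
```

The string could then be run with `spark.sql(sql)` in a Hive-enabled `SparkSession`. A fully dynamic partition spec on a Hive-format table typically requires `hive.exec.dynamic.partition.mode=nonstrict`, and a scheduled daily plan would probably also filter on `dt` so that each run rewrites only the partitions for its own day.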

In step S1210 above, one or more of the offline synchronization plan, the real-time synchronization plan, the file metadata synchronization plan of the Alluxio data orchestration service, and the partition metadata synchronization plan of the metadata service are stored in the database 321 of the central node, where the database 321 of the central node is a MySQL database. In step S3211, a message indicating that the DSL execution is successful is returned to the client 311.
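To make the storage step concrete, the sketch below keeps plans in a single table. It uses Python's built-in `sqlite3` as a stand-in for the MySQL database 321, and the table name `sync_plan`, its columns, the plan-type labels and the `status` column (which would let the network service list plans that were taken offline) are all hypothetical.

```python
import json
import sqlite3


def open_plan_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a plan store; sqlite3 stands in here for the central node's MySQL database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sync_plan (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               task_name TEXT NOT NULL,
               plan_type TEXT NOT NULL,  -- offline, realtime, file_meta or partition_meta
               plan_json TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'online'
           )"""
    )
    return conn


def save_plan(conn: sqlite3.Connection, task_name: str, plan_type: str, plan: dict) -> int:
    """Insert one synchronization plan and return its row id."""
    cur = conn.execute(
        "INSERT INTO sync_plan (task_name, plan_type, plan_json) VALUES (?, ?, ?)",
        (task_name, plan_type, json.dumps(plan, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def load_plans(conn: sqlite3.Connection, status: str = "online") -> list:
    """Return (task_name, plan_type, plan) tuples with the given status, oldest first."""
    rows = conn.execute(
        "SELECT task_name, plan_type, plan_json FROM sync_plan WHERE status = ? ORDER BY id",
        (status,),
    ).fetchall()
    return [(name, kind, json.loads(body)) for name, kind, body in rows]
```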

In some embodiments, after the step of parsing the data synchronization statement and obtaining the synchronization plan, the method further includes:

Step S1211, periodically reading the synchronization plan to determine whether the synchronization plan meets the operating conditions, the operating conditions including the planned execution time zone and the planned execution time;

Step S1212: If the operating conditions are met, execute the synchronization plan.

Exemplarily, as shown in FIG. 3, the Xensql synchronization engine module 320 also provides a synchronization plan scheduling function. In step S1211 above (step S3212 in FIG. 3), the scheduling engine 322 of the Xensql synchronization engine module 320 reads the synchronization plans written to the database 321 of the central node every 30 seconds. In step S1212 above (step S3213 in FIG. 3), it is determined whether a synchronization plan meets the operating conditions, which include the planned execution time zone and the planned execution time. If the operating conditions are not met, the task is terminated in step S3215; if they are met, the synchronization plan is sent in step S3214 to the Xensql synchronization executor module 330 through RPC communication.
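The operating-condition check in step S3213 can be illustrated with a small, hypothetical scheduler tick. The `ScheduledPlan` fields, the `last_run_date` guard (which stops a plan from firing again on every 30-second poll once its time has passed) and the use of fixed UTC offsets instead of IANA time zones are assumptions of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from typing import List

POLL_INTERVAL_SECONDS = 30  # the scheduling engine re-reads plans at this interval


@dataclass
class ScheduledPlan:
    task_name: str
    run_at: time              # planned execution time, in the plan's own time zone
    tz: timezone              # planned execution time zone
    last_run_date: str = ""   # hypothetical: ISO date of the last run, in that time zone


def is_due(plan: ScheduledPlan, now_utc: datetime) -> bool:
    """Operating-condition check: has the planned local time been reached today?"""
    local_now = now_utc.astimezone(plan.tz)
    already_ran_today = plan.last_run_date == local_now.date().isoformat()
    return local_now.time() >= plan.run_at and not already_ran_today


def due_plans(plans: List[ScheduledPlan], now_utc: datetime) -> List[ScheduledPlan]:
    """One polling tick: the plans that should be sent to the executor now."""
    return [plan for plan in plans if is_due(plan, now_utc)]
```

In a running service, the scheduling engine would call `due_plans(plans, datetime.now(timezone.utc))` every `POLL_INTERVAL_SECONDS` and send each returned plan to the executor over RPC.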

In step S130, the synchronization plan is executed to obtain an execution result.

In some embodiments, after executing the synchronization plan and obtaining the execution result, the method further includes determining whether the execution result is successful. If it is successful, the successful execution result is written to the log; if it failed, an alarm is issued and the failed execution result is written to the log.

Exemplarily, referring to FIG. 3, the Xensql synchronization executor module 330 executes the synchronization plan and obtains the execution result. Specifically, the synchronization plan is executed in step S3301, and whether the execution result is successful is determined in step S3302. If execution fails, an alarm is issued in step S3304 and the failed execution result is written to the log; if execution succeeds, the successful execution result is written to the log in step S3303.
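The success and failure branches of steps S3302 to S3304 map naturally onto Python's `logging` module, as in the sketch below. The logger name and the injected `alarm` callback are hypothetical; a real deployment would route alarms to whatever notification channel it uses.

```python
import logging
from typing import Callable

logger = logging.getLogger("xensql.executor")  # hypothetical logger name


def run_plan(plan_id: str, execute: Callable[[], None], alarm: Callable[[str], None]) -> bool:
    """Execute one plan (S3301), check the result (S3302), then log or alarm."""
    try:
        execute()
    except Exception as exc:  # any exception counts as an unsuccessful execution
        message = f"sync plan {plan_id} failed: {exc}"
        alarm(message)            # S3304: issue an alarm
        logger.error(message)     # ...and write the failed result to the log
        return False
    logger.info("sync plan %s succeeded", plan_id)  # S3303: log the successful result
    return True
```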

In some embodiments, after the step of executing the synchronization plan and obtaining the execution result, the method further includes displaying the synchronization plan and the execution result.

In some embodiments, in the step of displaying the synchronization plan and the execution result, the synchronization plan includes any synchronization plan stored in the database of the central node.

Exemplarily, referring to FIG. 3, in the Xensql synchronization network service module 340, the user logs in in step S3401; in step S3402, synchronization plans are queried and the corresponding plans are obtained from the database 321 of the central node; in step S3403, the query result is obtained: the synchronization plans in the database 321 of the central node and the logs of their execution results are retrieved and displayed on the front end. Through the Xensql synchronization network service module 340, the user can view any synchronization plan stored in the database 321 of the central node, including plans that have been taken offline, plans that failed to execute, plans that executed successfully, and plans that are currently executing.

In a global data synchronization scenario, the cross-regional data synchronization method and the Xensql data synchronization service of this application free users from dealing with time zone differences, file system differences and similar issues when synchronizing data across geographical regions. Users only need to write data synchronization statements that conform to the grammatical structure of the data synchronization language DSL, and the Xensql data synchronization service generates the corresponding cross-regional offline and real-time synchronization plans from those statements. This application uses the Alluxio data orchestration service to provide unified access to storage systems in different geographical regions. Combined with the metadata service UDB, the data of each geographical region is aggregated, computed and stored in the central node, enabling data query services at the central node while the original data remains stored locally. This saves time and cost on each query, accumulates data assets, and reduces later development and maintenance costs.

FIG. 4 is an exemplary architecture diagram of a Xensql data synchronization service according to an embodiment of the present application. Referring to FIG. 4, the core of the Xensql data synchronization service consists of four parts: a Xensql synchronization client 410, a Xensql synchronization engine 430, a Xensql synchronization executor 450, and a Xensql synchronization network service 460.

As shown in FIG. 4, the user submits a data synchronization statement 420 in the Xensql synchronization client 410. After receiving the data synchronization statement 420, the Xensql synchronization engine 430 checks its syntax using the relevant configuration information of the metadata service 440. In step S431, it is determined whether the syntax check passes. If it does not, an exception is thrown in step S433 and the relevant information is written to the log in step S434; if it does, the data synchronization statement is parsed in step S432 and written to configuration storage. Specifically, parsing yields the offline or real-time synchronization plan, the areas and metadata the user needs to synchronize, the data tables to be synchronized and other information, and the corresponding synchronization plan is written to the configuration database (MySQL) for storage. In step S435, the Xensql synchronization executor 450 executes the corresponding synchronization plan, and in step S436 it writes the execution result to the log. The Xensql synchronization network service 460 reads the relevant data from the configuration database (MySQL) and the log for front-end display.

FIG. 5 is an exemplary architecture diagram of a cross-regional data synchronization system according to an embodiment of the present application. Referring to FIG. 5, the architecture of the cross-regional data synchronization system 500 includes a metadata service 510, an underlying file system 520, an Alluxio data orchestration service 530, a computing engine 540, and a Xensql data synchronization service 550. Based on this architecture, the present application has developed the data synchronization service Xensql, which integrates operations of the Alluxio Software Development Kit (SDK). By integrating Xensql with Alluxio, the present application achieves real-time and offline synchronization of data across geographical regions using the DSL.

FIG. 6 is an exemplary deployment diagram of a cross-regional data synchronization system according to another embodiment of the present application. Referring to FIG. 6, the data of the unit nodes North America 630 and Europe 620 needs to be synchronized to the central node China 610. Exemplarily, an Alluxio data orchestration service cluster is deployed and started in the central node China 610; the data directories to be synchronized in China 610, Europe 620 and North America 630 are mounted to the Alluxio data orchestration service 640; the metadata service 650 creates synchronization plans and adds partitions according to the relevant configuration information; the client 660 reads the corresponding synchronization plan and Alluxio data and writes the relevant information into the mapping directory source table of the synchronization plan in China 610; and the scheduling service 670 periodically reads the synchronization plans to schedule their execution.

This application integrates Alluxio and Xensql into a cross-geographic synchronization solution that supports synchronization plans for data in different time zones. Users only need to mount the data of the corresponding area into the directory, without having to handle the time zone differences between regions. For example, if the Xensql data synchronization service is set to the Beijing time zone and the China and United States regions are mounted in the relevant table of the central node, Xensql generates two synchronization plans accordingly: 1. synchronize the data of the China region at 01:00 Beijing time every day; 2. synchronize the data of the North America region at 13:00 Beijing time every day. The relevant synchronization statements are as follows:

#Time zone differences:

Beijing time 07.21 00:00 corresponds to US time 07.21 00:00 - 12:00 = 07.20 12:00
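The time zone arithmetic above can be checked with a short Python sketch. It assumes that "US time" here means US Eastern daylight time (UTC-4), which is the offset that produces the stated 12-hour difference from Beijing (UTC+8), and it picks 2022 as the year since the text gives none. It also shows that the two plans in the example both fall at 01:00 local time in their own region.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch dependency-free; a real scheduler would use IANA zones
# (e.g. zoneinfo.ZoneInfo("Asia/Shanghai")) so that daylight saving time is handled.
BEIJING = timezone(timedelta(hours=8), "Beijing")
US_EASTERN_DST = timezone(timedelta(hours=-4), "US Eastern, daylight time")  # assumed "US time"


def beijing_hour_for(local_hour: int, area_tz: timezone) -> int:
    """Beijing clock hour at which an area's local `local_hour` occurs."""
    offset = BEIJING.utcoffset(None) - area_tz.utcoffset(None)
    return (local_hour + offset // timedelta(hours=1)) % 24


us_time = datetime(2022, 7, 21, 0, 0, tzinfo=BEIJING).astimezone(US_EASTERN_DST)
print(us_time.strftime("%m.%d %H:%M"))                                    # 07.20 12:00
print(beijing_hour_for(1, BEIJING), beijing_hour_for(1, US_EASTERN_DST))  # 1 13
```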

#Alluxio mount:

The present application also provides a cross-region data synchronization system, including: a synchronization client module, used to submit data synchronization statements according to the grammatical structure of the data synchronization language; a synchronization engine module, used to parse the data synchronization statements to obtain a synchronization plan suitable for synchronizing the data of the unit nodes of one or more synchronization areas to the central node; a synchronization executor module, used to execute the synchronization plan and obtain the execution result; a memory; and a processor. The memory stores instructions executable by the processor, and the processor executes the instructions to implement the cross-region data synchronization method described above.

In some embodiments, the data synchronization system also includes a synchronization network service module for displaying the synchronization plan and execution results on the front end.

The foregoing content of this application also serves to explain the cross-regional data synchronization system. Referring to FIG. 3, the Xensql synchronization client module 310 is a specific implementation of the synchronization client module, the Xensql synchronization engine module 320 is a specific implementation of the synchronization engine module, the Xensql synchronization executor module 330 is a specific implementation of the synchronization executor module, and the Xensql synchronization network service module 340 is a specific implementation of the synchronization network service module; the relevant details are not repeated here.

FIG. 7 is a system block diagram of a cross-regional data synchronization system according to an embodiment of the present application. Referring to FIG. 7, the cross-regional data synchronization system 700 may include an internal communication bus 701, a processor 702, a read-only memory (ROM) 703, a random access memory (RAM) 704, and a communication port 705. When applied on a personal computer, the cross-regional data synchronization system 700 may also include a hard disk 706. The internal communication bus 701 enables data communication between the components of the cross-regional data synchronization system 700. The processor 702 may make judgments and issue prompts. In some embodiments, the processor 702 may consist of one or more processors. The communication port 705 enables data communication between the cross-regional data synchronization system 700 and the outside. In some embodiments, the cross-regional data synchronization system 700 may send and receive information and data to and from the network through the communication port 705. The cross-regional data synchronization system 700 may also include different forms of program storage units and data storage units, such as the hard disk 706, the read-only memory (ROM) 703 and the random access memory (RAM) 704, which can store various data files used for computer processing and/or communication, as well as program instructions executed by the processor 702. The processor executes these instructions to implement the main part of the method. The processing results are transmitted to the user device through the communication port and displayed on the user interface.

The above-mentioned cross-region data synchronization method can be implemented as a computer program, stored in the hard disk 706, and loaded into the processor 702 for execution to implement the cross-region data synchronization method of the present application.

The present application also includes a computer-readable medium storing a computer program code, which, when executed by a processor, implements the cross-region data synchronization method described above.

When the cross-regional data synchronization method is implemented as a computer program, it can also be stored as a product in a computer-readable storage medium. For example, a computer-readable storage medium may include, but is not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., compact disks (CDs), digital versatile disks (DVDs)), smart cards, and flash memory devices (e.g., electrically erasable programmable read-only memories (EEPROMs), cards, sticks, key drives). In addition, the various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media (and/or storage media) that can store, contain and/or carry code and/or instructions and/or data.

It should be understood that the embodiments described above are only illustrative. The embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For hardware implementation, the processor may be implemented in one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and/or other electronic units designed to perform the functions described herein, or combinations thereof.

Some aspects of the present application may be performed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as "data blocks", "modules", "engines", "units", "components" or "systems". The processor may be one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or combinations thereof. In addition, various aspects of the present application may be embodied as computer products located in one or more computer-readable media and including computer-readable program code. For example, computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, tapes ...), optical disks (e.g., compact disks CDs, digital versatile disks DVDs ...), smart cards, and flash memory devices (e.g., cards, sticks, key drives ...).

A computer-readable medium may include a propagated data signal containing computer program code, for example in baseband or as part of a carrier wave. The propagated signal may take a variety of forms, including electromagnetic, optical, or a suitable combination thereof. A computer-readable signal medium may be any computer-readable medium other than a computer-readable storage medium that can communicate, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code on the computer-readable medium may be transmitted via any suitable medium, including radio, cable, fiber optic cable, radio frequency signals or similar media, or any combination of these media.

The basic concepts have been described above. Obviously, for those skilled in the art, the above disclosure is only an example and does not constitute a limitation of the present application. Although not explicitly stated herein, those skilled in the art may make various modifications, improvements and corrections to the present application. Such modifications, improvements and corrections are suggested by the present application and therefore still fall within the spirit and scope of its exemplary embodiments.

At the same time, the present application uses specific words to describe the embodiments of the present application. For example, "one embodiment", "an embodiment", and/or "some embodiments" refer to a certain feature, structure or characteristic related to at least one embodiment of the present application. Therefore, it should be emphasized and noted that "one embodiment" or "an embodiment" or "an alternative embodiment" mentioned twice or more in different positions in this specification does not necessarily refer to the same embodiment. In addition, some features, structures or characteristics in one or more embodiments of the present application can be appropriately combined.

In some embodiments, numbers are used to describe quantities of components and attributes. It should be understood that such numbers used in the description of the embodiments are, in some examples, qualified by the modifiers "about", "approximately" or "substantially". Unless otherwise specified, "about", "approximately" or "substantially" indicates that the number may vary by ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may change according to the features required by individual embodiments. In some embodiments, the numerical parameters should take the specified significant digits into account and use ordinary rounding. Although the numerical ranges and parameters used to define the breadth of the ranges in some embodiments of the present application are approximations, in specific embodiments such values are set as precisely as practicable.

Claims

1. A cross-region data synchronization method, characterized by comprising:

submitting a data synchronization statement according to the grammatical structure of a data synchronization language;

parsing the data synchronization statement to obtain a synchronization plan, wherein the synchronization plan is suitable for synchronizing data of unit nodes in one or more synchronization areas to a central node; and

executing the synchronization plan to obtain an execution result.
2. The data synchronization method according to claim 1, characterized in that, before the step of submitting the data synchronization statement according to the grammatical structure of the data synchronization language, the method further comprises: configuring the address and directory of an Alluxio data orchestration service, wherein the Alluxio data orchestration service is used to mount an underlying file system for access by upper-level computing frameworks and applications.
3. The data synchronization method according to claim 2, characterized in that, before the step of submitting the data synchronization statement according to the grammatical structure of the data synchronization language, the method further comprises configuring one or more of: the address of a metadata service, the address of the underlying file system of each of the one or more synchronization areas, the execution directory of a computing engine, the communication address of a synchronization engine, the port of the synchronization engine, the communication address of a synchronization executor, the port of the synchronization executor, the address of a synchronization network service, and the port of the synchronization network service.
4. The data synchronization method according to claim 3, characterized in that the grammatical structure includes configuration items, the configuration items including: a scheduling period, a scheduling time zone, a computing engine, whether to select local area time zone scheduling, a source data table partition, a target data table partition, a list of partitions to be synchronized, and a view name, wherein the source data table corresponds to the unit node and the target data table corresponds to the central node.
5. The data synchronization method according to claim 4, characterized in that the synchronization plan includes an offline synchronization plan and/or a real-time synchronization plan, and the real-time synchronization plan includes the view name.
6. The data synchronization method according to claim 4, wherein the step of parsing the data synchronization statement comprises:

parsing the data synchronization statement according to the grammatical structure to obtain a parsing result;

if the parsing result indicates that parsing succeeded, obtaining a data synchronization syntax tree, wherein the data synchronization syntax tree includes information of the configuration items, a table name of the source data table, and a table name of the target data table;

connecting to the metadata service according to the address of the metadata service, and determining, according to the table name of the target data table, whether the target data table exists in the metadata service;

if the target data table exists, creating a mapping directory of the Alluxio data orchestration service corresponding to the source data table;

mounting the source table directory in the address of the underlying file system of the one or more synchronization areas to the mapping directory;

creating a mapping directory source table whose location is the mapping directory; and

determining whether to perform real-time synchronization; if not, generating an offline synchronization plan; if so, creating a view according to the view name and generating a real-time synchronization plan.
7. The data synchronization method according to claim 6, wherein the step of parsing the data synchronization statement further comprises:

generating a file metadata synchronization plan of the Alluxio data orchestration service and a partition metadata synchronization plan of the metadata service according to the information of the configuration items, the address of the Alluxio data orchestration service, and the address of the metadata service;

reading, by the computing engine, the data fields in the mapping directory source table and writing the data fields into the target data table; and

storing one or more of the offline synchronization plan, the real-time synchronization plan, the file metadata synchronization plan of the Alluxio data orchestration service, and the partition metadata synchronization plan of the metadata service into the database of the central node.
8. The data synchronization method according to claim 1, characterized in that, after the step of parsing the data synchronization statement to obtain a synchronization plan, the method further comprises:

periodically reading the synchronization plan to determine whether the synchronization plan meets operating conditions, the operating conditions including a planned execution time zone and a planned execution time; and

executing the synchronization plan if the operating conditions are met.
9. The data synchronization method according to claim 1, characterized in that, after the step of executing the synchronization plan and obtaining the execution result, the method further comprises: determining whether the execution result is successful; if the execution result is successful, writing the successful execution result into a log; if the execution result is a failure, issuing an alarm and writing the failed execution result into the log.
10. The data synchronization method according to claim 1, characterized in that, after the step of executing the synchronization plan and obtaining the execution result, the method further comprises displaying the synchronization plan and the execution result.
11. The data synchronization method according to claim 10, characterized in that, in the step of displaying the synchronization plan and the execution result, the synchronization plan includes any synchronization plan stored in the database of the central node.
12. A cross-regional data synchronization system, characterized by comprising:

a synchronization client module, configured to submit a data synchronization statement according to the grammatical structure of a data synchronization language;

a synchronization engine module, configured to parse the data synchronization statement to obtain a synchronization plan, wherein the synchronization plan is suitable for synchronizing data of unit nodes in one or more synchronization areas to a central node;

a synchronization executor module, configured to execute the synchronization plan and obtain an execution result;

a memory, configured to store instructions executable by a processor; and

a processor, configured to execute the instructions to implement the data synchronization method according to any one of claims 1-11.
13. The data synchronization system according to claim 12, characterized by further comprising a synchronization network service module configured to display the synchronization plan and the execution result on a front end.
14. A computer-readable medium storing computer program code, characterized in that the computer program code, when executed by a processor, implements the data synchronization method according to any one of claims 1-11.