CN108052681B - Method and system for synchronizing structured data between relational databases - Google Patents


Info

Publication number
CN108052681B
Authority
CN
China
Prior art keywords
data
database
format
export
json
Prior art date
Legal status
Active
Application number
CN201810030156.2A
Other languages
Chinese (zh)
Other versions
CN108052681A (en)
Inventor
毛彬
罗威
谭玉珊
罗准辰
牛海波
张吉才
武帅
叶宇铭
田昌海
尹忠博
Current Assignee
Military Science Information Research Center of Military Academy of the Chinese PLA
Original Assignee
Military Science Information Research Center of Military Academy of the Chinese PLA
Priority date
Filing date
Publication date
Application filed by Military Science Information Research Center Of Military Academy Of Chinese Pla filed Critical Military Science Information Research Center Of Military Academy Of Chinese Pla
Priority to CN201810030156.2A priority Critical patent/CN108052681B/en
Publication of CN108052681A publication Critical patent/CN108052681A/en
Application granted granted Critical
Publication of CN108052681B publication Critical patent/CN108052681B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases


Abstract

The invention discloses a method for synchronizing structured data between relational databases, which is used for realizing data synchronization between a source database and a target database. The method comprises the following steps: step 1) according to actual service application requirements, extracting key-value pair information from the structured original data to be exported in the source database and from the source database's log information, rewriting it into intermediate data meeting the format requirements by combining the data operation type and the data entry mark format, and storing it in JSON format as intermediate export data; step 2) according to the data import strategy of the target database, mapping the JSON intermediate export data from basic data into target data through data cleaning and database format conversion operations, converting it into data to be imported that conforms to the corresponding database import format, and importing it into the target database; step 3) performing a reverse parsing operation on the data to be imported generated in step 2), in combination with the target database type, to generate backup recovery data for rolling back the data version.

Description

Method and system for synchronizing structured data between relational databases
Technical Field
The invention relates to the field of data synchronization processing, in particular to a method and a system for synchronizing structured data among relational databases.
Background
In the field of data processing, operations such as data extraction, mapping conversion, and synchronization are usually required. Raw data is normally stored in conventional relational databases such as MySQL, MariaDB, and SQL Server, while further data processing stores the data in retrieval-oriented databases such as Elasticsearch and Solr, which serve as dedicated data retrieval engines supporting large-scale retrieval requirements; the data therefore needs to be mapped and synchronized among these different databases. In addition, for enterprises, companies, data analysis organizations, and other providers of large-scale data applications, access control over data resources is usually paired with security isolation between internal and external networks, which imposes an offline, unidirectional transmission requirement on data synchronization.
Traditional data synchronization schemes can handle synchronization between databases of the same type well, but they sidestep the requirement of mapping and synchronizing data between different database types. Likewise, bidirectional communication within a single network can maintain consistency between synchronized copies of the data, but under the restriction of unidirectional communication it is difficult to resolve, in a timely and effective way, the data backup and recovery problems caused by data version conflicts.
Disclosure of Invention
The invention aims to provide a method and a system for synchronizing structured data among relational databases, addressing both the synchronization of structured data between different relational databases and offline data processing requirements such as synchronization over unidirectional communication. The method and system solve the problem of data synchronization between databases and support generalized synchronization, that is, both mapping synchronization with complete data consistency and synchronization with incomplete consistency after data cleaning operations are added. They are particularly suitable for completing data synchronization under offline conditions, add a data reverse parsing operation for data version rollback, and ensure the reliability and stability of the data.
In order to achieve the above object, the present invention provides a method for synchronizing structured data between relational databases, which is used for implementing data synchronization between a source database and a target database. It covers, but is not limited to, fully consistent synchronization of data between databases of the same type and of different types, as well as incompletely consistent synchronization combined with data cleaning operations. The method comprises the following steps:
step 1) according to actual service application requirements, extracting key-value pair information from the structured original data to be exported in the source database and from the source database's log information, rewriting it into intermediate data meeting the format requirements by combining the data operation type and the data entry mark format, and storing it in JSON format as intermediate export data;
step 2) according to the data import strategy of the target database, mapping the JSON intermediate export data from basic data into target data through data cleaning and database format conversion operations, converting it into data to be imported that conforms to the corresponding database import format, and importing it into the target database;
step 3) performing a reverse parsing operation on the data to be imported generated in step 2), in combination with the target database type, to generate backup recovery data for rolling back the data version.
As an improvement of the above method, the step 1) specifically includes:
step 1-1) dividing structured original data needing to be exported in a source database into: data table information, full field data and partial data;
step 1-2) exporting data table information: exporting structural information of a data table needing to be exported in a source database, extracting key value pair information, converting the key value pair information into a json format and storing the json format; the structure information of the data table includes: database name, table name, code, and field name, type, length of all fields;
step 1-3) exporting the full-field data: intercepting the selected entries in the source database according to query statements generated over a set interval and, for the 'delete' operation, specifically parsing the entries requiring data deletion from the source database log information; converting the structured original data of the selected entries into key-value pair form; generating a unique identifier according to the specified data entry mark format to serve as the data identification code of each data entry; marking the data operation type; and storing the result in JSON format to generate the JSON intermediate export data. If no data entry mark format is specified, the data identification code is set to null. The data operation type of entries selected from the source database defaults to 'add', while that of entries selected from the source database log information is 'delete'; the corresponding data operation code is set according to the corresponding database type;
step 1-4) exporting partial data: generating query statements according to the field list specified by the user and the set query interval, and matching data in the source database to obtain the selected entries; for the 'delete' operation, specifically parsing the entries requiring data deletion from the source database log information; converting the structured original data of the selected entries into key-value pair form; generating a unique identifier according to the specified data entry mark format to serve as the data identification code of each data entry; marking the data operation type; and storing the result in JSON format to generate the JSON intermediate export data. If no data entry mark format is specified, the data identification code is set to null. The data operation type of entries selected from the source database defaults to 'modify', while that of entries selected from the source database log information is 'delete'; different data operation codes are set according to the corresponding database type.
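The intermediate export entries described in steps 1-3) and 1-4) can be sketched in Python as follows. This is only an illustrative rendering: the field names (`_id`, `_op`, `fields`) and the MD5-based identifier are assumptions, not the patent's exact layout.

```python
import hashlib
import json

def make_export_entry(row, op_type, db_type, mark_format=None):
    """Wrap one source-database row as a JSON intermediate export entry."""
    # Operation codes differ per database type, as described in steps 1-3/1-4.
    op_codes = {
        "sql":           {"add": "insert", "modify": "update_set", "delete": "delete"},
        "elasticsearch": {"add": "index",  "modify": "update",     "delete": "delete"},
        "solr":          {"add": "add",    "modify": "add_field",  "delete": "delete"},
    }
    if mark_format:
        # Data identification code: unique id derived from the marked key fields.
        key = "|".join(str(row[f]) for f in mark_format)
        data_id = hashlib.md5(key.encode("utf-8")).hexdigest()
    else:
        data_id = None  # no mark format specified -> identifier set to null
    return {"_id": data_id, "_op": op_codes[db_type][op_type], "fields": row}

entry = make_export_entry({"id": 7, "title": "report"}, "add", "sql", mark_format=["id"])
print(json.dumps(entry))
```

Keeping the operation code database-specific at export time lets the import side replay the entry without re-deriving the operation from context.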
As an improvement of the above method, the step 2) specifically includes:
step 2-1) dividing the JSON intermediate export data into the following data types: data table creation and data import;
step 2-2) data table creation: restoring the structure information of the data table in the JSON intermediate export data into key-value pairs, filling in the format required by the target database for creating a new table according to the target database type, and creating the new table in the target database;
step 2-3) data import: generating a data processing strategy by combining the target database type with the data format adjustment and data cleaning strategies specified by the user;
step 2-4) performing data format adjustment and data cleaning according to the data processing strategy generated in step 2-3), in combination with the data operation code and the allocated data identification code, generating the final data import statement, and importing it into the target database.
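Steps 2-3) and 2-4) can be sketched as follows, assuming an SQL target database; the entry layout and the per-field cleaning-rule interface are illustrative assumptions rather than the patent's concrete format.

```python
def clean(fields, rules):
    """Apply user-specified cleaning functions field by field (step 2-3 strategy)."""
    return {k: rules.get(k, lambda v: v)(v) for k, v in fields.items()}

def to_sql(table, entry, rules):
    """Generate the final import statement for one entry (step 2-4)."""
    fields = clean(entry["fields"], rules)
    if entry["_op"] == "insert":
        cols = ", ".join(fields)
        vals = ", ".join(repr(v) for v in fields.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals});"
    if entry["_op"] == "delete":
        # Delete by the allocated data identification code.
        return f"DELETE FROM {table} WHERE data_id = {entry['_id']!r};"
    raise ValueError(entry["_op"])

stmt = to_sql("docs", {"_id": "a1", "_op": "insert", "fields": {"title": " Report "}},
              {"title": str.strip})
print(stmt)  # INSERT INTO docs (title) VALUES ('Report');
```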
As an improvement of the above method, the step 3) specifically includes:
step 3-1) dividing the data to be imported generated in the step 2) into reverse analysis of data table information and reverse analysis of content data according to data types;
step 3-2) reverse analysis of data table information: reading the data to be imported for creating a new table of the target database generated in the step 2-2), decoding the data into corresponding table deletion statements, and generating backup recovery data for rolling back of a subsequent data version;
step 3-3) reverse analysis of content data: and reading the data to be imported generated in the step 2-4) and used for importing the target database, and performing corresponding reverse analysis operation by combining the type of the target database to generate backup recovery data used for rolling back the subsequent data version.
As an improvement of the above method, the reverse parsing operation includes: reading a data entry to be imported and reversely mapping its data operation code: the 'add' operation is replaced with a 'delete' operation, the 'modify' operation is replaced with a 'modify' operation that restores the corresponding original data, and the 'delete' operation is replaced with an 'add' operation for the corresponding data; backup recovery data for rolling back the subsequent data version is then generated by combining the entry content.
The invention also provides a system for synchronizing the structured data among the relational databases, which comprises: the data synchronization engine 10, the data processing module 20, the message scheduling module 30, the data backup repository 40 and the log management repository 50;
the data synchronization engine 10 is used for taking charge of interaction between a user and a system, including task customization, authority management of the user, uploading and downloading of data and expansion and connection of other external interfaces;
the data processing module 20 is configured to receive the data processing task commands sent by the message scheduling module 30, read the data required by a task from the log management library 50 according to the task command, export the data from the source database, and import the data into the target database; it also performs the reverse parsing operation on the data for rolling back the data version;
the message scheduling module 30 is configured to obtain a task request configured by a user and a configured normalized data synchronization task request from the data synchronization engine 10, and transmit a data processing task command to the data processing module 20;
the data backup repository 40 is used for storing the data format adjustment and data cleaning processing units uploaded by the user in the data synchronization engine 10, storing the export data packets generated by the data processing module 20, the uploaded import data packets, and the generated reverse recovery data packets, storing the log files generated during system operation, and storing the log information files of all deletion operations obtained by periodically extracting and analyzing the operation log files of each database;
the log management library 50 is configured to manage all information interaction logs generated during the system operation process, including task customization information, user usage records, uploaded data and code storage path registration, information entry of an external interface in the data synchronization engine 10, data import and export condition records in the data processing module 20, association information records between import and export data packets, reverse recovery data packets and a task database, and task execution information records of the message scheduling module 30.
As an improvement of the above system, the data synchronization engine 10 includes:
the task customization unit, used for configuring normalized timing tasks and one-time temporary tasks; the configuration includes the merging of multiple subtasks, data operation code assignment during export, data identification code format configuration, attribute configuration of the source data, data format adjustment during import, uploading of the data cleaning processing unit, attribute configuration of the target data, and the call configuration of external interfaces;
the user authority management unit, used for implementing the administrator's highest authority over data synchronization, the execution authority of maintenance personnel, and the temporary configuration and calling authority of data application personnel;
the interactive interface, used for uploading and downloading data, modifying and uploading the data cleaning processing unit, and interactions such as task customization and attribute configuration;
the external interface, used for expanding the application range of the data synchronization system and adding extensible interface management for intelligent system services, such as adding and changing data import and data export units, or docking with an optical-disc ferry system controlled by the data flow direction.
As an improvement of the above system, the data processing module 20 includes: the device comprises a data export operation unit A, a data import operation unit B and a reverse analysis recovery unit C;
the data export operation unit A is configured to export original data from the source database into JSON intermediate export data, read the source database log information about the 'delete' operation in the data backup repository 40, parse the entries requiring data deletion to generate export data flagged with the delete operation, and merge it into the JSON intermediate export data; the original data is divided, by data type, into export of the data table, export of full-field data, and export of partial data; according to the export strategy of the data export task, the table creation information of the original data to be exported is converted to JSON, and the original data of full-field and partial data is normalized in combination with the data operation type and the data entry mark format to generate the JSON intermediate export data;
the data import operation unit B is used for importing JSON intermediate export data into the target database; the JSON intermediate export data is divided, by data type, into creation of a data table and import of data to be imported; the data table information to be imported is formatted according to the new-table creation requirements of the target database, and the content data to be imported is mapped according to the data cleaning and import requirements of the target database, so as to generate the final import statement and import it into the target database;
and the reverse analysis recovery unit C is used for performing reverse analysis operation on the data to be imported to generate backup recovery data for rolling back the data version.
The invention has the advantages that:
the method and the system of the invention enable the implementation of data synchronization business of users between different relational databases to be more convenient, solve the data recovery problem of coping with data version conflicts under the limitation of one-way communication by using the data recovery strategy, effectively cope with various different data synchronization scenes, and lay the foundation for a more intelligent data synchronization system through interface expansion.
Drawings
FIG. 1 is a schematic diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the system of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
As shown in fig. 1, data synchronization is mainly divided into two steps. The first is data export, which exports data from the source database: according to actual service application requirements, key-value pair information is extracted from the structured data to be exported in the source database, rewritten into intermediate data meeting the format requirements, and stored as JSON text for subsequent use or backup. The second is data import, which imports the data into the target database and generates an execution script for data recovery: according to the communication conditions and the data import strategy of the target database, the data in the JSON intermediate text is imported into the database, and at the same time the text data is reverse-parsed to generate backup recovery data that can be used for rolling back the data version.
(1) Data export
Data synchronization migrates all data in the source database to the target database in a consistent manner, maintaining the consistency of the basic data of the two databases to the maximum extent. The exported data is mainly divided into three types: data table structure export, full-field data export, and partial data export. Data export generally involves the three operations 'add', 'delete', and 'modify', and the operation information of these three operations needs to be preserved during export.
The table structure export part exports the creation code information of the data tables to be exported in the source database, extracts the key-value pair information, and converts it into JSON format for storage. The structure information of a data table mainly comprises: the database name, the table name, the character code, and the name, type, and length of all fields, for example the table creation information in SQL, the type mapping information in Elasticsearch, and the core schema in Solr.
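As an illustration, the exported table-structure information might be serialized as the following JSON; the exact key names are hypothetical, not the patent's specified format.

```python
import json

# Hypothetical JSON rendering of exported table-structure information:
# database name, table name, character code, and per-field name/type/length.
table_info = {
    "database": "source_db",
    "table": "articles",
    "code": "utf8mb4",
    "fields": [
        {"name": "id",    "type": "int",     "length": 11},
        {"name": "title", "type": "varchar", "length": 255},
    ],
}
text = json.dumps(table_info)
restored = json.loads(text)  # restored to key-value pairs on the import side
print(restored["table"])
```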
Since data deletion operations on the database do not pass through the data synchronization system, but deletion operations have obvious characteristics and are easy to distinguish and extract in the database log, the deletion log can be collected by directed analysis of the database log. From it, an execution script is generated containing the data deletion operation information of a specific table in the source database, and at the same time a distinct data operation code marking the deletion operation is set according to the corresponding database type. For a specific export process, a dedicated deletion-log detection program can be set up on the database server to identify this information and merge it into the generated export text.
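A deletion-log detection program of this kind can be sketched as follows; the log line format and the regular expression are assumptions for a MySQL-style statement log, not a prescribed format.

```python
import re

# Match DELETE statements of the form "DELETE FROM <table> WHERE <cond>;".
DELETE_RE = re.compile(r"DELETE\s+FROM\s+(\w+)\s+WHERE\s+(.*?);", re.IGNORECASE)

def extract_deletes(log_lines, table):
    """Collect 'delete'-flagged operation records for one table from log lines."""
    ops = []
    for line in log_lines:
        m = DELETE_RE.search(line)
        if m and m.group(1) == table:
            ops.append({"_op": "delete", "where": m.group(2)})
    return ops

log = ["2018-01-12 INSERT INTO articles ...",
       "2018-01-12 DELETE FROM articles WHERE id = 42;"]
print(extract_deletes(log, "articles"))  # [{'_op': 'delete', 'where': 'id = 42'}]
```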
The full-field data export part intercepts the portion of the source database to be exported according to query statements generated over a set interval, converts the structured data of the selected entries into key-value pairs, and saves them in JSON format for export; meanwhile, a data identification code is allocated to each data entry according to the specified data entry mark format, and is set to null if no format is specified. The default operation information is 'add', and the corresponding data operation code is set according to the database type for distinction, such as insert in SQL, index in Elasticsearch, and add in Solr; if specially needed, for example when the export updates an existing data table, the operation can be set to 'modify', with the data operation code set as above.
The partial data export part generates the corresponding query statements according to the field list to be exported and the set interval, intercepts the portion of the source database to be exported, converts the structured data of the selected entries into key-value pairs, and saves them in JSON format for export; meanwhile, a data identification code is allocated to each data entry according to the specified data entry mark format, and is set to null if no format is specified. The default operation information is 'modify', and different data operation codes are set according to the corresponding database type for distinction, such as update_set in SQL, update in Elasticsearch, and add_field in Solr; if a new data table is created from the export data, the operation can be set to 'add', with the data operation code set as above.
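The interval-bounded query statements mentioned above can be sketched as follows, assuming an SQL source database and a numeric increment column named `id` (both assumptions for illustration).

```python
def build_export_query(table, fields, interval, id_col="id"):
    """Build the SELECT used to intercept the portion of the source DB to export."""
    cols = ", ".join(fields) if fields else "*"   # empty field list -> full fields
    lo, hi = interval                              # half-open [lo, hi) interval
    return (f"SELECT {cols} FROM {table} "
            f"WHERE {id_col} >= {lo} AND {id_col} < {hi};")

q = build_export_query("articles", ["id", "title"], (0, 1000))
print(q)  # SELECT id, title FROM articles WHERE id >= 0 AND id < 1000;
```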
(2) Data import
Data import mainly consists of parsing the data obtained in the data export link and writing it into the target database. According to the type of the data to be imported, it is divided into creating data tables in the target database based on the table structure data, and synchronizing the target database based on the table data.
The import part that creates a new table based on the table structure data parses the structure information in the JSON import data into key-value pairs, fills in the format required by the target database for creating a new table according to the target database type, and calls the corresponding table creation module to create the new table; at the same time, it generates backup recovery data for deleting the data table, to be used for rolling back the subsequent data version.
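This new-table path can be sketched in Python as follows, assuming an SQL-style target database; the structure-info keys are illustrative assumptions.

```python
def create_and_rollback(info):
    """Fill the CREATE statement for the target DB and, at the same time,
    emit the DROP statement used as backup recovery data for rollback."""
    cols = ", ".join(f"{f['name']} {f['type']}({f['length']})"
                     for f in info["fields"])
    create = f"CREATE TABLE {info['table']} ({cols});"
    drop = f"DROP TABLE {info['table']};"   # reverse-parsed rollback statement
    return create, drop

create, drop = create_and_rollback(
    {"table": "articles",
     "fields": [{"name": "id", "type": "int", "length": 11}]})
print(drop)  # DROP TABLE articles;
```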
The import part that synchronizes the target database based on the table data takes the imported JSON data and 1) calls the corresponding data import unit according to the database type of the target database, 2) processes the data entries to be imported with the specified data processing units, such as data format adjustment and data cleaning, and 3) performs data mapping synchronization according to the data operation codes and the assigned data identification codes. At the same time, corresponding data recovery data is generated by reverse parsing according to the data operation codes and data identification codes: a 'delete' operation is applied for each 'add' operation, a 'modify' operation based on the corresponding original data for each 'modify' operation, and an 'add' operation of the corresponding data for each 'delete' operation, thereby generating an execution script for data recovery.
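The reverse parsing just described can be sketched as follows; the entry layout, including an `original` snapshot carried by 'modify' entries, is an illustrative assumption.

```python
def reverse_entry(entry):
    """Map each import entry's operation onto its inverse, so that replaying
    the output rolls the data version back."""
    op = entry["_op"]
    if op == "add":                       # add    -> delete the same entry
        return {"_op": "delete", "_id": entry["_id"]}
    if op == "modify":                    # modify -> modify back to original
        return {"_op": "modify", "_id": entry["_id"],
                "fields": entry["original"]}
    if op == "delete":                    # delete -> re-add the deleted data
        return {"_op": "add", "_id": entry["_id"], "fields": entry["fields"]}
    raise ValueError(op)

rb = reverse_entry({"_op": "add", "_id": "a1", "fields": {"title": "x"}})
print(rb)  # {'_op': 'delete', '_id': 'a1'}
```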
As shown in fig. 2, the data synchronization system mainly includes a data synchronization engine 10, a data processing module 20, a message scheduling module 30, a data backup repository 40, and a log management repository 50.
(1) Data synchronization engine 10
The data synchronization engine 10 is mainly responsible for all interaction between users and data synchronization, including task customization, user authority management, uploading and downloading of data, and the expansion of other external interfaces, such as the optical-disc ferry applications involved in data synchronization over unidirectional communication.
Task customization is divided into two major categories: the method comprises the steps of configuring a normalized timing task, configuring a one-time temporary task, wherein the configuration comprises the combination of a plurality of subtasks, data operation code assignment in the export process, data identification code format configuration, attribute configuration of source data, data format adjustment in the import process, uploading of a data cleaning processing unit, attribute configuration of target data and the like, and calling configuration with an external interface;
the authority management of the user relates to the highest authority of data synchronization of an administrator, the execution authority of maintenance personnel, the temporary configuration calling authority of data application personnel and the like;
the uploading and downloading of data are the most basic system interaction interfaces; modifying and uploading the data cleaning processing unit; interaction such as task customization and attribute configuration;
the connection of other external interfaces is expandable interface management for expanding the application range of the data synchronization system and increasing the intelligent service of the system, such as the increase and the modification of a data import unit and a data export unit, the butt joint of an optical disk ferrying system for controlling the data flow direction and the like.
(2) Data processing module 20
The data processing module 20 includes: the device comprises a data export operation unit A, a data import operation unit B and a reverse analysis recovery unit C;
the data export operation unit a is configured to export original data from a source database into json intermediate export data, read periodically extracted source database log information about a deletion operation in the data backup repository 40, analyze export data of an entry requiring data deletion to generate a flag bit deletion operation, and merge the export data into the json intermediate export data; the method comprises the steps of dividing original data into a data table information derivation, a full field data derivation and a partial data derivation according to data types. Performing json datamation of table creation information on original data to be exported according to an export strategy of a data export task, and combining the original data of full-field data and partial data with data operation types and data standardization of a data entry mark format to generate json intermediate export data;
the data import operation unit B is used for importing json intermediate export data into a target database; dividing json intermediate import data into creation of a data table and import of data to be imported according to data types. Dividing json intermediate export data into: standard table information to be imported is formatted according to data required by new table creation of a target database, standard content data to be imported is subjected to data cleaning and data mapping required by import of the target database, and therefore a final import statement is generated and imported into the target database;
and the reverse analysis recovery unit C is configured to perform reverse analysis on the data to be imported, generating backup recovery data that is subsequently used for rolling back the data version. Specifically, corresponding reverse operation-type mapping is performed on the data to be imported according to the different database types, and backup recovery entry data is generated for the corresponding data items, so that data export and import can be conducted in an off-line synchronization mode over a unidirectional or bidirectional communication network.
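As an illustrative, non-limiting sketch of the data export operation unit A, the following Python fragment shows how selected entries and log-derived deletions might be normalized into json intermediate export data. The field layout of the intermediate format and the helper `make_entry_id` are assumptions for illustration only, not part of the patent.

```python
import hashlib
import json

def make_entry_id(row, mark_format=None):
    """Build a data identification code from a data entry mark format.

    The mark format is assumed here to be a comma-separated field list,
    e.g. "db,table,id"; if unspecified, the identifier is left null,
    as the description requires.
    """
    if not mark_format:
        return None
    key = "|".join(str(row.get(f, "")) for f in mark_format.split(","))
    return hashlib.md5(key.encode("utf-8")).hexdigest()

def export_rows(rows, deleted_rows, mark_format=None):
    """Merge selected entries (operation type 'add') and log-derived
    deletions (operation type 'delete') into one json intermediate
    export payload."""
    entries = []
    for row in rows:
        entries.append({"op": "add", "id": make_entry_id(row, mark_format), "data": row})
    for row in deleted_rows:
        entries.append({"op": "delete", "id": make_entry_id(row, mark_format), "data": row})
    return json.dumps(entries, ensure_ascii=False)
```

A usage example: `export_rows([{"id": 7, "name": "x"}], [], "id")` yields a one-entry json array whose entry carries `"op": "add"` and an md5-based identification code.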
(3) Message scheduling module 30
The message scheduling module 30 is responsible for collecting newly configured task requests from users together with the configured normalized data synchronization task requests acquired in the data synchronization engine 10, and for transmitting data processing task commands to the data processing module 20. The message scheduling module effectively avoids message congestion, thereby relieving system pressure and reducing errors or omissions in the execution of data processing tasks.
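A minimal, non-limiting sketch of this scheduling behavior, assuming a simple FIFO buffer between the engine and the data processing module (the task fields shown are illustrative assumptions):

```python
import queue

# Task requests are buffered instead of being pushed directly to the
# data processing module, which avoids message congestion.
task_queue = queue.Queue()
task_queue.put({"task": "sync_users", "type": "normalized"})   # scheduled task
task_queue.put({"task": "sync_orders", "type": "temporary"})   # ad-hoc task

# The scheduler dispatches commands in arrival order.
cmd = task_queue.get()
```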
(4) Data backup repository 40
The data backup repository 40 mainly stores: the data format adjustment and data cleaning processing units uploaded by users in the data synchronization engine 10; the export data packets generated in data processing tasks; the import data packets uploaded in data processing tasks; the reverse recovery data packets generated in data processing tasks; the log files generated during system operation; and the log information files of all deletion operations obtained by periodically extracting and analyzing the operation log file of each database.
(5) Log management library 50
The log management library 50 is responsible for recording all information interaction logs generated during system operation, including: task customization information; user usage records; registration of storage paths for uploaded data and code; entry of external interface information in the data synchronization engine 10; records of data import and export conditions in the data processing module 20; records of the association between import and export data packets, reverse recovery data packets, and the task database; and records of task execution information of the message scheduling module 30.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (6)

1. A method for synchronizing structured data between relational databases, used for realizing data synchronization between a source database and a target database, including but not limited to fully consistent synchronization of data between databases of the same type or of different types, and non-fully-consistent synchronization of data combined with data cleaning operations; the method comprises the following steps:
step 1) extracting key value pair information from structured original data needing to be exported in log information of a source database and a source database according to actual service application requirements, rewriting the key value pair information into intermediate data meeting format requirements by combining a data operation type and a data entry mark format, and storing the intermediate data in a json format as intermediate export data;
step 2) mapping json intermediate export data from basic data into target data according to a data import strategy of a target database by combining data cleaning operation and database format conversion operation, converting the json intermediate export data into data to be imported according with a corresponding database import format, and then importing the data into the target database;
step 3) performing reverse analysis operation on the data to be imported generated in the step 2) in combination with the type of the target database to generate backup recovery data for rolling back the data version;
the step 1) specifically comprises the following steps:
step 1-1) dividing structured original data needing to be exported in a source database into: data table information, full field data and partial data;
step 1-2) exporting data table information: exporting structural information of a data table needing to be exported in a source database, extracting key value pair information, converting the key value pair information into a json format and storing the json format; the structure information of the data table includes: database name, table name, code, and field name, type, length of all fields;
step 1-3) exporting the full field data: intercepting selected entries in the source database according to query statements generated over a set interval, and, for the "delete" operation, specifically analyzing the entries requiring data deletion in the source database log information; converting the structured original data of the selected entries into key-value pair form, generating a unique identifier according to the specified data entry mark format to serve as the data identification code of each data entry, marking the data operation type, and storing in json format to generate json intermediate export data; if no data entry mark format is specified, setting the data entry mark format to null; the data operation type of data entries selected from the source database defaults to "add", the data operation type of data entries selected from the source database log information defaults to "delete", and the corresponding data operation code is set according to the corresponding database type;
step 1-4) exporting partial data: generating query statements according to a field list specified by the user and a set query interval, matching data in the source database to obtain the selected entries, and, for the "delete" operation, specifically analyzing the entries requiring data deletion in the source database log information; converting the structured original data of the selected entries into key-value pair form, generating a unique identifier according to the specified data entry mark format to serve as the data identification code of each data entry, marking the data operation type, and storing in json format to generate json intermediate export data; if no data entry mark format is specified, setting the data entry mark format to null; the data operation type of data entries selected from the source database defaults to "modify", the data operation type of data entries selected from the source database log information defaults to "delete", and different data operation codes are set according to the corresponding database types.
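The table-information export of step 1-2) can be sketched as follows; this is an illustrative, non-limiting fragment in which the exact json field names are assumptions:

```python
import json

def export_table_info(db_name, table_name, encoding, fields):
    """Serialize data-table structure information (step 1-2) as key-value
    pairs in json. `fields` is a list of (name, type, length) tuples,
    covering the name, type, and length of every field."""
    info = {
        "database": db_name,
        "table": table_name,
        "encoding": encoding,
        "fields": [{"name": n, "type": t, "length": l} for (n, t, l) in fields],
    }
    return json.dumps(info, ensure_ascii=False)
```

For example, `export_table_info("mydb", "users", "utf8", [("name", "string", 64)])` produces a json document carrying the database name, table name, encoding, and the field list, ready to drive new-table creation on the target side.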
2. The method for synchronizing structured data between relational databases according to claim 1, wherein the step 2) specifically comprises:
step 2-1), dividing json intermediate derived data into the following data types: creating a data table and importing data;
step 2-2) data table creation: reducing the structure information of the data table in the json intermediate export data into key value pairs, filling a format required by creating a new table for the target database according to the type of the target database, and creating the new table for the target database;
step 2-3) data import: generating a data processing strategy by combining the type of the target database and a data format adjustment and data cleaning strategy specified by a user;
and 2-4) performing data format adjustment and data cleaning according to the data processing strategy generated in the step 2-3) by combining the data operation code and the allocated data identification code, generating a final data import statement, and importing the final data import statement into a target database.
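Steps 2-2) and 2-4) can be sketched as follows. This is an illustrative, non-limiting fragment: the type mapping for the target database dialect and the intermediate-data layout (`op`/`id`/`data` fields) are assumptions, and a real import would use parameterized statements plus the user-specified cleaning strategy rather than string interpolation:

```python
import json

# Hypothetical per-dialect type mapping used when filling the format
# required for new-table creation (step 2-2).
TYPE_MAP = {"mysql": {"string": "VARCHAR", "int": "INT"}}

def build_create_table(table_info_json, db_type="mysql"):
    """Restore structure information to key-value pairs and fill the
    new-table creation format for the target database (step 2-2)."""
    info = json.loads(table_info_json)
    cols = ", ".join(
        f"{f['name']} {TYPE_MAP[db_type][f['type']]}({f['length']})"
        if f["type"] == "string"
        else f"{f['name']} {TYPE_MAP[db_type][f['type']]}"
        for f in info["fields"]
    )
    return f"CREATE TABLE {info['table']} ({cols})"

def build_import_statements(entries_json, table):
    """Turn 'add' entries of the intermediate data into final import
    statements (step 2-4)."""
    stmts = []
    for e in json.loads(entries_json):
        if e["op"] == "add":
            cols = ", ".join(e["data"])
            vals = ", ".join(repr(v) for v in e["data"].values())
            stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals})")
    return stmts
```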
3. The method for synchronizing structured data between relational databases according to claim 2, wherein the step 3) specifically comprises:
step 3-1) dividing the data to be imported generated in the step 2) into reverse analysis of data table information and reverse analysis of content data according to data types;
step 3-2) reverse analysis of data table information: reading the data to be imported for creating a new table of the target database generated in the step 2-2), decoding the data into corresponding table deletion statements, and generating backup recovery data for rolling back of a subsequent data version;
step 3-3) reverse analysis of content data: and reading the data to be imported generated in the step 2-4) and used for importing the target database, and performing corresponding reverse analysis operation by combining the type of the target database to generate backup recovery data used for rolling back the subsequent data version.
4. The method for synchronizing structured data between relational databases according to claim 3, wherein the reverse parsing operation comprises: reading a data entry to be imported and reversely mapping its data operation code: replacing an "add" operation with a "delete" operation, replacing a "modify" operation with a "modify" operation restoring the original data, and replacing a "delete" operation with an "add" operation restoring the deleted data; and generating, in combination with the entry content, backup recovery data for rolling back the subsequent data version.
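The reverse mapping of operation codes described above can be sketched as follows. This is an illustrative, non-limiting fragment that assumes the same hypothetical intermediate-entry layout (`op`/`id`/`data`); the `original` argument stands in for the pre-modification values read from backup:

```python
def reverse_entry(entry, original=None):
    """Reverse-map one data entry's operation code to produce backup
    recovery data: add -> delete, modify -> modify restoring the
    original values, delete -> add restoring the deleted values."""
    op = entry["op"]
    if op == "add":
        return {"op": "delete", "id": entry["id"], "data": entry["data"]}
    if op == "modify":
        # `original` holds the values before modification, so replaying
        # this entry rolls the data version back.
        return {"op": "modify", "id": entry["id"], "data": original}
    if op == "delete":
        return {"op": "add", "id": entry["id"], "data": entry["data"]}
    raise ValueError(f"unknown operation code: {op}")
```

Replaying the reversed entries against the target database then rolls back the corresponding data version.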
5. A system for synchronizing structured data between relational databases, the system comprising: the system comprises a data synchronization engine (10), a data processing module (20), a message scheduling module (30), a data backup warehouse (40) and a log management library (50);
the data synchronization engine (10) is used for taking charge of interactive behaviors between a user and a system, and comprises task customization, authority management of the user, uploading and downloading of data and expansion and connection of other external interfaces;
the data processing module (20) is used for receiving a data processing task command sent by the message scheduling module (30), reading data required by a task from the log management library (50) according to the task command, exporting the data from a source database, and importing the data into a target database; performing reverse analysis operation on the data for rolling back the data version;
the message scheduling module (30) is used for acquiring a task request configured by a user and a configured normalized data synchronization task request from the data synchronization engine (10), and transmitting a data processing task command to the data processing module (20);
the data backup warehouse (40) is used for storing the data format adjustment and data cleaning processing units uploaded by users in the data synchronization engine (10); storing the export data packets generated by the data processing module (20), the uploaded import data packets, and the generated reverse recovery data packets; and storing the log files generated during system operation and the log information files of all deletion operations obtained by periodically extracting and analyzing the operation log file of each database;
the log management library (50) is used for managing all information interaction logs generated in the running process of the system, and comprises task customization information, user usage records, uploaded data and code storage path registration, information entry of an external interface in a data synchronization engine (10), data import and export condition records in a data processing module (20), import and export data packets, association information records between reverse recovery data packets and a task database, and task execution information records of a message scheduling module (30);
the data processing module (20) comprises: the device comprises a data export operation unit A, a data import operation unit B and a reverse analysis recovery unit C;
the data export operation unit A is used for exporting original data from a source database into json intermediate export data, reading source database log information about 'delete' operation in a data backup warehouse (40), analyzing export data of item generation zone bit delete operation needing data delete, and merging the export data into the json intermediate export data; dividing original data into derivation of a data table, derivation of full-field data and derivation of partial data according to data types; performing json datamation of table creation information on original data to be exported according to an export strategy of a data export task, and combining the original data of full-field data and partial data with data operation types and data standardization of a data entry mark format to generate json intermediate export data;
the data import operation unit B is used for importing json intermediate export data into a target database; dividing json intermediate import data into creation of a data table and import of data to be imported according to data types; carrying out data formatting on the data table information to be imported according to the requirement of new table creation of a target database, and finishing data mapping on the content data to be imported according to data cleaning and the import requirement of the target database so as to generate a final import statement and import the final import statement into the target database;
and the reverse analysis recovery unit C is used for performing reverse analysis operation on the data to be imported to generate backup recovery data for rolling back the data version.
6. The system for synchronization of structured data between relational databases according to claim 5, wherein the data synchronization engine (10) comprises:
the task customizing unit is used for configuring the normalized timing task, configuring the one-time temporary task, and configuring the tasks including the combination of a plurality of subtasks, the data operation code assignment in the exporting process, the data identification code format configuration, the attribute configuration of source data, the data format adjustment in the importing process, the uploading of the data cleaning processing unit, the attribute configuration of target data and the calling configuration of an external interface;
the authority management unit of the user is used for realizing the highest authority setting of data synchronization of an administrator, the execution authority of maintenance personnel and the temporary configuration and calling authority of data application personnel;
the interactive interface is used for uploading and downloading data; modifying and uploading the data cleaning processing unit; task customization and attribute configuration;
and the external interface provides expandable interface management for broadening the application range of the data synchronization system and adding intelligent services to the system, including the addition and modification of data import units and data export units, and docking with an optical disk ferrying system that controls the data flow direction.
CN201810030156.2A 2018-01-12 2018-01-12 Method and system for synchronizing structured data between relational databases Active CN108052681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810030156.2A CN108052681B (en) 2018-01-12 2018-01-12 Method and system for synchronizing structured data between relational databases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810030156.2A CN108052681B (en) 2018-01-12 2018-01-12 Method and system for synchronizing structured data between relational databases

Publications (2)

Publication Number Publication Date
CN108052681A CN108052681A (en) 2018-05-18
CN108052681B true CN108052681B (en) 2020-05-26

Family

ID=62127506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810030156.2A Active CN108052681B (en) 2018-01-12 2018-01-12 Method and system for synchronizing structured data between relational databases

Country Status (1)

Country Link
CN (1) CN108052681B (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241183B (en) * 2018-08-16 2020-12-29 武汉元鼎创天信息科技有限公司 Data synchronization method and system based on socket communication
CN109359103A (en) * 2018-09-04 2019-02-19 河南智云数据信息技术股份有限公司 A kind of data aggregate cleaning method and system
CN109446262B (en) * 2018-10-31 2021-10-08 成都四方伟业软件股份有限公司 Data aggregation method and device
CN109885532A (en) * 2019-02-11 2019-06-14 中国银行股份有限公司 A kind of transaction data standardized method and device
CN110297869B (en) * 2019-05-30 2022-11-25 北京百度网讯科技有限公司 AI data warehouse platform and operation method
CN110197051A (en) * 2019-06-13 2019-09-03 浪潮软件股份有限公司 A kind of method, terminal and the computer readable storage medium of permission control
CN110413672B (en) * 2019-07-03 2023-09-19 平安科技(深圳)有限公司 Automatic data importing method and device and computer readable storage medium
CN111158642A (en) * 2019-11-25 2020-05-15 深圳壹账通智能科技有限公司 Data construction method and device, computer equipment and storage medium
CN110971685B (en) * 2019-11-29 2021-01-01 腾讯科技(深圳)有限公司 Content processing method, content processing device, computer equipment and storage medium
CN111061739B (en) * 2019-12-17 2023-07-04 医渡云(北京)技术有限公司 Method and device for warehousing massive medical data, electronic equipment and storage medium
CN111125065B (en) * 2019-12-24 2023-09-12 阳光人寿保险股份有限公司 Visual data synchronization method, system, terminal and computer readable storage medium
CN111143329B (en) * 2019-12-27 2024-02-13 中国银联股份有限公司 Data processing method and device
CN111159160B (en) * 2019-12-31 2023-06-20 卓米私人有限公司 Version rollback method and device, electronic equipment and storage medium
US20210200751A1 (en) * 2019-12-31 2021-07-01 Capital One Services, Llc Monitoring and data validation of process log information imported from multiple diverse data sources
CN111414260A (en) * 2020-03-03 2020-07-14 中国平安人寿保险股份有限公司 Software system data processing method, device and computer readable storage medium
CN111475531A (en) * 2020-04-12 2020-07-31 魏秋云 Information analysis system based on student employment data
CN111694840B (en) * 2020-04-29 2023-05-30 平安科技(深圳)有限公司 Data synchronization method, device, server and storage medium
CN111694812A (en) * 2020-05-06 2020-09-22 五八有限公司 Data migration method and data migration device
CN111881209A (en) * 2020-06-29 2020-11-03 平安国际智慧城市科技股份有限公司 Data synchronization method and device for heterogeneous database, electronic equipment and medium
CN111858632B (en) * 2020-07-22 2024-02-20 浪潮云信息技术股份公司 NiFi-based relational database incremental data warehousing method
CN111897877B (en) * 2020-08-12 2024-03-26 浪潮软件股份有限公司 High-performance high-reliability data sharing system and method based on distributed ideas
CN112364101A (en) * 2020-11-11 2021-02-12 深圳前海微众银行股份有限公司 Data synchronization method and device, terminal equipment and medium
CN112416907A (en) * 2020-12-03 2021-02-26 厦门市美亚柏科信息股份有限公司 Database table data importing and exporting method, terminal equipment and storage medium
CN112632176B (en) * 2020-12-31 2024-09-17 中国农业银行股份有限公司 Interaction method and device for supervising report database
WO2022193894A1 (en) * 2021-03-19 2022-09-22 International Business Machines Corporation Asynchronous persistency of replicated data changes in database accelerator
US11797570B2 (en) 2021-03-19 2023-10-24 International Business Machines Corporation Asynchronous persistency of replicated data changes in a database accelerator
US11500733B2 (en) 2021-03-19 2022-11-15 International Business Machines Corporation Volatile database caching in a database accelerator
CN113392081B (en) * 2021-06-10 2024-07-09 北京猿力未来科技有限公司 Data processing system and method
CN113535857A (en) * 2021-08-04 2021-10-22 阿波罗智联(北京)科技有限公司 Data synchronization method and device
CN114595291B (en) * 2022-05-10 2022-08-02 城云科技(中国)有限公司 Collection task adjusting method and device based on database annotation
CN115062033A (en) * 2022-06-21 2022-09-16 长春一汽富晟集团有限公司 Automatic importing and exporting system and method for SQL Server database data table
CN116107816B (en) * 2023-04-13 2023-08-01 山东捷瑞数字科技股份有限公司 MYSQL database back-file cloud platform

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129478B (en) * 2011-04-26 2012-10-03 广州从兴电子开发有限公司 Database synchronization method and system thereof
CN102542007B (en) * 2011-12-13 2014-06-25 中国电子科技集团公司第十五研究所 Method and system for synchronization of relational databases
CN103067483B (en) * 2012-12-25 2017-04-05 广东邮电职业技术学院 Teledata increment synchronization method based on packet
CN103678532B (en) * 2013-12-02 2017-05-10 中国移动(深圳)有限公司 Alternation statement reverse analysis method, database alternating and backspacing method and database alternating and backspacing system
CN104516989B (en) * 2015-01-26 2018-07-03 北京京东尚科信息技术有限公司 Incremental data supplying system and method
CN106951536A (en) * 2017-03-22 2017-07-14 努比亚技术有限公司 Data method for transformation and system

Also Published As

Publication number Publication date
CN108052681A (en) 2018-05-18

Similar Documents

Publication Publication Date Title
CN108052681B (en) Method and system for synchronizing structured data between relational databases
CN103617176B (en) One kind realizes the autosynchronous method of multi-source heterogeneous data resource
CN110175213A (en) A kind of oracle database synchronization system and method based on SCN mode
CN110377666A (en) Based on the synchronous method of data between CMSP message-oriented middleware progress different source data library
CN105005618A (en) Data synchronization method and system among heterogeneous databases
CN106599104A (en) Mass data association method based on redis cluster
CN109213820B (en) Method for realizing fusion use of multiple types of databases
CN112685433B (en) Metadata updating method and device, electronic equipment and computer-readable storage medium
WO2018036324A1 (en) Smart city information sharing method and device
CN104573100A (en) Step-by-step database synchronization method with autoincrement identifications
CN104598610A (en) Step-by-step database data distribution uploading and synchronizing method
CN104899274B (en) A kind of memory database Efficient Remote access method
CN105791401B (en) Client and server-side data interactive method, system under net and off-network state
CN105608126A (en) Method and apparatus for establishing secondary indexes for massive databases
CN108984725A (en) Cross-gatekeeper data synchronization method
CN107688611A (en) A kind of Redis key assignments management system and method based on saltstack
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN111913933B (en) Power grid historical data management method and system based on unified support platform
KR101357135B1 (en) Apparatus for Collecting Log Information
CN110705724A (en) Reusable automatic operation and maintenance management system
CN111090803A (en) Data processing method and device, electronic equipment and storage medium
CN114374701B (en) Transparent sharing device for sample model of multistage linkage artificial intelligent platform
CN101645073A (en) Method for guiding prior database file into embedded type database
CN109857808B (en) Vertical data synchronization system and method based on neutral data structure
CN116414801A (en) Data migration method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210406

Address after: No.26 Fucheng Road, Haidian District, Beijing 100142

Patentee after: MILITARY SCIENCE INFORMATION RESEARCH CENTER OF MILITARY ACADEMY OF THE CHINESE PLA

Address before: 100142 courtyard 26, Fucheng Road, Haidian District, Beijing

Patentee before: Mao Bin

Patentee before: MILITARY SCIENCE INFORMATION RESEARCH CENTER OF MILITARY ACADEMY OF THE CHINESE PLA