CN112579613A - Database cluster difference comparison and data synchronization method, system and medium - Google Patents


Info

Publication number
CN112579613A
Authority
CN
China
Legal status
Granted
Application number
CN202011636626.3A
Other languages
Chinese (zh)
Other versions
CN112579613B (en)
Inventor
王智铎
赵慧
高士连
陈春波
Current Assignee
CETC 32 Research Institute
Original Assignee
CETC 32 Research Institute
Application filed by CETC 32 Research Institute
Priority to CN202011636626.3A
Publication of CN112579613A
Application granted
Publication of CN112579613B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a system and a medium for database cluster difference comparison and data synchronization. The method comprises the following steps: step 1: constructing a database cluster, the node databases and the adjacent-node backup libraries; step 2: grouping the tables in a database so that the tables in each group are connected by foreign-key relationships and their fields are mutually dependent, selecting the groups to be compared and synchronized, and generating the corresponding data tables; step 3: establishing an initial backup point for the data tables to be delivered, storing it in the backup database as the initial reference against which the database is compared, and obtaining the data increment; step 4: synchronously reporting the database: when the reported data reaches the receiving node, an operator logs in to the database comparison management website, verifies the synchronization data, and chooses to approve or reject it; and step 5: maintaining the database, including service data maintenance, dictionary data maintenance, data backup and restoration, and log management.

Description

Database cluster difference comparison and data synchronization method, system and medium
Technical Field
The invention relates to the technical field of databases and the Internet, in particular to a method, a system and a medium for database cluster difference comparison and data synchronization.
Background
A database is a "warehouse" that organizes, stores, and manages data according to a data structure. It is a large, organized, shareable, and centrally managed collection of data stored in a computer over the long term, to which users can add, query, update, and delete data. With the rapid growth in the number of Internet users, and to cope with the rapidly increasing volume of concurrent user requests to servers in the big-data era, some large websites have begun to use database clusters to improve the reliability and the concurrent processing capability of their databases. As the name implies, a database cluster uses two or more database servers to form a single virtual logical database image, distributes user requests across the cluster nodes with a load-balancing algorithm, and thereby reduces the request load on any single node. Like a single database system, it provides data services to clients transparently.
A database cluster is a single virtual logical database image formed by two or more database servers and, like a single database system, it provides transparent data services to clients. Geographically, the nodes of a database cluster can be located in different regions: the node in each region deploys a dedicated database locally to build an information system and performs the corresponding CRUD (create, read, update, delete) operations on it according to actual business operations, forming a geographically distributed dedicated database system. Hierarchically, the user units are divided layer by layer from top to bottom according to authority, and their dedicated database information is nested: the dedicated database of an upper-level unit contains the dedicated database information of all its lower-level units, while the related business data in the dedicated databases of lower-level units are mutually independent, forming a tree-shaped, cascaded dedicated database architecture. Without an effective data reporting system, information cannot flow quickly and comprehensively between the dedicated databases, which then become information silos. This greatly reduces the collaboration efficiency and information utilization among the information systems of the cascaded databases and prevents them from providing effective, reliable, and timely information support for informatized combat operations.
Differences in a database cluster arise when users at different nodes modify records of the same data table with different content, so that the databases at some nodes become inconsistent. Synchronous processing of a database cluster means that, when a database client sends a data update request, a result is returned to the client only after every node of the cluster has been fully updated. Asynchronous processing means that the node receiving the request returns a result to the client immediately, and the updated data is copied to the other nodes of the cluster at some later time. Asynchronous replication is therefore a weak-consistency approach.
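To make the contrast between the two update modes concrete, here is a minimal in-memory sketch. The `Cluster` and `Replica` classes are hypothetical and model only key-value writes; real replication must also deal with failures, ordering, and conflicts.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class Cluster:
    """Hypothetical toy cluster contrasting synchronous and asynchronous updates."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = []  # (replica, key, value) copies not yet applied

    def write_sync(self, key, value):
        # Synchronous: every node is updated before the client gets a result.
        for r in self.replicas:
            r.data[key] = value
        return "ok"

    def write_async(self, key, value):
        # Asynchronous: the receiving node applies the write and answers at once;
        # the other nodes are updated later by replicate().
        first, *rest = self.replicas
        first.data[key] = value
        self.pending.extend((r, key, value) for r in rest)
        return "ok"

    def replicate(self):
        # Deferred propagation; until it runs, replicas may disagree.
        for r, key, value in self.pending:
            r.data[key] = value
        self.pending.clear()

    def consistent(self):
        first = self.replicas[0].data
        return all(r.data == first for r in self.replicas[1:])
```

With `write_sync`, all replicas match as soon as the call returns; with `write_async`, they disagree until `replicate()` runs, which is the weak consistency described above.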
As a medium for storing data, a database contains a large number of data tables with complex associations between them. To ensure data consistency, constraint relationships are defined between data tables, and the concept of a foreign key is introduced to establish and enforce links between data: mapping one or more fields of one table to another table creates a constraint association between the two tables. During data synchronization, these foreign-key dependencies give rise to foreign-key conflicts, for example when a row is inserted before the row it references has been synchronized.
Patent document CN106294496A (application number: CN201510312034.9) discloses a Hadoop-cluster-based data migration method, which includes: each cluster computes its current data directory information from the list of data directories to be synchronized sent by the main server, and returns the result to the main server; the main server compares the results to obtain a list of differing directories; the main server splits this list according to the number of clients executing the synchronization task; and, after receiving the synchronization-task notification from the main server, each client executing the task requests the web service, obtains its portion of the split list, and executes the synchronization task. That scheme builds on and encapsulates the synchronization command provided by Hadoop, adding data difference comparison, multithreaded concurrent synchronization, synchronization result verification, synchronization progress tracking, and process monitoring.
Disclosure of Invention
In view of the defects in the prior art, the present invention aims to provide a method, system and medium for database cluster difference comparison and data synchronization.
The method for database cluster difference comparison and data synchronization provided by the invention comprises the following steps:
step 1: constructing a database cluster, the node databases, and the adjacent-node backup libraries;
step 2: grouping the tables in a database so that the tables in each group are connected by foreign-key relationships and their fields are mutually dependent, selecting the groups to be compared and synchronized, and generating the corresponding data tables;
step 3: establishing an initial backup point for the data tables to be delivered, storing it in the backup database as the initial reference against which the database is compared, and obtaining the data increment;
step 4: synchronously reporting the database: when the reported data reaches the receiving node, logging in to the database comparison management website, verifying the synchronization data, and choosing to approve or reject it;
step 5: maintaining the database, including service data maintenance, dictionary data maintenance, data backup and restoration, and log management.
Preferably, step 1 comprises:
the database cluster has a tree structure in which the higher a node sits in the tree, the higher its level; connected nodes exchange data, while unconnected nodes are logically independent and do not interact directly;
each node holds its own data and also keeps records of its adjacent nodes' databases as backups, so the difference comparison between adjacent nodes is a comparison between a node's database and the backup library held at that node.
Preferably, step 2 comprises:
using the editing software to perform data conversion: a mapping relationship is generated by configuring the tables, fields, and data of the source database against those of the target database, the data of the source database is converted into the target database, and data is extracted in both directions according to the correspondence.
Preferably, step 3 comprises:
comparison of added differences: let U1 be the data set of any data table at any node, and let U2 be the backup, held at U1's node, of the same data table for a node reachable from it; taking the set difference U1 - U2 on the primary key of each record yields the data newly added to U1;
comparison of deleted differences: analogously, U2 - U1 yields the data deleted from U1;
comparison of modified differences: for records whose primary keys appear in both U1 and U2, the remaining fields are compared one by one, and if any field differs, the record is identified as modified.
The database cluster difference comparison and data synchronization system provided by the invention comprises:
module M1: constructing a database cluster, the node databases, and the adjacent-node backup libraries;
module M2: grouping the tables in a database so that the tables in each group are connected by foreign-key relationships and their fields are mutually dependent, selecting the groups to be compared and synchronized, and generating the corresponding data tables;
module M3: establishing an initial backup point for the data tables to be delivered, storing it in the backup database as the initial reference against which the database is compared, and obtaining the data increment;
module M4: synchronously reporting the database: when the reported data reaches the receiving node, logging in to the database comparison management website, verifying the synchronization data, and choosing to approve or reject it;
module M5: maintaining the database, including service data maintenance, dictionary data maintenance, data backup and restoration, and log management.
Preferably, the module M1 includes:
the database cluster has a tree structure in which the higher a node sits in the tree, the higher its level; connected nodes exchange data, while unconnected nodes are logically independent and do not interact directly;
each node holds its own data and also keeps records of its adjacent nodes' databases as backups, so the difference comparison between adjacent nodes is a comparison between a node's database and the backup library held at that node.
Preferably, the module M2 includes:
using the editing software to perform data conversion: a mapping relationship is generated by configuring the tables, fields, and data of the source database against those of the target database, the data of the source database is converted into the target database, and data is extracted in both directions according to the correspondence.
Preferably, the module M3 includes:
comparison of added differences: let U1 be the data set of any data table at any node, and let U2 be the backup, held at U1's node, of the same data table for a node reachable from it; taking the set difference U1 - U2 on the primary key of each record yields the data newly added to U1;
comparison of deleted differences: analogously, U2 - U1 yields the data deleted from U1;
comparison of modified differences: for records whose primary keys appear in both U1 and U2, the remaining fields are compared one by one, and if any field differs, the record is identified as modified.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above.
Compared with the prior art, the invention has the following beneficial effects:
1. Based on an in-depth analysis of how the dedicated database is used and of its application characteristics, dedicated-database collection, reporting, and compilation software integrating data management, data reporting, data maintenance, and other functions is built, forming a complete and efficient data maintenance management service and data reporting system. This provides a convenient and effective basis for acquiring, processing, storing, and sharing data in the daily maintenance and use of the dedicated database service, and ensures that maintenance of the service is convenient, reliable, and timely;
2. the invention provides a foreign-key association directed-graph model algorithm, which displays the foreign-key relationships among data tables as a directed graph and makes it easy to analyze the causes of foreign-key conflicts;
3. the invention provides an atomic sequence generation algorithm, which implements foreign-key dependency lookup using the SQL statements provided by the database and uses topological sorting to order data processing, thereby resolving foreign-key conflicts.
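The source does not give the internals of the atomic sequence generation algorithm. As a sketch only, assuming the foreign-key pairs have already been read from the database catalog, the foreign-key directed graph and a topological sort could be combined as below. Kahn's algorithm is used here as a generic stand-in, and `insert_order` is a hypothetical name.

```python
from collections import deque


def insert_order(fks):
    """Hypothetical helper. fks: (child_table, parent_table) pairs, meaning the
    child table has a foreign key referencing the parent table. Returns an
    order in which every parent precedes its children (Kahn's algorithm)."""
    tables = set()
    for child, parent in fks:
        tables.update((child, parent))
    children = {t: [] for t in tables}  # directed edges parent -> child
    indegree = {t: 0 for t in tables}
    for child, parent in fks:
        if child == parent:
            continue  # a self-reference does not constrain the table order
        children[parent].append(child)
        indegree[child] += 1

    queue = deque(sorted(t for t in tables if indegree[t] == 0))
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for c in sorted(children[t]):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(tables):
        raise ValueError("cyclic foreign-key dependency")
    return order
```

Inserting rows table by table in the returned order guarantees that referenced rows exist before the rows that point to them; deletions would use the reverse order. A cycle admits no such order, which is why the sketch raises an error in that case.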
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of the database cluster architecture;
FIG. 2 is a schematic diagram of a mapping model hierarchy;
FIG. 3 is a schematic diagram of a process for comparing differences;
FIG. 4 is a schematic diagram of the difference comparison result;
FIG. 5 is a diagram illustrating a database foreign key association;
FIG. 6 is a schematic diagram of Cartesian product correlation of data tables;
FIG. 7 is a schematic diagram of incremental capture of data based on a multiple backup point design;
FIG. 8 is a diagram illustrating database log management and maintenance.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the invention; all of these fall within the scope of protection of the present invention.
Embodiment:
The database cluster architecture shown in FIG. 1 is built through the following steps:
Step 1.1: designing the database cluster architecture, comprising the following:
the database cluster is designed as a tree structure, and authority in the tree-structured cluster is assigned by level: the higher a node sits in the tree, the higher its level. As the figure shows, any node has at most one upper node but may have several lower nodes. Connected nodes exchange data, whereas unconnected nodes are logically independent of each other, i.e., they do not interact directly.
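A minimal sketch of this tree, assuming a simple in-memory representation: each hypothetical `Node` has at most one parent and any number of children, only adjacent nodes exchange data, and `direction` applies the reporting/issuing terminology of Step 4.1 (sending to the upper node is reporting, sending downward is issuing).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self  # at most one upper node per node
        self.children.append(child)
        return child

    def neighbors(self) -> List["Node"]:
        # Only directly connected nodes exchange data.
        return ([self.parent] if self.parent else []) + self.children


def direction(sender: Node, receiver: Node) -> str:
    """'report' when the receiver is the upper node, 'issue' when it is a lower node."""
    if receiver is sender.parent:
        return "report"
    if receiver in sender.children:
        return "issue"
    raise ValueError("nodes are not adjacent; no direct interaction")
```

Two sibling nodes under the same parent are not adjacent, so `direction` rejects a transfer between them; their data can only meet through the common upper node.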
Step 1.2: designing the node databases and the adjacent-node backup libraries, comprising the following:
each node has its own data and, once step 3.4 is completed, also generates records of the databases of its adjacent nodes as backups.
The main problem step 1.2 must solve is that, at the start of the project, the server of each node has only the database operated directly by its database administrator; no backup library of the adjacent nodes exists yet.
The initial data is imported into the backup library as follows:
the backup library is initialized once before the first database difference comparison; this operation generates the corresponding database from the current state of the node's database. Note that, because of the backup library, the difference comparison between adjacent nodes compares a node's database with the backup library held at that node, rather than comparing the databases of the two adjacent nodes directly. The reasons for this design are as follows:
each node holds snapshots of its adjacent nodes. If a node goes down, the difference data sets reported to it by its neighbors during the outage are invalid, but the sending node still holds a snapshot of the receiving node.
1) The snapshot taken before a database operator modifies the database is kept in the databases of other nodes. If a misoperation causes data errors, or a storage-device failure at the node causes data loss that cannot otherwise be recovered, the backup library resolves the problem: the node's state before the failure is restored from the copy held in the adjacent node's backup library.
2) If the server of an adjacent node is temporarily out of service, that node's database cannot be modified, so its contents remain consistent with the backup library held for it; the difference comparison therefore still produces the actual difference data while the adjacent node is offline. When the failed node resumes service, the sender compares again, still obtains the difference data set for that node, and only needs to report the difference again to complete synchronization.
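The backup-library idea can be sketched as follows, assuming each table is held as a dictionary keyed by primary key. `NodeStore` and `restore` are hypothetical names; a snapshot here means this node's record of a neighbor's table as of the last successful synchronization, seeded from the node's own state as in the initial import described above.

```python
import copy


class NodeStore:
    """Hypothetical node: its own table plus snapshots of its neighbors' tables."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = dict(rows)  # own database: primary key -> row
        self.snapshots = {}     # neighbor name -> {primary key: row} (backup library)

    def init_backup(self, neighbor):
        # One-off initial import, generated from this node's current database.
        self.snapshots[neighbor] = copy.deepcopy(self.rows)

    def changed_keys(self, neighbor):
        # Compares the local table with the local snapshot only (no network
        # access), so it still works while the neighbor is offline.
        snap = self.snapshots[neighbor]
        keys = self.rows.keys() | snap.keys()
        return sorted(k for k in keys if self.rows.get(k) != snap.get(k))


def restore(failed, holder):
    # Rebuild a failed node's table from the snapshot its neighbor keeps of it.
    failed.rows = copy.deepcopy(holder.snapshots[failed.name])
```

`changed_keys` illustrates reason 2, and `restore` illustrates reason 1: a failed node recovers its last synchronized state from its neighbor's copy, although changes made after that synchronization cannot be recovered this way.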
Step 2.1: initial range selection, comprising the steps of:
The tables in the database are grouped so that the tables in each group are connected by foreign keys, i.e., the fields of the tables in a group depend on one another. The database administrator selects the groups to be compared and synchronized, generating the corresponding data tables. The delivery scope is chosen according to actual demand and the size of the data tables. In this embodiment, the table structure is complex and the volume of data synchronized among the cluster nodes is very large; to avoid operating on a large number of tables at once, interrelated tables are grouped together.
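Grouping tables by foreign-key connectivity amounts to finding the connected components of the foreign-key graph. A union-find sketch, assuming the foreign-key pairs are already known (`group_tables` is a hypothetical helper, not the patent's own code):

```python
def group_tables(tables, fks):
    """Partition tables into groups connected by foreign keys.
    fks: iterable of (child_table, parent_table) pairs."""
    parent = {t: t for t in tables}

    def find(t):
        # Follow parent links to the group representative, halving the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for child, par in fks:
        ra, rb = find(child), find(par)
        if ra != rb:
            parent[ra] = rb  # merge the two groups

    groups = {}
    for t in tables:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())
```

A table with no foreign keys ends up in a group of its own and can be compared and synchronized independently.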
Step 2.2: the definition of the source database and the target database comprises the following steps:
Data conversion in data management involves conversion between different data standards: data sources of different types and formats must be converted into target databases of different types and formats. To convert between such heterogeneous data, the software implements a key technique, the mapping relationship model, which solves the problem of converting between different kinds of data.
When the dedicated database uses the report compilation software for data conversion, the source and target databases to be converted are determined first. A mapping relationship is then generated by configuring the tables, fields, and data of the source database against those of the target database, so that the tables, fields, and data of the two databases correspond. The data of the source database is then converted into the target database, with the correspondence serving as the bridge for extracting data in either direction.
The difficulty of data conversion lies in extracting data of different types and formats and converting between them, which requires defining conversion rules.
The basic principle of data conversion based on the mapping relationship model is to extract the basic information of data tables of different types and formats and to establish mapping relationships for the conversion process. Conversion runs in two directions: (1) from the source database to the target database, and (2) from the target database to the source database. In either direction, the data is converted according to the table, field, and data mappings between the two databases.
The key to the mapping relationship model is establishing the mapping information between the source database and the target database. This information has three levels, each built on the level above it. The first level is the table-to-table mapping, established from the entity tables that exist in the source or target database; the second level is the field-to-field mapping, built on the table mappings of the first level; the third level is the data-to-data mapping, built on the field mappings of the second level. Establishing the mapping relationships is the key and most difficult part of the model: their completeness, correctness, and operability directly determine the quality of the data conversion. The mapping relationships are illustrated in FIG. 2.
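The three mapping levels might be represented as below. This is a sketch under assumed conventions: a hypothetical `TableMapping` holds the table-level mapping (level 1), a source-to-target field map (level 2), and per-field value maps (level 3); values without a mapping are copied unchanged, and the reverse direction assumes each value map is one-to-one. The table and field names in the usage are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TableMapping:
    source_table: str                # level 1: table-to-table mapping
    target_table: str
    fields: Dict[str, str]           # level 2: source field -> target field
    values: Dict[str, dict] = field(default_factory=dict)  # level 3, keyed by source field

    def to_target(self, row: dict) -> dict:
        # Source -> target: rename each field, then translate mapped values.
        out = {}
        for src_field, tgt_field in self.fields.items():
            v = row[src_field]
            out[tgt_field] = self.values.get(src_field, {}).get(v, v)
        return out

    def to_source(self, row: dict) -> dict:
        # Target -> source: the same mappings applied in reverse.
        out = {}
        for src_field, tgt_field in self.fields.items():
            v = row[tgt_field]
            inverse = {t: s for s, t in self.values.get(src_field, {}).items()}
            out[src_field] = inverse.get(v, v)
        return out
```

For example, `TableMapping("EMP", "staff", {"EMP_NAME": "name", "SEX": "gender"}, {"SEX": {"1": "M", "2": "F"}})` turns `{"EMP_NAME": "Li", "SEX": "1"}` into `{"name": "Li", "gender": "M"}` and back.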
Step 3.1: database cluster difference comparison, comprising the following steps:
First, according to the delivery table scope, an initial backup point is established for the relevant delivered data tables at the initial stage and stored in the backup database (physically isolated from the user database) as the initial comparison reference; the user database is then compared with the backup point to obtain the data increment.
When data is updated at a node of any level, that node is called the data sender. A database operator logs in to the database comparison management website and, through the website, compares the changed data with the snapshots of the adjacent nodes' databases stored on the sender's server. This yields three kinds of differences (additions, updates, and deletions), which are packaged into a difference data result set and then reported to the server of the adjacent node. The primary table and the backup table must always have the same fields: content differences can only be compared when the two tables have identical fields, since comparing tables with different fields is meaningless.
Database administrators at different nodes can all operate on the tables, so the differences between nodes must be shown to the administrators to ensure that changed data is reasonable and correct. Based on the primary key defined by the table structure, differences between cluster nodes fall into three types: additions, updates, and deletions. Fast data-difference comparison and the method for identifying each difference type are implemented as follows:
step 3.2: the method for comparing the newly added differences in the database comprises the following steps:
Let U1 be the data set of any data table at any node, and let U2 be the backup, held at U1's node, of the same data table for a node reachable from it. Taking the set difference U1 - U2 on the unique identifier (primary key) of each record yields the data newly added to U1.
Step 3.3: database deletion difference comparison, comprising the steps of:
Deleted differences are identified in the same way as additions: U2 - U1 yields the data deleted from U1, which must then be synchronized.
Step 3.4: the database modification difference comparison comprises the following steps:
From U1 and U2, take the records whose primary keys appear in both tables and, for each such primary key, compare the remaining fields one by one. If any field differs, the record has been modified and is reported as an update.
The data difference comparison flow is shown in FIG. 3, and the comparison result is shown in FIG. 4.
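Steps 3.2 to 3.4 reduce to set operations on primary keys. A minimal sketch, assuming each table is a dictionary from primary key to row and that the field lists are known; `diff_tables` is a hypothetical helper. It also enforces the precondition from Step 3.1 that both tables have the same fields.

```python
def diff_tables(u1, u2, fields1, fields2):
    """u1: node table, u2: its backup; both {primary_key: row_dict}.
    fields1/fields2: column names of each table.
    Returns the added, deleted, and modified rows keyed by primary key."""
    if set(fields1) != set(fields2):
        raise ValueError("tables must have the same fields to be compared")
    added = {k: u1[k] for k in u1.keys() - u2.keys()}    # U1 - U2
    deleted = {k: u2[k] for k in u2.keys() - u1.keys()}  # U2 - U1
    modified = {}
    for k in u1.keys() & u2.keys():                      # same primary key in both
        # Compare the remaining fields one by one; any inequality marks an update.
        if any(u1[k].get(f) != u2[k].get(f) for f in fields1):
            modified[k] = u1[k]
    return {"added": added, "deleted": deleted, "modified": modified}
```

The three dictionaries correspond to the additions, deletions, and updates that are packaged into the difference data result set.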
Step 4.1: the synchronous reporting of the database comprises the following steps:
after the comparison operation, when the database administrator selects the data to be synchronized, the database cluster synchronization operation is started. The reporting result set is transmitted in a node tree structure, and the node receiving the message is called a data receiver. When the authority level of the receiver is higher than that of the sender, reporting is called, and otherwise, issuing is called. When the report data is sent to the node of the sender, the database operator logs in the database comparison management website, and after the operator checks the synchronous message data, the operator needs to choose to approve or reject the synchronous message data. The synchronization and rejection operations are performed in the following steps:
a. The database background administrator logs in to the database background management system website and clones the specified special database data from the special database into the local database; this step accesses the Oracle database. Using a browser, the administrator can view the table names and classifications of the special database, which are themselves data stored in the special database.
b. After the local node has cloned part of the special database, it becomes a branch node of the main database. The administrator can then operate on the table contents synchronized from the special database to the local node, for example inserting new data, modifying stored data, and deleting stored data.
c. When the administrator has finished operating on the data tables at the local node, the modified local content can be submitted to the special database repository. The local node acts as a branch created on a single machine; when the administrator chooses to submit the changes, the special database repository is synchronized with the corresponding subset of data tables on the local node.
d. When approval (synchronization) is selected, the database table is updated together with the backup table snapshot corresponding to the sender. The receiver sends a receipt message to the sender indicating that the data has been stored, and the backup table snapshot corresponding to the receiver is updated at the sender's node. When the sender's node then compares again, no data difference is reported, which shows that the data of the sender and receiver nodes has been compared and synchronized.
e. When rejection is selected, neither the database table nor the snapshot is updated, and the receiver does not send a receipt to the sender. When the sender compares again, the data difference still exists, because neither the receiver node's database nor the snapshot of that database kept at the sender node has been modified.
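A minimal sketch of the reporting/issuing rule and the approve/reject bookkeeping in steps d and e, assuming in-memory dicts stand in for the Oracle tables and backup snapshots, that `changes` carries only inserted or updated rows, and that authority levels are integers where larger means higher. `Node`, `decide`, `transfer_direction` and `pending_difference` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict

Table = Dict[int, dict]  # primary key -> row

def transfer_direction(sender_level: int, receiver_level: int) -> str:
    # Receiver with higher authority: "report"; otherwise: "issue".
    return "report" if receiver_level > sender_level else "issue"

@dataclass
class Node:
    name: str
    level: int
    table: Table = field(default_factory=dict)
    backups: Dict[str, Table] = field(default_factory=dict)  # neighbour name -> snapshot

def decide(sender: Node, receiver: Node, changes: Table, approve: bool) -> bool:
    """Apply the receiving operator's choice; return True if a receipt is sent."""
    if not approve:
        # Step e: no table update, no snapshot update, no receipt.
        return False
    # Step d: store the data and refresh the receiver's snapshot of the sender ...
    receiver.table.update(changes)
    receiver.backups[sender.name] = dict(sender.table)
    # ... then the receipt lets the sender refresh its snapshot of the receiver.
    sender.backups[receiver.name] = dict(receiver.table)
    return True

def pending_difference(node: Node, neighbour: str) -> bool:
    # A node sees a difference while its table and its snapshot of the neighbour disagree.
    return node.table != node.backups.get(neighbour, {})
```

After a rejection `pending_difference` stays true on the sender, matching step e; after an approval it clears, matching step d.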
Step 4.2: synchronous issuing of the database, comprising the steps of:
a. The database background administrator of any node logs in to the database background management system website, clones the specified special database data to the local database, and then chooses to update the node's local repository; that is, the node pulls the changes to the central repository that other nodes have submitted between this node's last submission and the present.
b. When the update obtained from the central repository is combined with the changes in the local repository, the storage feedback of the local node shows whether any record is duplicated: if there are no duplicate records there is no conflict, and if duplicate records exist a conflict may occur.
c. For conflicting data, the database background administrator must choose either to ignore or to resolve the conflicts. If ignoring is chosen: when a conflict arises after a record has been changed, no data in the local repository is changed, and the data in the central repository of the special database is also transferred to the local repository; when a conflict arises from inserting duplicate data, both identical records are submitted; and when identical records are deleted, the delete operation is executed. If resolving is chosen, the data background website displays the conflicting data as a table in the browser; the administrator compares the cause of the conflict for each record and manually inserts, modifies or deletes the conflicting records, and once all conflicts are resolved the data is submitted again, completing synchronization from that node's database repository to the special database repository. Updating the local node's data ensures that the node's administrators work on the latest data; operating on stale data would compromise data consistency, leaving updated old data inconsistent with the new data.
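The text only says that duplicate records indicate a possible conflict. One common way to decide which records need the ignore/resolve choice, used here as an assumption rather than as the patented rule, is a three-way comparison of the local and central repositories against the state at the node's last submission. `detect_conflicts` and `changed_keys` are hypothetical helpers.

```python
from typing import Dict, Set, Tuple

Table = Dict[int, dict]  # primary key -> row; a missing key means deleted or absent

def changed_keys(base: Table, side: Table) -> Set[int]:
    # Keys inserted, updated or deleted relative to the last synchronized state.
    return {k for k in base.keys() | side.keys() if base.get(k) != side.get(k)}

def detect_conflicts(base: Table, local: Table, central: Table) -> Tuple[Set[int], Set[int]]:
    """Return (mergeable, conflicting) primary keys for a pull from the central repository."""
    local_changed = changed_keys(base, local)
    central_changed = changed_keys(base, central)
    # Touched on both sides and ended up different: needs an ignore/resolve decision.
    conflicting = {k for k in local_changed & central_changed if local.get(k) != central.get(k)}
    mergeable = (local_changed | central_changed) - conflicting
    return mergeable, conflicting
```

Records changed on only one side can be merged automatically; only the conflicting keys would be shown to the administrator in the browser table.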
Steps 4.1 and 4.2 must address the following main problem:
1) Foreign key conflicts in data tables with foreign key dependencies:
As step 4.1 shows, in the embodiment of the present invention the insert and delete operations on tables linked by foreign keys must be executed in order: insert or delete operations on some tables can only run after those on other tables have completed. Otherwise storing the data fails because of a foreign key violation, which causes the data synchronization mechanism to fail.
The solution to the problem described above is as follows:
Step 4.2.1: foreign key dependency detection method. The symbols used in the description are first defined as shown in Table 1:
TABLE 1 Symbol definitions [table image not reproduced]
The specific implementation steps are as follows:
(1) Query the parent and child tables from user_constraints, the Oracle data dictionary view that records constraints such as primary keys and foreign keys; the query result is shown in Table 2;
TABLE 2 Foreign key constraint records [table image not reproduced]
(2) Taking all tables of the database as the node set T, form the Cartesian product of user_constraints with itself (a self-join via inner join) to generate an undirected graph (shown in FIG. 2) in which any two nodes are directly connected; t_n in the graph denotes a redundant node that has no foreign key;
(3) Use constraint_name = r_constraint_name as the condition of the self-join to keep only the edges that satisfy it, and use the WHERE clause to filter out data tables that contain neither fk nor rfk, which is equivalent to removing the irrelevant nodes and edges from FIG. 2; the filtered result is shown in FIG. 3;
(4) Determine the tables with foreign key dependencies and convert the undirected graph into a directed graph. The CONNECT BY clause retrieves rows along a chained hierarchical relationship; the PRIOR keyword indicates the direction of the pointer and is placed on one of the two columns of the connection relationship, with the side carrying PRIOR representing the in-degree and the other side the out-degree;
(5) The START WITH keyword identifies the starting node of the search structure, i.e. the table on which the write operation begins, which is the input of the program.
Through the above flow, the undirected graph shown in fig. 6 is finally converted into the directed graph shown in fig. 5. The execution result of the complete SQL statement is shown in table 1, where the parent and child sets are both T. Taking T = {t_a, t_b, t_c, t_d, t_e} as the rows and columns, build an adjacency matrix table: first initialize all of its entries to 0, then look up Reference in table 3, and for each Reference = {t_i, t_j} set the element in row i, column j of the adjacency matrix to 1. The adjacency matrix of fig. 6 is shown in table 2.
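Steps (1) to (3) and the adjacency matrix construction can be illustrated without an Oracle instance. In this sketch, which is an assumption rather than the source's code, each dict imitates a row of Oracle's `USER_CONSTRAINTS` view, a dictionary lookup plays the role of the self-join on `constraint_name = r_constraint_name`, and a matrix entry (i, j) = 1 is read as "table i is referenced by table j". `fk_edges` and `adjacency_matrix` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

def fk_edges(constraints: List[Dict[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Join constraint rows on constraint_name = r_constraint_name.

    Each dict mimics a USER_CONSTRAINTS row with the keys table_name,
    constraint_name, constraint_type ('P' primary key, 'R' foreign key)
    and r_constraint_name. Returns (referenced_table, referencing_table) pairs.
    """
    by_name = {c["constraint_name"]: c for c in constraints}
    edges = []
    for c in constraints:
        if c["constraint_type"] != "R":
            continue  # only foreign keys ('R') point at another constraint
        parent = by_name.get(c.get("r_constraint_name"))
        if parent is not None:
            edges.append((parent["table_name"], c["table_name"]))
    return edges

def adjacency_matrix(tables: List[str], edges: List[Tuple[str, str]]) -> List[List[int]]:
    index = {t: i for i, t in enumerate(tables)}
    matrix = [[0] * len(tables) for _ in tables]  # every entry starts at 0
    for parent, child in edges:
        matrix[index[parent]][index[child]] = 1  # Reference {t_i, t_j} -> row i, column j
    return matrix

rows = [
    {"table_name": "ta", "constraint_name": "pk_a", "constraint_type": "P", "r_constraint_name": None},
    {"table_name": "tb", "constraint_name": "pk_b", "constraint_type": "P", "r_constraint_name": None},
    {"table_name": "tb", "constraint_name": "fk_b_a", "constraint_type": "R", "r_constraint_name": "pk_a"},
    {"table_name": "tc", "constraint_name": "fk_c_b", "constraint_type": "R", "r_constraint_name": "pk_b"},
    {"table_name": "td", "constraint_name": "pk_d", "constraint_type": "P", "r_constraint_name": None},
]
edges = fk_edges(rows)
matrix = adjacency_matrix(["ta", "tb", "tc", "td"], edges)
```

Table td has no foreign key, so its row and column stay zero, which corresponds to the redundant nodes removed in step (3).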
Step 4.2.2: the foreign key conflict resolution method comprises the following steps:
(1) Take the data table on which the database operator starts the write operation as the initial data table;
(2) When a data table is accessed, mark it as visited so that recursive backtracking does not access it again;
(3) Determine whether the data table has any foreign-key-associated data table that has not been visited; if not, execute (4). If so, determine whether that unvisited table is referenced by the foreign keys of other tables: if it is not, visit it as a child node and go back to (2); otherwise execute (5).
(4) If the current node is the root node, the traversal ends; otherwise backtrack and execute (3).
(5) Mark the in-degree of the node as 1 and jump to (3).
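Step 4.2.2 is a depth-first traversal whose goal is that a referenced table is written before the tables that reference it. The sketch below swaps in the standard DFS topological sort, which gives that guarantee, instead of reproducing the in-degree marking of step (5) literally; `insert_order` and `delete_order` are hypothetical names, and edges are assumed to be (referenced, referencing) pairs.

```python
from typing import Dict, List, Set, Tuple

def insert_order(tables: List[str], edges: List[Tuple[str, str]]) -> List[str]:
    """Order tables so each referenced table precedes every table that references it.

    `edges` holds (referenced, referencing) pairs. Raises ValueError on a foreign-key cycle.
    """
    children: Dict[str, List[str]] = {t: [] for t in tables}
    for parent, child in edges:
        children[parent].append(child)
    visited: Set[str] = set()   # finished tables; prevents repeated visits while backtracking
    on_path: Set[str] = set()   # tables on the current recursion path, for cycle detection
    finished: List[str] = []

    def visit(table: str) -> None:
        if table in visited:
            return
        if table in on_path:
            raise ValueError(f"foreign key cycle through {table}")
        on_path.add(table)
        for child in children[table]:
            visit(child)
        on_path.remove(table)
        visited.add(table)
        finished.append(table)  # recorded only after all tables that depend on it

    for table in tables:
        visit(table)
    return finished[::-1]  # reverse post-order is a topological order

def delete_order(tables: List[str], edges: List[Tuple[str, str]]) -> List[str]:
    # Referencing (child) rows must be deleted before the rows they point to.
    return insert_order(tables, edges)[::-1]
```

Deletes run in the reverse of the insert order. A self-referencing foreign key shows up as a cycle here and would need separate handling.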
Step 4.3: in the embodiment of the present invention, the network transmission scheme of the synchronization data is as follows:
Data communication is implemented by a DLL (dynamic link library) of the embodiment of the invention; on the server side of the software, the service program calls this DLL through JNA (Java Native Access) to communicate between nodes.
When any node initiates a synchronization request, the service process calls the export function of the Oracle service to export the tables, packages the data to be transmitted into a file in .dmp format, and sends the file to the server of the adjacent node; the database on the sending node records the time the request was sent and the unique primary key identifier of the message.
When another node receives the synchronization request, its database operator can see the differences between the local data and the data that the requesting node wants to synchronize. After confirming that the synchronization content is correct, the administrator can choose to accept it; if there is a problem with the data being synchronized, the administrator should reject the synchronization.
When the database administrator chooses to approve, the node invokes the import function of the Oracle service, imports the received .dmp file into the database, and sends an approval receipt to the sender. On receiving the approval receipt, the sender updates its local backup table of the receiver so that the sender's and receiver's table contents are consistent, which completes the synchronization.
When the database administrator chooses to reject, the received .dmp file is not imported into the database, and a rejection receipt is sent to the sender. When the sender receives the rejection receipt, the backup table is not updated.
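A sketch of the sender-side bookkeeping in step 4.3, assuming an in-memory dict replaces the exported .dmp file and that transport through the DLL/JNA layer is out of scope. `SyncRequest` and `SenderNode` are hypothetical; the message ID is a UUID chosen here to play the role of the unique primary key identifier.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict

Table = Dict[int, dict]  # primary key -> row; stands in for the .dmp payload

@dataclass
class SyncRequest:
    message_id: str      # unique identifier recorded with the request
    sent_at: datetime    # time the request was sent
    receiver: str
    payload: Table
    status: str = "pending"  # becomes "approved" or "rejected" on receipt

@dataclass
class SenderNode:
    backups: Dict[str, Table] = field(default_factory=dict)        # receiver -> local backup table
    requests: Dict[str, SyncRequest] = field(default_factory=dict)  # request log

    def send(self, receiver: str, payload: Table) -> SyncRequest:
        req = SyncRequest(str(uuid.uuid4()), datetime.now(timezone.utc), receiver, dict(payload))
        self.requests[req.message_id] = req
        return req

    def on_receipt(self, message_id: str, approved: bool) -> None:
        req = self.requests[message_id]
        req.status = "approved" if approved else "rejected"
        if approved:
            # Only an approval receipt brings the backup of the receiver up to date.
            self.backups.setdefault(req.receiver, {}).update(req.payload)
```

Keeping the request log keyed by message ID lets a late receipt be matched to the request it answers.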
Step 4.4: the scheme for user authority verification and secure handling of authority data is as follows:
In the embodiment of the present invention, the reporting connections between special databases form a tree-type cascade hierarchy: a node in the reporting hierarchy usually has one upper level and several lower levels (except for the top-level and bottom-level special databases). Suppose a node at the current level has one upper level and one lower level. At time T1 the node has an initial effective backup point; at time T2 the data increment ΔT1 accumulated since T1 is reported to the upper level and issued to the lower level, generating a temporary backup point; at time T3 the upper level approves the data and the lower level rejects it. If the temporary backup point replaces the effective backup point, the increment sent when data is transferred again at T4 is ΔT2 + ΔT3, so the data issued to the lower level lacks the original ΔT1. If instead the effective backup point is kept unchanged, the increment reported at T4 is ΔT1 + ΔT2 + ΔT3, so the data reported to the upper level repeats ΔT1. Therefore, using a single backup point in a tree-type cascade reporting system leads to contradictory backup point update logic.
The solution to the above problem is as follows:
The software adopts incremental data capture based on multiple backup points, explained here using the same reporting scenario. At time T1 the current level holds both an initial effective backup point for the upper level and one for the lower level. When data is reported at T2, an upper-level temporary backup point and a lower-level temporary backup point are generated. At T3, when the upper level replies with approval, the upper-level temporary backup point generated at T2 replaces the original upper-level effective backup point and becomes the new data comparison reference; when the lower level replies with rejection, the lower-level effective backup point remains unchanged and the temporary backup point generated at T2 is discarded. When data is transferred again at T4, the reported data uses the upper-level effective backup point from T2 as its comparison reference, so the reported increment is ΔT2 + ΔT3; the issued data uses the initial lower-level effective backup point from T1 as its comparison reference, so the issued increment is ΔT1 + ΔT2 + ΔT3. This avoids lost or duplicated data and resolves the contradiction in the backup point update logic. As shown in fig. 7, each node is marked with an authority level, and the authority marks of data fields subject to authority restrictions are compared: if the node's authority level is higher than that of the received data, the record can be stored; if the node's authority level is lower than that of the received data, the record is filtered out.
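The multiple-backup-point scheme can be checked against the T1 to T4 scenario above with a small model. Assumptions, not from the source: backup points are represented by integer timestamps, the node's changes are kept as a timestamped log, and an increment is every change after the neighbour's effective point. `BackupPoints` and `CascadeNode` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class BackupPoints:
    effective: int                   # comparison reference already accepted by the neighbour
    temporary: Optional[int] = None  # created when data is sent, pending the neighbour's reply

@dataclass
class CascadeNode:
    changes: List[Tuple[int, str]] = field(default_factory=list)   # (timestamp, change) log
    points: Dict[str, BackupPoints] = field(default_factory=dict)  # one pair per neighbour

    def send(self, neighbour: str, now: int) -> List[str]:
        """Return the increment since this neighbour's effective point; open a temporary point."""
        bp = self.points[neighbour]
        bp.temporary = now
        return [c for t, c in self.changes if bp.effective < t <= now]

    def reply(self, neighbour: str, approved: bool) -> None:
        bp = self.points[neighbour]
        if approved and bp.temporary is not None:
            bp.effective = bp.temporary  # temporary point becomes the new reference
        bp.temporary = None              # on rejection it is simply discarded

# Scenario from the text: T1=10, T2=20, T3=30, T4=40; ΔT1, ΔT2, ΔT3 fall in successive intervals.
node = CascadeNode(changes=[(15, "dT1"), (25, "dT2"), (35, "dT3")],
                   points={"upper": BackupPoints(10), "lower": BackupPoints(10)})
first_up, first_down = node.send("upper", 20), node.send("lower", 20)
node.reply("upper", approved=True)
node.reply("lower", approved=False)
second_up, second_down = node.send("upper", 40), node.send("lower", 40)
```

At T4 the upper level receives only dT2 and dT3, while the lower level receives dT1, dT2 and dT3, which is the behaviour the paragraph above describes.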
Step 5: database maintenance comprises service data maintenance, dictionary data maintenance, data backup and restoration, and log management. The specific embodiment is as follows:
Step 5.1: service data maintenance provides collaborative maintenance of service data according to the assigned data authorities, supporting operations such as adding, deleting, editing and querying the various kinds of service data within the assigned authority.
Step 5.2: dictionary data maintenance, comprising the steps of:
(1) Tree dictionary data maintenance: provides editing of tree-structured dictionary data such as the military unit sequence table and the equipment sequence table, supports displaying the data as a table, and supports querying a node and its direct subordinate nodes.
(2) Special dictionary data maintenance: provides editing of special dictionary tables and lets some constrained fields be filled directly by selecting from a linked dictionary. For example, the unit designation field supports matching against the conditions entered by the user and selecting from the linked designation dictionary, and the aircraft model field supports selection from the linked aircraft model dictionary.
(3) General dictionary data maintenance: provides basic functions such as adding, deleting and modifying general dictionary data, and supports node-level editing graphically by drag and drop: data items at the same level can be reordered by dragging, and a data item, or a data item in a sub-sequence, can be dragged into any level of the sub-sequence of a specified data item.
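A minimal sketch of the tree dictionary operations in (1) and (3), assuming each dictionary item is identified by a code string and that a drag-and-drop maps to a move under a new parent at a given position. `TreeDictionary` is a hypothetical class; persistence and the table-form display are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class TreeDictionary:
    parent: Dict[str, Optional[str]] = field(default_factory=dict)
    # None is the key for top-level items.
    children: Dict[Optional[str], List[str]] = field(default_factory=lambda: {None: []})

    def add(self, code: str, parent: Optional[str] = None) -> None:
        self.parent[code] = parent
        self.children.setdefault(parent, []).append(code)
        self.children.setdefault(code, [])

    def direct_subordinates(self, code: str) -> List[str]:
        return list(self.children[code])

    def move(self, code: str, new_parent: Optional[str], position: int) -> None:
        """Drag `code` under `new_parent` at `position` (same parent means reorder siblings)."""
        # Refuse to drop a node into its own subtree, which would create a cycle.
        p = new_parent
        while p is not None:
            if p == code:
                raise ValueError("cannot move a node under itself")
            p = self.parent[p]
        self.children[self.parent[code]].remove(code)
        self.children.setdefault(new_parent, []).insert(position, code)
        self.parent[code] = new_parent

d = TreeDictionary()
for code, parent in [("A", None), ("A1", "A"), ("A2", "A"), ("B", None)]:
    d.add(code, parent)
d.move("A2", "A", 0)  # reorder siblings under A
```

Storing both parent and ordered child lists keeps "query direct subordinates" and "reorder by dragging" cheap, at the cost of keeping the two maps in sync on every move.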
Step 5.3: data backup, comprising the following steps:
(1) After the user logs in successfully, the authenticated request is forwarded to the data backup and restoration function. For data loading, full-database loading is supported; for backup and restoration, data backup, restoration and clearing are supported, each of which can operate on the full database, on a database user, or on BLOB data;
(2) the data backup and restoration function sends a data access request to the special database;
(3) the special database returns a data result;
(4) the data backup and restoration function feeds the operation result back to the user.
Step 5.4: data restoration, comprising the steps of:
(1) a user logs in to the system, and the system provides the service data maintenance function to the user;
(2) after the user successfully logs in, the system confirms the data authority of the user, including the authority of a maintenance unit and a service module;
(3) entering a service data maintenance process;
(4) after service data maintenance is finished, check whether there are maintenance errors: if there are none, the process ends; if there are, a data rollback is executed and maintenance continues until the data is correct;
(5) the data change tracking information and data rollback information produced by service data maintenance are fed back to the log management module.
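The maintain-check-rollback loop of step 5.4 maps naturally onto a database transaction. This sketch uses Python's built-in `sqlite3` as a stand-in for the special database (an assumption; the source uses Oracle), with a hypothetical `maintain` helper that commits an edit only if a caller-supplied check passes and records the outcome for log management.

```python
import sqlite3
from typing import Callable, List

def maintain(conn: sqlite3.Connection,
             edit: Callable[[sqlite3.Connection], object],
             is_valid: Callable[[sqlite3.Connection], bool],
             log: List[str]) -> bool:
    """Apply one maintenance edit; keep it only if `is_valid` accepts the result."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            edit(conn)
            if not is_valid(conn):
                raise ValueError("maintenance check failed")
    except (sqlite3.Error, ValueError) as exc:
        log.append(f"rolled back: {exc}")  # rollback information for log management
        return False
    log.append("committed")
    return True

def no_blank_names(c: sqlite3.Connection) -> bool:
    # Hypothetical verification rule for the example table.
    return c.execute("SELECT COUNT(*) FROM unit WHERE name = ''").fetchone()[0] == 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
log: List[str] = []
```

Calling `maintain` again after a False result corresponds to continuing maintenance until the data is correct; both constraint violations and failed checks leave the table as it was before the edit.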
Step 5.5: database log management, as shown in fig. 8, includes the following steps:
(1) after the user logs in successfully, the authenticated request is forwarded to the log management function;
(2) the log management function sends a data access request to a special database;
(3) the special database returns a data result;
(4) the log management function feeds the operation result back to the user.
Example 2:
The invention provides a data standard upgrading method for a naval combat data application library. Without changing the design structure of the original data tables, the original client/server (C/S) application software is upgraded into browser/server (B/S) web application software for multiple users and multiple nodes, in order to solve the problems of maintaining and managing the special database under various service requirements and of reporting data across regional cascade databases over a dedicated military network. The method comprises:
1) adopting a multi-tier B/S architecture and, following an SOA approach, using componentized standard software design to couple and integrate the sub-projects;
2) providing a data maintenance function, supporting service data maintenance and dictionary data maintenance;
3) supporting dynamic generation of service data maintenance pages, node-level editing by graphical drag and drop, and querying a node and its direct subordinate nodes;
4) providing a data rollback function, supporting data rollback configured according to user authority;
5) providing verification of user-selected data table records, supporting multiple verification rules and their management, dynamic generation and loading of verification expressions, and data verification processing;
6) supporting data reorganization: cutting data and generating a database according to cutting conditions, masking data according to mainstream masking rules and user-defined masking rules, and setting export rules and generating the related data;
7) providing custom report configuration management, supporting custom multi-level table headers and statistical report generation;
8) supporting database backup and recovery under various conditions;
9) providing customized data conversion between the special database and related departmental databases, supporting conversion based on user service rules;
10) supporting online data comparison reporting and offline data reporting for the cascade databases, with fast data comparison and analysis and a user-friendly human-machine interface;
11) supporting data transmission based on a military data transmission protocol;
12) supporting online updating and management of database versions;
13) supporting state visualization of the online collection and reporting system;
14) providing a data operation log function, supporting queries by various conditions of the detailed data change tracking information generated during data maintenance;
15) supporting data authority configuration suited to the application requirements of the special database, including service module authority configuration and data authority configuration;
16) providing unified picture management, supporting import and export of pictures per service table.
The invention aims to provide a novel database cluster difference comparison and synchronization mechanism that feeds back the changed data after the database changes; the synchronization mechanism is designed so that the generated difference data is manually confirmed by database administrators and then synchronized asynchronously.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (9)

1. A method for database cluster difference comparison and data synchronization, comprising:
step 1: constructing a database cluster, node databases and adjacent-node backup databases;
step 2: grouping the tables in a database, where the tables in each group are linked by foreign key relationships and their fields depend on one another; selecting the groups that need comparison and synchronization, and generating the corresponding data tables;
step 3: establishing an initial backup point for each reported data table, storing it in the backup database and the database as the initial comparison reference, and obtaining the data increment;
step 4: synchronously reporting the database: when the reported data reaches the receiving node, checking the synchronization data on the comparison management website and choosing to approve or reject it;
step 5: maintaining the database, including service data maintenance, dictionary data maintenance, data backup and restoration, and log management.
2. The database cluster difference comparison and data synchronization method according to claim 1, wherein the step 1 comprises:
the database cluster has a tree structure in which the higher the depth, the higher the node level; connected nodes exchange data, while unconnected nodes are logically independent and do not interact directly;
each node has its own data and keeps a copy of each adjacent node's database as a backup, so the difference comparison between adjacent nodes is a comparison between the node database and the corresponding backup database.
3. The database cluster difference comparison and data synchronization method according to claim 1, wherein the step 2 comprises:
performing data conversion with editing software: a mapping is generated by configuring the tables, fields and data of the source database against the tables, fields and data of the target database, the source data is converted into the target database, and data is extracted in both directions according to the mapping.
4. The database cluster difference comparing and data synchronizing method according to claim 1, wherein the step 3 comprises:
newly added data difference comparison: let U1 be the data set of any data table under any node, and let U2 be the copy of that table, identical to U1 when it was backed up, kept as the backup for a reachable adjacent node; taking the difference set by the primary key of each record, U1 - U2 gives the data newly added to U1;
deleted data difference comparison: analogously to the newly added difference, U2 - U1 gives the data deleted from U1;
modified data difference comparison: take the records of U1 and U2 whose primary keys are the same and compare the other fields one by one; if any field differs, the content of the data table is updated.
5. A system for database cluster difference comparison and data synchronization, comprising:
module M1: constructing a database cluster, node databases and adjacent-node backup databases;
module M2: grouping the tables in a database, where the tables in each group are linked by foreign key relationships and their fields depend on one another; selecting the groups that need comparison and synchronization, and generating the corresponding data tables;
module M3: establishing an initial backup point for each reported data table, storing it in the backup database and the database as the initial comparison reference, and obtaining the data increment;
module M4: synchronously reporting the database: when the reported data reaches the receiving node, checking the synchronization data on the comparison management website and choosing to approve or reject it;
module M5: maintaining the database, including service data maintenance, dictionary data maintenance, data backup and restoration, and log management.
6. The database cluster difference comparison and data synchronization system as claimed in claim 5, wherein said module M1 comprises:
the database cluster has a tree structure in which the higher the depth, the higher the node level; connected nodes exchange data, while unconnected nodes are logically independent and do not interact directly;
each node has its own data and keeps a copy of each adjacent node's database as a backup, so the difference comparison between adjacent nodes is a comparison between the node database and the corresponding backup database.
7. The database cluster difference comparison and data synchronization system as claimed in claim 5, wherein said module M2 comprises:
performing data conversion with editing software: a mapping is generated by configuring the tables, fields and data of the source database against the tables, fields and data of the target database, the source data is converted into the target database, and data is extracted in both directions according to the mapping.
8. The database cluster difference comparison and data synchronization system as claimed in claim 5, wherein said module M3 comprises:
newly added data difference comparison: let U1 be the data set of any data table under any node, and let U2 be the copy of that table, identical to U1 when it was backed up, kept as the backup for a reachable adjacent node; taking the difference set by the primary key of each record, U1 - U2 gives the data newly added to U1;
deleted data difference comparison: analogously to the newly added difference, U2 - U1 gives the data deleted from U1;
modified data difference comparison: take the records of U1 and U2 whose primary keys are the same and compare the other fields one by one; if any field differs, the content of the data table is updated.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN202011636626.3A 2020-12-31 2020-12-31 Database cluster difference comparison and data synchronization method, system and medium Active CN112579613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011636626.3A CN112579613B (en) 2020-12-31 2020-12-31 Database cluster difference comparison and data synchronization method, system and medium

Publications (2)

Publication Number Publication Date
CN112579613A true CN112579613A (en) 2021-03-30
CN112579613B CN112579613B (en) 2023-02-17

Family

ID=75144577

Country Status (1)

Country Link
CN (1) CN112579613B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593422A (en) * 2013-11-01 2014-02-19 国云科技股份有限公司 Virtual access management method of heterogeneous database
CN104573100A (en) * 2015-01-29 2015-04-29 无锡江南计算技术研究所 Step-by-step database synchronization method with autoincrement identifications
CN104598610A (en) * 2015-01-29 2015-05-06 无锡江南计算技术研究所 Step-by-step database data distribution uploading and synchronizing method
CN106610876A (en) * 2015-10-23 2017-05-03 中兴通讯股份有限公司 Method and device for recovering data snapshot
CN108153619A (en) * 2017-12-25 2018-06-12 杭州恩牛网络技术有限公司 A kind of data proofreading method and device
CN109189860A (en) * 2018-10-19 2019-01-11 山东浪潮云信息技术有限公司 A kind of active and standby increment synchronization method of MySQL based on Kubernetes system
CN109857596A (en) * 2019-03-07 2019-06-07 张倩 Time consistency backup method, equipment, system, device and storage medium

Non-Patent Citations (2)

Title
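Wang Jiamin (王佳敏): "Foreign key detection algorithm for web tables based on conflict dependency elimination", Computer Science (《计算机科学》) *
Wang Zhiduo (王智铎): "Design and implementation of a directed-graph-based foreign key conflict resolution algorithm", Computer Engineering (《计算机工程》) *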

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113986914A (en) * 2021-09-30 2022-01-28 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Tree type cascade database synchronization system and method based on data increment
CN113986914B (en) * 2021-09-30 2023-01-24 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Tree type cascade database synchronization system and method based on data increment
WO2024037223A1 (en) * 2022-08-18 2024-02-22 华为云计算技术有限公司 Data operation method and apparatus
CN117407407A (en) * 2023-12-15 2024-01-16 中信证券股份有限公司 Method, device, equipment and computer medium for updating multi-heterogeneous data source data set
CN117407407B (en) * 2023-12-15 2024-04-12 中信证券股份有限公司 Method, device, equipment and computer medium for updating multi-heterogeneous data source data set

Also Published As

Publication number Publication date
CN112579613B (en) 2023-02-17

Similar Documents

Publication Publication Date Title
CN112579613B (en) Database cluster difference comparison and data synchronization method, system and medium
US20200228393A1 (en) System and method for data replication using a single master failover protocol
US10248704B2 (en) System and method for log conflict detection and resolution in a data store
US10496667B2 (en) System and method for maintaining a master replica for reads and writes in a data store
US8930312B1 (en) System and method for splitting a replicated data partition
US9069827B1 (en) System and method for adjusting membership of a data replication group
CN113515499B (en) Database service method and system
CN108804253B (en) Parallel operation backup method for mass data backup
US7447709B1 (en) Methods and apparatus for synchronizing content
CA2281367C (en) Method and apparatus for simplified administration of large numbers of similar information handling servers
CN110209653B (en) HBase data migration method and device
US7720884B1 (en) Automatic generation of routines and/or schemas for database management
CN111753013A (en) Distributed transaction processing method and device
CN111401029A (en) Document version updating system and method based on document partition and collaborative editing
JP2023509035A (en) Transaction processing method, apparatus, computer device and computer program
US20240143386A1 (en) Using multiple blockchains for applying transactions to a set of persistent data objects in persistent storage systems
CN111930850A (en) Data verification method and device, computer equipment and storage medium
JP2023546897A (en) Object processing methods, devices, and computer equipment
CN110990336A (en) Industrial control-oriented function design method and system
JP4951137B2 (en) How to manage the database
CN116149713B (en) Program upgrading method and device for all-level equipment under tree-type heterogeneous network
CN114595288A (en) SQL command level-based multi-IDC distributed system data synchronization method
CN113128909A (en) Server management method based on power resources
CN114442947B (en) Cross-domain bucket deleting method, system, terminal and storage medium
Abuya et al. An Improved Failure Recovery Algorithm In Two-Phase Commit Protocol For Transaction Atomicity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant