CN117971978A - Multi-center database data comparison method

Info

Publication number: CN117971978A
Application number: CN202410167095.XA
Authority: CN
Inventors: 吉怀胜; 仇东标
Original assignee: Focus Technology Co Ltd
Current assignee: Focus Technology Co Ltd
Priority date: 2024-02-06
Filing date: 2024-02-06
Publication date: 2024-05-03

Abstract

The invention discloses a multi-center database data comparison method comprising the following steps: deploying a plurality of service nodes in different machine rooms, each managing the database data of its own machine room, the machine rooms communicating in real time over network connections to support data synchronization and comparison; configuring the data source and the database table to be compared and selecting a comparison node, wherein the comparison node calls the nodes of the other machine rooms to obtain digests of the data and, after receiving their responses, compares the data to determine whether inconsistent data exists; and correcting the inconsistent data, wherein the correction modes include manual correction and automatic correction. The method addresses slow cross-machine-room data queries, low comparison efficiency, and data inconsistency, and achieves efficient data comparison.

Description

Multi-center database data comparison method

Technical Field

The invention relates to the field of internet data comparison, in particular to a multi-center database data comparison method.

Background

With the continuous development of modern information technology, data comparison is becoming increasingly important in many fields. Many businesses and organizations rely on it to ensure the accuracy and consistency of their data, and it is particularly critical in a multi-center database environment. Multi-center databases are typically deployed across multiple geographic locations to improve performance, availability, and fault tolerance. This distributed deployment improves performance but introduces data consistency challenges. In a multi-machine-room deployment, especially in a multi-write scenario, data updates made in different machine rooms may conflict, resulting in inconsistent data. Such inconsistencies can have serious consequences, such as data loss, business errors, or data corruption. By comparing the data in the different databases, data comparison finds inconsistencies so that they can be resolved, thereby ensuring the accuracy and reliability of the data.

Many methods currently exist for comparing multi-center databases, including manual comparison, automatic comparison algorithms, and data matching techniques. However, as multi-center databases grow in size and complexity, conventional comparison methods may suffer from degraded performance and reduced matching accuracy. A more efficient and accurate data comparison method is therefore urgently needed to ensure the consistency and reliability of multi-center databases. Accordingly, the invention aims to provide an efficient, low-transmission comparison method that handles data comparison for multi-data-center databases of different scales, so as to ensure a high degree of consistency among multi-machine-room databases and meet the requirements of modern distributed database applications.

Disclosure of Invention

The background art reveals the problem of data inconsistencies in multi-machine-room database deployments, as well as the shortcomings of the prior art. The invention provides an innovative multi-center database data comparison method, which aims to solve the challenges and ensure the high consistency of multi-machine room database data so as to meet the requirements of modern distributed database applications.

Specifically, the invention provides a multi-center database data comparison method, which comprises the following steps:

Step 1: a plurality of service nodes are deployed in different machine rooms for managing database data of the machine room, and the different machine rooms are communicated in real time through network connection so as to support data synchronization and comparison operation;

Step 2: configuring the data source and the database table to be compared and selecting a comparison node, wherein the comparison node is responsible for calling the nodes of the other machine rooms to obtain digests of the data and, after receiving their responses, compares the data to determine whether inconsistent data exists; the specific steps are as follows:

Step 2-1: the comparison node counts the amount of data in the database table to be compared according to the configuration information, and if the amount of data exceeds a threshold, the acquired data is first written to disk;

Step 2-2: the comparison node queries all the primary keys in the database table, with every 200 primary keys forming one task, and, according to the machine room where each data source is located, calls the nodes of the other machine rooms using multiple threads to obtain the digests of the data;

Step 2-3: comparing the acquired data and recording inconsistent data;

Step 3: correcting the inconsistent data, wherein the correction modes include manual correction and automatic correction.
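
The task splitting and multithreaded calls of step 2-2 can be illustrated with a minimal Python sketch. The names `split_into_tasks`, `collect_digests`, and `fetch_digests_stub` are hypothetical, and the stub merely stands in for the remote call to a node of another machine room; none of these names come from the method itself.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 200  # primary keys per task, as stated in step 2-2


def split_into_tasks(primary_keys, batch_size=BATCH_SIZE):
    """Split the full primary-key list into consecutive batches (tasks)."""
    return [primary_keys[i:i + batch_size]
            for i in range(0, len(primary_keys), batch_size)]


def fetch_digests_stub(room, keys):
    """Hypothetical stand-in for the remote call to a node in `room`.

    A real node would query its own database and return one digest per key;
    a placeholder string is returned here so the sketch runs offline.
    """
    return {key: f"digest-of-{key}" for key in keys}


def collect_digests(primary_keys, rooms, fetch=fetch_digests_stub, workers=8):
    """Submit every (room, task) pair to a thread pool and merge the replies."""
    results = {room: {} for room in rooms}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [(room, pool.submit(fetch, room, task))
                   for room in rooms
                   for task in split_into_tasks(primary_keys)]
        for room, future in futures:
            # result() blocks until the task finishes and re-raises its errors.
            results[room].update(future.result())
    return results
```

With 450 primary keys and two machine rooms, this sketch produces three tasks per room (200, 200, and 50 keys) and returns one key-to-digest mapping per room.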

In step 1, the real-time communication is based on a remote procedure call (RPC) protocol and is used to send requests to, and obtain responses from, the nodes of other machine rooms.

In particular, the services provided by each node include data query and digest calculation. When a node of another machine room calls a service node through RPC, the service node queries the database table specified by the caller and calculates digests of the data, wherein a digest is a data fingerprint generated by a predetermined algorithm, and the digests are returned as the response to the remote call.

In step 1, the predetermined algorithm specifically includes obtaining the column values of each row of the database table, processing column values of specific types, concatenating the processed column values into a character string, and converting the concatenated string into a fixed-length data fingerprint used for unique identification and quick comparison of the data.

In step 1, the specific types of column value include Date, which is converted into a timestamp string, and the algorithms for converting the concatenated string include MD5 and CRC32.
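
The digest calculation described above might look like the following Python sketch. The column separator, the handling of NULL values, and the treatment of naive datetimes as UTC are assumptions made only for illustration, and `normalize` and `row_digest` are hypothetical names. Python datetimes carry microsecond resolution, so the nanosecond string here is simply padded.

```python
import hashlib
import zlib
from datetime import datetime, timezone

SEPARATOR = "|"  # assumed column separator; the text does not specify one
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def normalize(value):
    """Render one column value as a string; datetimes become ns timestamps."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive values are interpreted as UTC.
            value = value.replace(tzinfo=timezone.utc)
        delta = value - EPOCH
        # Integer arithmetic avoids the float rounding of datetime.timestamp().
        micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
        return str(micros * 1000)
    if value is None:
        return ""  # assumption: NULL is rendered as an empty string
    return str(value)


def row_digest(row, algorithm="md5"):
    """Concatenate the normalized column values and return a fixed-length digest."""
    joined = SEPARATOR.join(normalize(v) for v in row).encode("utf-8")
    if algorithm == "md5":
        return hashlib.md5(joined).hexdigest()        # 32 hex characters
    if algorithm == "crc32":
        return format(zlib.crc32(joined), "08x")      # 8 hex characters
    raise ValueError(f"unsupported algorithm: {algorithm}")
```

CRC32 yields a shorter fingerprint than MD5 and therefore a higher chance of two different rows sharing a digest, so the choice between them trades transfer size against collision risk.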

In step 1, the database types include Oracle and MySQL; more than one node is deployed in the same machine room; and each node connects to its database through a database connection pool.

Step 3 further includes writing the difference data into a difference report and feeding the report back to the client; for inconsistent data, the primary keys of the records are stored as a history record so that the client can review them.

In step 3, manual correction requires manually selecting, for each piece of inconsistent data, the data to be corrected and deciding on a correction mode, wherein the correction modes include update, overwrite, and delete; for both manual and automatic correction, the original value is recorded in advance so that erroneous data can be rolled back.
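
Recording the original value before a correction, so that it can be rolled back, can be sketched in Python as follows. The table is modelled as an in-memory dictionary, and the exact semantics given to "update" (merge changed columns) and "overwrite" (replace the whole row) are assumptions; `apply_correction` and `roll_back` are hypothetical names.

```python
import copy

MODES = ("update", "overwrite", "delete")


def apply_correction(table, key, mode, values=None, undo_log=None):
    """Correct one row of `table` (primary key -> row dict), logging the original.

    Assumed semantics: 'update' merges `values` into an existing row,
    'overwrite' replaces (or creates) the whole row, 'delete' removes it.
    """
    if mode not in MODES:
        raise ValueError(f"unknown correction mode: {mode}")
    if mode == "update" and key not in table:
        raise KeyError(key)
    existed = key in table
    original = copy.deepcopy(table[key]) if existed else None
    if undo_log is not None:
        # Record the original value before touching the data.
        undo_log.append((key, existed, original))
    if mode == "update":
        table[key].update(values)
    elif mode == "overwrite":
        table[key] = dict(values)
    else:
        table.pop(key, None)


def roll_back(table, undo_log):
    """Undo the logged corrections in reverse order."""
    while undo_log:
        key, existed, original = undo_log.pop()
        if existed:
            table[key] = original
        else:
            table.pop(key, None)
```

Undoing in reverse order matters when the same primary key is corrected more than once, because each log entry holds the value as it was immediately before that particular correction.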

The invention has the beneficial effects that:

(1) Through RPC communication, the invention makes full use of machine resources, compares data quickly, and finds data inconsistencies;

(2) The invention is suitable for multi-center database systems of different scales, and the number of nodes can be expanded as needed;

(3) The invention adopts digest calculation and an efficient comparison algorithm, which reduces bandwidth usage and performance overhead;

(4) The invention can automatically identify and resolve inconsistent data, and reduce manual intervention.

Drawings

FIG. 1 is a flow chart of a multi-center database data comparison method in an embodiment of the invention;

FIG. 2 is a flowchart of database data collection in an embodiment of the present invention;

FIG. 3 is a schematic diagram of a data comparison flow in an embodiment of the present invention;

FIG. 4 is a schematic diagram of a data correction flow in an embodiment of the invention.

Detailed Description

The invention is further described below with reference to the drawings and exemplary embodiments.

FIG. 1 is a flowchart of the multi-center database data comparison method in an embodiment of the invention, which specifically includes:

Step 101: calling the nodes of other machine rooms to query the specified data of the database, calculating the digests, and storing the query results;

Step 102: comparing the data of the databases of the different machine rooms and analyzing the difference data;

Step 103: correcting the inconsistent data found by the comparison, either manually or automatically, and recording the primary keys of the differences.

Step 101, as shown in FIG. 2, specifically includes:

Step 101-1: The difference table of one data source is taken as the reference, and the difference tables of the other data sources are compared against it as needed. First, a counting SQL statement such as 'select count(1) from x' is constructed, a connection is made to the specified database, and the query is sent to obtain the number of records in the table.

Step 101-2: Determine whether the data needs to be written to disk. If the number of records in the table exceeds a threshold of one million (the threshold can be adjusted according to the size of the machine's memory), a file-writing thread is started to write the in-memory data to disk so that the data does not exceed the memory limit; each disk file stores at most one million records.

Step 101-3: Obtain all the primary keys in the table and use them to query the data of the other machine rooms. After querying all the primary keys to be compared, the system groups them into finer-grained task units, each containing a portion of the primary keys, for example 200 primary keys per group. Each task unit then passes its primary keys and task information as parameters to the nodes of the other machine rooms. After receiving the task information, the service node queries the database of its own machine room. For each row in the query result, the node first processes columns of specific types (for example, converting a Date column value into a timestamp string accurate to the nanosecond to improve comparison accuracy), then concatenates the processed column values into a character string, computes a digest of the string with a digest algorithm such as MD5 or CRC32, and finally returns the data digest to the caller. This avoids frequent large-scale data transfers, greatly reduces network bandwidth usage, and improves the efficiency and accuracy of the comparison. Together, the fine-grained tasks and the digest-based data processing improve system performance and response speed.

Step 101-4: Gather the data of all machine rooms for comparison. The caller stores the received data digests in a key-value structure, where the key is the primary key and the value is the data digest, and the data of each machine room is stored in its own data set. If a file-writing thread exists, it writes the service nodes' response data to the disk files.
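
The record count of step 101-1, the disk decision of step 101-2, and the key-value storage of step 101-4 can be sketched together in Python. SQLite is used only as a stand-in for Oracle or MySQL, the file naming and tab-separated layout are assumptions, and the sketch writes synchronously rather than through a dedicated file-writing thread. `count_rows` and `DigestStore` are hypothetical names.

```python
import os
import sqlite3
import tempfile

SPILL_THRESHOLD = 1_000_000       # record count above which data goes to disk
MAX_RECORDS_PER_FILE = 1_000_000  # at most this many records per disk file


def count_rows(conn, table):
    """Run the counting query; `table` is assumed to come from trusted config."""
    return conn.execute(f"select count(1) from {table}").fetchone()[0]


class DigestStore:
    """Primary key -> digest store for one machine room, in memory or on disk."""

    def __init__(self, room, spill, directory=None,
                 max_per_file=MAX_RECORDS_PER_FILE):
        self.room = room
        self.spill = spill
        self.directory = directory
        self.max_per_file = max_per_file
        self.memory = {}   # used when the table is small enough
        self.files = []    # paths of the disk files written so far
        self._written = 0

    def put(self, key, digest):
        if not self.spill:
            self.memory[key] = digest
            return
        if self._written % self.max_per_file == 0:
            # Start a new file once the current one holds max_per_file records.
            name = f"{self.room}-{len(self.files)}.tsv"
            self.files.append(os.path.join(self.directory, name))
        with open(self.files[-1], "a", encoding="utf-8") as handle:
            handle.write(f"{key}\t{digest}\n")
        self._written += 1
```

A caller would typically compute `spill = count_rows(conn, table) > SPILL_THRESHOLD` once per table and then create one `DigestStore` per machine room.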

Step 102, as shown in FIG. 3, specifically includes:

Step 102-1: Select one data set as the standard data set, traverse each piece of data in it, and compare it with the other data sets.

Step 102-2: For each piece of data in the standard data set, check whether all the other data sets contain that piece of data.

Step 102-3: If any data set lacks a piece of data that is in the standard data set, that piece of data is directly marked as inconsistent.

Step 102-4: If all the data sets contain a piece of data that is in the standard data set, a consistency check is performed on it; if the check fails, the piece of data is marked as inconsistent.
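
Steps 102-1 to 102-4 can be sketched in Python as below, assuming the consistency check is simple equality of the stored digests and that each data set is a dictionary from primary key to digest. The function name `find_inconsistent_keys` is hypothetical.

```python
def find_inconsistent_keys(datasets, standard):
    """Return the primary keys of the standard data set that are inconsistent.

    `datasets` maps a machine-room name to a dict of primary key -> digest.
    """
    reference = datasets[standard]
    others = [data for room, data in datasets.items() if room != standard]
    inconsistent = []
    for key, digest in reference.items():
        for data in others:
            # Missing key (step 102-3) or differing digest (step 102-4).
            if key not in data or data[key] != digest:
                inconsistent.append(key)
                break
    return inconsistent
```

Because the traversal visits only the keys of the standard data set, a row that exists solely in another data set is not reported by this sketch; detecting such rows would need a second pass with a different standard data set.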

The specific correction procedure of step 103 is shown in FIG. 4 and includes:

Step 103-1: Obtain the data source information, the difference table information, and the corresponding difference primary key information.

Step 103-2: Using the pre-established association between database tables and plug-ins, determine whether the difference table is associated with a correction plug-in, and thus whether automatic correction is to be used. This association lets the system call the corresponding plug-in to correct inconsistent data automatically when it is found, which markedly reduces manual intervention and improves the adaptability and maintainability of the system. It also makes it easy to add new correction rules in the future. Because the correction logic is encapsulated in code and plug-ins, the invention provides a more reliable solution for maintaining data consistency.

Step 103-3: If the difference table has an associated correction plug-in, traverse the difference primary keys and pass the data source information and each primary key value as parameters to the plug-in, which performs the data correction automatically. Compared with traditional manual correction, automatic correction greatly improves correction efficiency and accuracy, and its advantages become more pronounced as the data volume grows.

Step 103-4: If the difference table has no associated correction plug-in, manual intervention is necessary: the system records the inconsistent data in a difference report and notifies an operator by email. The operator selects one reference data source according to business requirements and manually corrects the difference table data of the other data sources against it. Common correction modes include update, overwrite, and delete. For corrected data, the system records the original value before correction so that the data can be rolled back if the correction is wrong. The system also saves the primary keys of the difference records as a history record for subsequent review.

The invention is not limited to the above examples and is capable of other embodiments. Those skilled in the art can make various changes and modifications without departing from the spirit and substance of the invention, and all such modifications and applications of the above examples are intended to fall within the scope of the appended claims.

Claims

1. A multi-center database data comparison method, comprising the steps of:

step 2-3: comparing the acquired data, and recording inconsistent data;

2. The multi-center database data comparison method of claim 1, wherein: in step 1, the real-time communication is based on a remote procedure call protocol and is used to send requests to, and obtain responses from, the nodes of other machine rooms.

3. The multi-center database data comparison method of claim 2, wherein: in step 1, the predetermined algorithm specifically includes obtaining the column values of each row of the database table, processing column values of specific types, concatenating the processed column values into a character string, and converting the concatenated string into a fixed-length data fingerprint used for unique identification and quick comparison of the data.

4. The multi-center database data comparison method of claim 3, wherein: in step 1, the specific types of column value include Date, which is converted into a timestamp string, and the algorithms for converting the concatenated string include MD5 and CRC32.

5. The multi-center database data comparison method of claim 4, wherein: in step 1, the database types include Oracle and MySQL; more than one node is deployed in the same machine room; and each node connects to its database through a database connection pool.

6. The multi-center database data comparison method of claim 5, wherein step 3 further comprises writing the difference data into a difference report and feeding the report back to the client; for inconsistent data, the primary keys of the records are stored as a history record so that the client can review them.

7. The multi-center database data comparison method of claim 6, wherein in step 3 the manual correction requires manually selecting, for each piece of inconsistent data, the data to be corrected and deciding on a correction mode, wherein the correction modes include update, overwrite, and delete; and for both manual and automatic correction, the original value is recorded in advance so that erroneous data can be rolled back.