CN117971978A - Multi-center database data comparison method - Google Patents

Info

Publication number
CN117971978A
Authority
CN
China
Prior art keywords
data
comparison
database
nodes
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410167095.XA
Other languages
Chinese (zh)
Inventor
吉怀胜
仇东标
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Focus Technology Co Ltd
Original Assignee
Focus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2024-02-06
Filing date
2024-02-06
Publication date
2024-05-03
Application filed by Focus Technology Co Ltd
Priority to CN202410167095.XA
Publication of CN117971978A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-center database data comparison method comprising the following steps: deploying a plurality of service nodes in different machine rooms, each managing the database data of its own machine room, with the machine rooms communicating in real time over network connections to support data synchronization and comparison operations; configuring the data sources and database tables to be compared and selecting a comparison node, which calls the nodes of the other machine rooms to obtain digests of the data and, after receiving their responses, compares the data to determine whether inconsistent data exists; and correcting the inconsistent data, either manually or automatically. The method addresses slow cross-machine-room data queries, low comparison efficiency, and data inconsistency, and achieves efficient data comparison.

Description

Multi-center database data comparison method
Technical Field
The invention relates to the field of internet data comparison, in particular to a multi-center database data comparison method.
Background
With the continuous development of modern information technology, data comparison is becoming increasingly important in many fields. Many businesses and organizations rely on data comparison to ensure that their data is accurate and consistent, and it is particularly critical in a multi-center database environment. Multi-center databases are typically deployed across multiple geographic locations to improve performance, availability, and fault tolerance. This distributed deployment improves performance but also makes data consistency harder to maintain. In a multi-machine-room deployment, especially in a multi-write scenario, updates made in different machine rooms may conflict, resulting in inconsistent data. Such inconsistencies can have serious consequences, such as data loss, business errors, or data corruption. By comparing the data held in different databases, data comparison detects inconsistencies so that they can be resolved, thereby ensuring the accuracy and reliability of the data.
Many methods currently address the comparison problem for multi-center databases, including manual comparison, automatic comparison algorithms, and data matching techniques. However, as multi-center databases grow in size and complexity, conventional comparison methods may suffer from reduced performance and reduced matching accuracy. A more efficient and accurate data comparison method is therefore urgently needed to ensure the consistency and reliability of multi-center database data. Accordingly, the invention aims to provide an efficient, low-transmission comparison method that solves the data comparison problem for multi-data-center databases of different scales, so as to ensure a high degree of consistency across multi-machine-room databases and meet the requirements of modern distributed database applications.
Disclosure of Invention
The background section describes the problem of data inconsistency in multi-machine-room database deployments and the shortcomings of the prior art. The invention provides a multi-center database data comparison method that addresses these challenges and ensures a high degree of consistency of multi-machine-room database data, so as to meet the requirements of modern distributed database applications.
Specifically, the invention provides a multi-center database data comparison method, which comprises the following steps:
Step 1: a plurality of service nodes are deployed in different machine rooms, each managing the database data of its own machine room, and the machine rooms communicate in real time over network connections to support data synchronization and comparison operations;
Step 2: the data sources and database tables to be compared are configured and a comparison node is selected; the comparison node is responsible for calling the nodes of the other machine rooms to obtain digests of the data and, after receiving their responses, compares the data to determine whether inconsistent data exists; the specific steps are as follows:
Step 2-1: the comparison node counts the amount of data in the database table to be compared according to the configuration information, and if the amount exceeds a threshold, the acquired data is first written to disk;
Step 2-2: the comparison node queries all primary keys in the database table, groups every 200 primary keys into one task, and, according to the machine room where each data source is located, calls the nodes of the other machine rooms from multiple threads to obtain the digests of the data;
Step 2-3: the acquired data is compared, and inconsistent data is recorded;
Step 3: the inconsistent data is corrected, the correction modes including manual correction and automatic correction.
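The task splitting and multithreaded calls of Step 2-2 can be illustrated with a minimal Python sketch. It assumes the primary keys have already been loaded into a list. The names `BATCH_SIZE`, `chunk_keys`, and `collect_digests` are illustrative and do not come from the source, and `fetch_digests` is a hypothetical stand-in for the remote call to a node in another machine room.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence

BATCH_SIZE = 200  # Step 2-2: every 200 primary keys form one task


def chunk_keys(keys: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` primary keys."""
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


def collect_digests(
    keys: Sequence[str],
    fetch_digests: Callable[[List[str]], Dict[str, str]],
    max_workers: int = 8,  # assumed pool size; the source gives no figure
) -> Dict[str, str]:
    """Send each task to a remote node on a worker thread and merge the replies.

    `fetch_digests` is a hypothetical placeholder for the remote call; it is
    expected to return {primary_key: digest} for the keys of one task.
    """
    merged: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for reply in pool.map(fetch_digests, chunk_keys(keys)):
            merged.update(reply)
    return merged
```

In this sketch one call to `collect_digests` covers one remote machine room; a caller comparing several machine rooms would invoke it once per room with that room's remote function.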
In step 1, the real-time communication is based on a remote procedure call (RPC) protocol and is used to send requests to, and obtain responses from, the nodes of the other machine rooms;
Specifically, the services provided by a node include data query and digest calculation: when a node in another machine room calls the service node over RPC, the service node queries the database table designated by the caller and calculates digests of the data, where a digest is a data fingerprint generated by a predetermined algorithm, and returns the digests as the response to the remote call.
In step 1, the predetermined algorithm specifically includes obtaining the column values of each row of the database table, processing column values of specific types, concatenating the processed values into a character string, and converting the concatenated string into a fixed-length data fingerprint used for unique identification and quick comparison of data.
In step 1, the specific types of column value include Date, which is converted into a timestamp character string, and the algorithms for converting the concatenated string include MD5 and CRC32.
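A minimal Python sketch of such a row digest follows. The source names only the Date handling and the MD5 and CRC32 algorithms; the column separator, the NULL marker, the treatment of naive datetimes as UTC, and the function names `normalize` and `row_digest` are assumptions made for illustration. Python's `datetime` carries microsecond precision, so the nanosecond string produced here always ends in three zeros.

```python
import hashlib
import zlib
from datetime import datetime, timezone
from typing import Any, Sequence

SEPARATOR = "\x1f"    # assumed column delimiter; the source does not specify one
NULL_MARKER = "\x00"  # assumed marker so that NULL and "" hash differently
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def normalize(value: Any) -> str:
    """Render one column value as text; datetimes become epoch-nanosecond strings."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        delta = value - EPOCH
        seconds = delta.days * 86_400 + delta.seconds
        return str(seconds * 1_000_000_000 + delta.microseconds * 1_000)
    return NULL_MARKER if value is None else str(value)


def row_digest(row: Sequence[Any], algorithm: str = "md5") -> str:
    """Concatenate the normalized column values and return a fixed-length hex digest."""
    joined = SEPARATOR.join(normalize(v) for v in row).encode("utf-8")
    if algorithm == "md5":
        return hashlib.md5(joined).hexdigest()       # 32 hex characters
    if algorithm == "crc32":
        return format(zlib.crc32(joined), "08x")     # 8 hex characters
    raise ValueError(f"unsupported algorithm: {algorithm}")
```

CRC32 yields only 32 bits, so distinct rows collide far more often than with MD5; which algorithm is acceptable depends on table size and is left open here.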
In step 1, the database types include Oracle and MySQL; more than one node is deployed in the same machine room; each node connects to its database through a database connection pool.
Step 3 further includes writing the difference data into a difference report and feeding the report back to the client; for inconsistent data, the primary keys of the records are stored as a history record so that the client can review them later.
In step 3, manual correction requires a person to select, for each piece of inconsistent data, the data to be corrected and to decide the correction mode, the correction modes including updating, overwriting, and deleting; for both manual and automatic correction, the original value is recorded in advance so that erroneous corrections can be rolled back.
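Recording the original value before a correction, so that it can be rolled back, can be sketched as follows. This is an illustration only: an in-memory dictionary stands in for the database table, and the class name `CorrectionJournal` and its methods are hypothetical. The source does not describe how original values are stored.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks "row did not exist before the correction"


class CorrectionJournal:
    """Apply corrections to a table while remembering each original value."""

    def __init__(self, table: Dict[str, Any]) -> None:
        self._table = table
        self._undo: List[Tuple[str, Any]] = []

    def update(self, key: str, new_value: Any) -> None:
        """Update or overwrite a row, recording its previous value first."""
        self._undo.append((key, self._table.get(key, _MISSING)))
        self._table[key] = new_value

    def delete(self, key: str) -> None:
        """Delete a row, recording its previous value first."""
        self._undo.append((key, self._table.get(key, _MISSING)))
        self._table.pop(key, None)

    def rollback(self) -> None:
        """Undo every recorded correction, most recent first."""
        while self._undo:
            key, original = self._undo.pop()
            if original is _MISSING:
                self._table.pop(key, None)
            else:
                self._table[key] = original
```

Undoing in reverse order matters when the same primary key is corrected more than once: the earliest recorded value is restored last and therefore wins.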
The invention has the following beneficial effects:
(1) Through RPC communication, the invention makes full use of machine resources, compares data quickly, and detects data inconsistencies;
(2) The invention is suitable for multi-center database systems of different scales, and the number of nodes can be expanded as needed;
(3) The invention uses digest calculation and an efficient comparison algorithm, which reduces bandwidth usage and performance overhead;
(4) The invention can automatically identify and resolve inconsistent data, reducing manual intervention.
Drawings
FIG. 1 is a flow chart of a multi-center database data comparison method in an embodiment of the invention;
FIG. 2 is a flowchart of database data collection in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data comparison flow in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data correction flow in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the drawings and exemplary embodiments.
FIG. 1 is a flowchart of a multi-center database data comparison method in an embodiment of the present invention, which specifically includes:
Step 101: calling the nodes of the other machine rooms to query the designated data of the database, calculating digests, and storing the query results;
Step 102: comparing the data of the different machine room databases and analyzing the difference data;
Step 103: correcting the inconsistent data found by the comparison, either manually or automatically, and recording the primary keys of the differences.
Step 101, as shown in FIG. 2, specifically includes:
Step 101-1: Taking the table in one data source as the reference against which the corresponding tables in the other data sources are compared as required, first construct a counting SQL statement such as 'select count(1) from x', connect to the designated database, and send the query to obtain the number of records in the table.
Step 101-2: Determine whether data needs to be written to disk. If the number of records in the table exceeds a threshold of one million (the threshold can be adjusted according to the size of the machine's memory), a file writing thread is started to write the data held in memory to disk, which prevents the data from exceeding the memory limit; each disk file stores at most one million records;
Step 101-3: Acquire all primary keys in the table and use them to query the data of the other machine rooms. Having queried all primary keys to be compared, the system groups them into finer task units, each containing a portion of the primary keys, for example 200 primary keys per group, so that the specific content to be compared is queried group by group. Each task unit passes its primary key information and task information as parameters to the nodes of the other machine rooms. On receiving the task information, a service node queries the database of its own machine room. For each row in the query result, the node first processes columns of specific types (for example, converting a Date column value into a timestamp character string accurate to the nanosecond to improve comparison accuracy), then concatenates the processed column values into a character string, computes a digest with a digest algorithm such as MD5 or CRC32, and finally returns the data digest to the caller. This avoids frequent large-scale data transmission, greatly reduces network bandwidth usage, and improves the efficiency and accuracy of data comparison. Together, the fine-grained tasks and the digest-based processing improve system performance and response speed.
Step 101-4: Prepare the data of all the machine rooms for comparison. The caller stores the received data digests in a key-value structure in which the key is the primary key and the value is the data digest, and the data of each machine room is stored in its own data set. If a file writing thread exists, it writes the response data of the service nodes to the disk files;
Step 102, as shown in FIG. 3, specifically includes:
Step 102-1: One data set is selected as the standard data set, and each piece of data in it is traversed and compared with the other data sets;
Step 102-2: For each piece of data in the standard data set, check whether all the other data sets contain it;
Step 102-3: If any data set lacks a piece of data that is in the standard data set, that piece of data is directly marked as inconsistent;
Step 102-4: If all the data sets contain a piece of data that is in the standard data set, a consistency check is performed on it; if the check fails, the piece of data is marked as inconsistent.
The specific correction procedure of step 103, as shown in FIG. 4, includes:
Step 103-1: Acquire the data source information, the difference table information, and the corresponding difference primary key information.
Step 103-2: Using the pre-established association between database tables and plug-ins, evaluate whether the difference table is associated with a correction plug-in, in order to determine whether to use automatic correction. This association lets the system call the corresponding plug-in to correct inconsistent data automatically when it is found, which significantly reduces manual intervention and improves the adaptability and maintainability of the whole system. It also makes it convenient to add new correction rules in the future, because the correction logic is encapsulated in code and plug-ins.
Step 103-3: If the difference table has an associated correction plug-in, traverse the difference primary keys and pass the data source information and each primary key value as parameters to the plug-in, which performs the data correction automatically. Compared with traditional manual correction, automatic correction greatly improves correction efficiency and accuracy, and its advantages become more pronounced as the data volume grows.
Step 103-4: If the difference table has no associated correction plug-in, manual intervention is required: the system records the inconsistent data in a difference report and notifies the responsible staff by email. A person selects one of the data sources as the reference data source according to business requirements and manually corrects the difference table data of the other data sources against it. Common correction modes include updating, overwriting, and deleting. For corrected data, the system records the original value before correction so that the data can be rolled back if a correction proves wrong. The system also saves the primary keys of the difference records as a history record for subsequent review.
The present invention is also capable of other embodiments, which are not limited in any way by the above examples, and various changes and modifications can be made by one skilled in the art without departing from the spirit and substance of the invention, and it is intended to cover all such modifications and applications of the above examples as fall within the scope of the appended claims.

Claims (7)

1. A multi-center database data comparison method, comprising the steps of:
Step 1: a plurality of service nodes are deployed in different machine rooms, each managing the database data of its own machine room, and the machine rooms communicate in real time over network connections to support data synchronization and comparison operations;
Step 2: the data sources and database tables to be compared are configured and a comparison node is selected; the comparison node is responsible for calling the nodes of the other machine rooms to obtain digests of the data and, after receiving their responses, compares the data to determine whether inconsistent data exists; the specific steps are as follows:
Step 2-1: the comparison node counts the amount of data in the database table to be compared according to the configuration information, and if the amount exceeds a threshold, the acquired data is first written to disk;
Step 2-2: the comparison node queries all primary keys in the database table, groups every 200 primary keys into one task, and, according to the machine room where each data source is located, calls the nodes of the other machine rooms from multiple threads to obtain the digests of the data;
Step 2-3: the acquired data is compared, and inconsistent data is recorded;
Step 3: the inconsistent data is corrected, the correction modes including manual correction and automatic correction.
2. A multi-center database data comparison method as claimed in claim 1, wherein: in step 1, the real-time communication is based on a remote procedure call (RPC) protocol and is used to send requests to, and obtain responses from, the nodes of the other machine rooms;
Specifically, the services provided by a node include data query and digest calculation: when a node in another machine room calls the service node over RPC, the service node queries the database table designated by the caller and calculates digests of the data, where a digest is a data fingerprint generated by a predetermined algorithm, and returns the digests as the response to the remote call.
3. A multi-center database data comparison method as claimed in claim 2, wherein: in step 1, the predetermined algorithm specifically includes obtaining the column values of each row of the database table, processing column values of specific types, concatenating the processed values into a character string, and converting the concatenated string into a fixed-length data fingerprint used for unique identification and quick comparison of data.
4. A multi-center database data comparison method as claimed in claim 3, wherein: in step 1, the specific types of column value include Date, which is converted into a timestamp character string, and the algorithms for converting the concatenated string include MD5 and CRC32.
5. The multi-center database data comparison method of claim 4, wherein: in step 1, the database types include Oracle and MySQL; more than one node is deployed in the same machine room; each node connects to its database through a database connection pool.
6. The multi-center database data comparison method of claim 5, wherein step 3 further comprises writing the difference data into a difference report and feeding the report back to the client; for inconsistent data, the primary keys of the records are stored as a history record so that the client can review them later.
7. The multi-center database data comparison method of claim 6, wherein in step 3 manual correction requires a person to select, for each piece of inconsistent data, the data to be corrected and to decide the correction mode, the correction modes including updating, overwriting, and deleting; for both manual and automatic correction, the original value is recorded in advance so that erroneous corrections can be rolled back.
CN202410167095.XA 2024-02-06 2024-02-06 Multi-center database data comparison method Pending CN117971978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410167095.XA CN117971978A (en) 2024-02-06 2024-02-06 Multi-center database data comparison method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410167095.XA CN117971978A (en) 2024-02-06 2024-02-06 Multi-center database data comparison method

Publications (1)

Publication Number Publication Date
CN117971978A true CN117971978A (en) 2024-05-03

Family

ID=90847651

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410167095.XA Pending CN117971978A (en) 2024-02-06 2024-02-06 Multi-center database data comparison method

Country Status (1)

Country Link
CN (1) CN117971978A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination