CN110209521B - Data verification method and device, computer readable storage medium and computer equipment

Info

Publication number: CN110209521B
Application number: CN201910134147.2A
Authority: CN
Inventors: 吴双桥; 王珏; 杨繁
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Cloud Computing Beijing Co Ltd
Priority date: 2019-02-22
Filing date: 2019-02-22
Publication date: 2022-03-18
Anticipated expiration: 2039-02-22
Also published as: CN110209521A

Abstract

The application relates to a data verification method and apparatus, a computer-readable storage medium, and a computer device. The method comprises: verifying a first data table in a master database and writing the verification result into a first check table; executing a hook program that verifies, according to the binary log of the master database, a second data table in the slave database corresponding to the first data table, and writing the verification result into a second check table; and comparing the first check table with the second check table. The scheme provided by the application has a wide application range.

Description

Data verification method and device, computer readable storage medium and computer equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data verification method and apparatus, a computer-readable storage medium, and a computer device.

Background

With the development of computer technology and the rapid increase of data volume, the use of databases is more and more common. In the process of using the database, a user often encounters a scenario that the master-slave database consistency needs to be compared, for example, after the master-slave synchronization problem is repaired, whether the data are consistent needs to be confirmed, or whether the migrated data are correct or not needs to be checked after the data are migrated.

However, current data verification methods usually execute the same database operation statements on the master database and the slave database, and then compare the results obtained on each to verify the consistency of the two databases. Because this approach requires the master and slave databases to execute identical operation statements, its application range is narrow.

Disclosure of Invention

Therefore, it is necessary to provide a data verification method, an apparatus, a computer-readable storage medium, and a computer device for solving the technical problem that the current data verification method has a narrow application range.

A method of data verification, comprising:

verifying a first data table in a master database and writing the verification result into a first check table;

executing a hook program, verifying a second data table corresponding to the first data table in the slave database according to the binary log of the master database, and writing the verification result into a second check table;

and comparing the first check table with the second check table.

A data verification apparatus, comprising:

a first check module, configured to verify the first data table in the master database and write the verification result into the first check table;

a second check module, configured to execute the hook program, verify a second data table corresponding to the first data table in the slave database according to the binary log of the master database, and write the verification result into the second check table;

and a comparison module, configured to compare the first check table with the second check table.

A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above-described data verification method.

A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described data verification method.

According to the above data verification method and apparatus, computer-readable storage medium, and computer device, the first data table in the master database is verified automatically and the verification result is written into the first check table. The hook program can then verify, according to the binary log of the master database, the second data table in the slave database that corresponds to the first data table, and write the verification result into the second check table. The result of the data consistency check between the master database and the slave database is obtained by comparing the first check table with the second check table, so the application range is greatly expanded.

Drawings

FIG. 1 is a diagram of an exemplary data verification method;

FIG. 2 is a flow diagram illustrating a data verification method according to one embodiment;

FIG. 3 is a schematic diagram of an embodiment of data consistency checking;

FIG. 4 is a schematic diagram illustrating data consistency checking during real-time synchronization in one embodiment;

FIG. 5 is a diagram illustrating corresponding data blocks in a master-slave database in one embodiment;

FIG. 6 is a block diagram showing the structure of a data verification apparatus according to an embodiment;

FIG. 7 is a block diagram showing the structure of a data verification apparatus according to another embodiment;

FIG. 8 is a block diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

FIG. 1 is a diagram of an exemplary data verification method. Referring to FIG. 1, the data verification method is applied to a data verification system. The data verification system includes a server 110 where the master database is located, a server 120 where the slave database is located, and a verification server 130. The server 110, the server 120, and the verification server 130 are connected through a network. Each server may be implemented as an independent server or as a server cluster composed of a plurality of servers. It should be noted that the server 110 where the master database is located and the server 120 where the slave database is located may be the same server or may be independent servers; the verification server 130 may be one of the server 110 and the server 120, or may be another server independent of both. The verification server 130 is used to perform the data verification method.

Specifically, the verification server 130 may run a data verification tool, and the verification server 130 may perform the data verification method through the data verification tool. The data verification tool may be specifically implemented by a computer program, and the computer program may be executed to execute the data verification method.

In one embodiment, as shown in FIG. 2, a data verification method is provided. The embodiment mainly illustrates that the method is applied to the verification server 130 in fig. 1. Referring to fig. 2, the data verification method specifically includes the following steps:

S202, verifying the first data table in the master database and writing the verification result into the first check table.

It is understood that the master database and the slave database in the following text are two independent databases in which a data migration relationship or a data synchronization relationship exists. Typically, data is migrated or synchronized by the master database to the slave database. A database is a repository for storing data, which is essentially a file system that stores data in a particular format. The user can add, modify, delete and query the data in the database. The database stores data in a table as an organization unit, namely a data table. The check table is also a data table and is used for recording the check result of the data.

In a particular embodiment, the master database and the slave database are both relational databases.

Specifically, the verification server may first determine a data table to be subjected to data consistency verification between the master database and the slave database. The data consistency check here refers to checking the consistency of data between two corresponding data tables in the master database and the slave database.

It can be understood that after the master-slave synchronization problem is repaired or data migration is performed, data consistency in the master-slave database generally needs to be checked to ensure normal synchronization between the master and the slave or successful data migration. In this case, the data table to be subjected to data consistency check between the master database and the slave database may be a master-slave synchronous data table, or may be a data table after data migration. The number of data tables to be checked for data consistency between the master database and the slave database may be one or more.

Further, the verification server may verify the data table in the master database that is to undergo the data consistency check, hereinafter referred to as the first data table, and write the obtained verification result into the first check table. The verification performed on the first data table differs from the data consistency check described above: it means that the data in the data table is verified in a preset verification manner. Various verification methods may be used, such as Cyclic Redundancy Check (CRC) or exclusive-or check (BCC).
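As a non-limiting illustration, the two verification manners named above may be sketched in Python as follows. The sketch assumes that each row is serialized as its column values joined by "|" before being checksummed; that serialization and the helper names crc32_of_rows and xor_check are hypothetical and are not specified by the present application.

```python
import zlib
from functools import reduce


def crc32_of_rows(rows):
    """Fold every row of a table (or block) into one running CRC-32 value."""
    crc = 0
    for row in rows:
        # Assumed serialization: column values joined by "|", one line per row.
        encoded = ("|".join(str(value) for value in row) + "\n").encode("utf-8")
        crc = zlib.crc32(encoded, crc)
    return crc


def xor_check(data: bytes) -> int:
    """Exclusive-or check: XOR all bytes together into a single check byte."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)


master_rows = [(1, "alice"), (2, "bob")]
slave_rows_same = [(1, "alice"), (2, "bob")]
slave_rows_diff = [(1, "alice"), (2, "bab")]

# Identical data yields identical checksums; a changed value changes the checksum.
print(crc32_of_rows(master_rows) == crc32_of_rows(slave_rows_same))
print(crc32_of_rows(master_rows) == crc32_of_rows(slave_rows_diff))
print(xor_check(b"\x01\x02\x04"))
```

A running checksum is used here so that a large table need not be held in memory as one byte string; any deterministic serialization would serve equally well, provided both databases use the same one.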

For example, after the master database migrates the data tables 1, 2 and 3 to the slave database, the data tables 1, 2 and 3 may be data tables to be checked between the master database and the slave database, and the data tables 1, 2 and 3 in the master database are the first data tables. For another example, the data tables 4, 5, and 6 between the master database and the slave database have a synchronization relationship, and the data tables 4, 5, and 6 may be data tables to be checked between the master database and the slave database, where the data tables 4, 5, and 6 in the master database are the first data tables.

Of course, the data table to be subjected to data consistency check between the master database and the slave database may be a completely migrated data table or a partially migrated data table; the data table to be subjected to data consistency check between the master database and the slave database can be a data table of all master-slave synchronization or a data table of partial master-slave synchronization, and a user can select the data table by self definition.

It should be noted that, in the embodiment of the present application, the data tables in the master database and the slave database are not locked in the process of performing the data consistency check between the master database and the slave database. That is, in the process of performing data consistency check between the master database and the slave database, data synchronization may exist between the master database and the slave database in real time, and a user may also perform read-write operation on the master database.

In another embodiment, the verification server may also instruct the server where the master database is located to perform the verification operation, and the server where the master database is located verifies the first data table and writes the obtained verification result into the first verification table.

S204, executing the hook program, checking a second data table corresponding to the first data table in the slave database according to the binary log of the master database, and writing a checking result into the second checking table.

The hook program is a program segment that handles events. Each time a specific event hooked by the hook program is triggered, the hook program is started to perform a corresponding operation. In this embodiment, the specific event hooked by the hook program is the slave database replaying a log record, in the binary log of the master database, that relates to the verification operation on the first data table. After the hook program is started, it generates a corresponding check instruction according to that log record, verifies the second data table corresponding to the first data table in the slave database according to the check instruction, and writes the verification result into the second check table.

It is understood that data synchronization between the master database and the slave database is the process of master-slave replication, which keeps the data of the master database (Master) and the slave database (Slave) consistent. After data on the Master changes, the Slave automatically synchronizes the changed data from the Master (possibly with some delay), thereby ensuring data consistency.

Master-slave replication can be implemented in many ways, one of which is based on the binary log. Specifically, the Master records each data change in its binary log; the Slave reads the binary log of the Master through an I/O thread and executes the recorded data changes locally one by one to change its local data. In this way the data changes of the Master are reflected in the Slave, which is referred to as replaying the binary log.
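By way of example only, binary-log-based replication may be modeled with the following simplified Python sketch. It is an in-memory model under stated assumptions: the databases are plain dictionaries, the binary log is a list of tuples, and the names master_write, master_delete, and replay are hypothetical. It does not model the I/O thread or any real replication protocol.

```python
master = {}
slave = {}
binlog = []  # ordered list of (operation, key, value) change records


def master_write(key, value):
    """Apply a change on the master and record it in the binary log."""
    master[key] = value
    binlog.append(("set", key, value))


def master_delete(key):
    """Delete on the master and record the deletion in the binary log."""
    master.pop(key, None)
    binlog.append(("delete", key, None))


def replay(log, target, start=0):
    """Apply the recorded changes to the target one by one, in log order."""
    for operation, key, value in log[start:]:
        if operation == "set":
            target[key] = value
        elif operation == "delete":
            target.pop(key, None)
    return len(log)  # next position to replay from


master_write("row1", "x")
master_write("row2", "y")
master_delete("row1")
position = replay(binlog, slave)
print(slave == master)
```

Returning the next log position lets a later call resume from where the previous replay stopped, which mirrors the incremental nature of replay.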

The binary log of the master database records information about operations that modify the master database. Put simply, operations that do not modify the database, such as SELECT or SHOW, are not recorded in the binary log, whereas operations that modify the database, such as INSERT or DELETE, are recorded. During the data consistency check between the master database and the slave database, data may be synchronized between the two in real time, and a user may also perform read-write operations on the master database. The binary log of the master database therefore includes not only log records related to the verification operation on the first data table, but also log records of other operations on data tables in the master database. Accordingly, when the slave database replays the binary log of the master database, the hook program is started only when the replay reaches a log record related to the verification operation on the first data table.

The binary log of the master database can be in one of several modes, and the modes differ in how they record operations that modify the database. For example, the binary log may record the database operation statement (SQL statement) that modifies the database together with its context information, or it may record the modification result obtained after the SQL statement is executed. The modification result here can be understood as a record of the data modification, such as which data was modified and what it was modified to.

A binary log in STATEMENT mode (statement-based replication, SBR) records the operation statements that modify data, that is, the SQL statements and context information. For example, if the operation update a set test='test' is performed on the database and STATEMENT mode is used, this update statement is recorded in the binary log. A binary log in ROW mode (row-based replication, RBR) records the rows of data affected by the statements that modify the database and the modifications to those rows, that is, the data modification results. For example, if the operation update a set test='test' is performed on the database and ROW mode is used, the rows affected by the update statement and the modifications to those rows are recorded in the binary log.
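The difference between the two modes may be illustrated, purely as a sketch, by the following Python model. The classes StatementEvent and RowEvent and the function log_update are hypothetical and do not correspond to the on-disk format of any real binary log; they only show which information each mode retains.

```python
from dataclasses import dataclass


@dataclass
class StatementEvent:
    sql: str  # STATEMENT mode keeps the statement text itself


@dataclass
class RowEvent:
    table: str
    before: dict  # row image before the change
    after: dict   # row image after the change


def log_update(mode, table, rows, column, new_value):
    """Return the log events that an UPDATE of every row would produce."""
    if mode == "STATEMENT":
        return [StatementEvent(f"UPDATE {table} SET {column} = '{new_value}'")]
    if mode == "ROW":
        # One event per affected row; the SQL text is not preserved.
        return [RowEvent(table, dict(row), {**row, column: new_value}) for row in rows]
    raise ValueError(f"unknown mode: {mode}")


rows = [{"id": 1, "test": "old"}, {"id": 2, "test": "older"}]
statement_events = log_update("STATEMENT", "a", rows, "test", "test")
row_events = log_update("ROW", "a", rows, "test", "test")
print(statement_events[0].sql)
print(len(row_events))
```

In this model a consumer of ROW events has no statement to re-execute, which is the situation the hook program is intended to handle.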

In conventional data consistency checking through binary log replay, the check succeeds only when the binary log of the master database records database-modifying operations by directly recording the SQL statements and context information (for example, a binary log in STATEMENT mode). The principle is as follows: an SQL statement is executed on a data table (block) in the master database to generate the verification result of that data table (block); the executed SQL statement is recorded directly in the STATEMENT-mode binary log; through binary log replay, the same SQL statement is transmitted to the slave database and executed there, so that the verification result of the corresponding data table (block) is calculated on the slave database; finally, the verification results of the corresponding data blocks on the master database and the slave database are compared to determine whether the data of the verified data tables (blocks) is consistent. This approach is therefore not applicable to master-slave synchronization in which the binary log of the master database does not directly record database operation statements and context information (for example, a binary log in ROW mode).

In the present application, if the binary log of the master database records database-modifying operations in a manner other than directly recording database operation statements and context information (for example, a binary log in ROW mode), the binary log does not contain the complete SQL statements but still records information related to them. The hook program can then be executed to verify the relevant data tables (blocks) in the slave database in the same verification manner, according to the information recorded in the binary log, and write the verification result into the check table.

Specifically, the verification server may replay the binary log of the master database in the slave database and start the hook program when a log record related to the verification operation on the first data table is replayed. According to the replayed log record, the hook program determines the data table in the slave database that corresponds to the data table from which the log record originates, and the verification manner in which the master database verified that originating data table. The hook program then verifies the determined data table in the determined verification manner and writes the verification result into the second check table. The hook program may be a pre-compiled code segment.

For example, if the log record 1 in the binary log of the master database records information related to the modification operation of the data table 1 in the master database, the hook program may determine, according to the log record 1, a data table 1 'in the slave database corresponding to the data table 1, determine a verification method a by which the master database verifies the data table 1, and then verify the data table 1' according to the verification method a.
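The example above may be sketched, without limitation, as the following Python hook. The lookup tables SLAVE_TABLE_FOR and CHECK_METHODS, the log-record layout, and the use of CRC-32 as verification manner "a" are all assumptions made for illustration; the present application does not specify how these mappings are stored.

```python
import zlib


def crc32_check(rows):
    """Hypothetical verification manner 'a': CRC-32 over the rows' repr."""
    return zlib.crc32(repr(rows).encode("utf-8"))


# Hypothetical lookup tables, assumed for this sketch only.
SLAVE_TABLE_FOR = {"table_1": "table_1_prime"}
CHECK_METHODS = {"a": crc32_check}


def hook(log_record, slave_tables, second_check_table):
    """Verify the slave-side table corresponding to the record's source table."""
    slave_table = SLAVE_TABLE_FOR[log_record["table"]]  # data table 1 -> data table 1'
    check = CHECK_METHODS[log_record["method"]]         # same manner the master used
    result = check(slave_tables[slave_table])
    second_check_table[slave_table] = result
    return result


master_tables = {"table_1": [(1, "x"), (2, "y")]}
slave_tables = {"table_1_prime": [(1, "x"), (2, "y")]}

first_check_table = {"table_1": crc32_check(master_tables["table_1"])}
second_check_table = {}

log_record_1 = {"table": "table_1", "method": "a"}
hook(log_record_1, slave_tables, second_check_table)
print(second_check_table["table_1_prime"] == first_check_table["table_1"])
```

Because the hook recomputes the result from the slave's own data rather than copying the master's value, the two check tables agree only when the underlying data agrees.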

S206, comparing the first check table with the second check table.

The first check table and the second check table are check tables and are data tables for recording check results. Specifically, the first check table may be a check table in the master database, and is used to record a check result of checking the data table in the master database; the second check table may be a check table in the slave database, and is used to record a check result of checking the data table in the slave database. Therefore, the result of data consistency check of the master database and the slave database can be obtained by comparing the first check table with the second check table.

Specifically, the verification server may compare the verification results recorded in the first check table and the second check table, and output the data tables whose verification results are inconsistent. If the verification results of all data tables are consistent, the data in the master database and the slave database is consistent.
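As one possible illustration, the comparison step may be sketched in Python as follows. The check tables are modeled as dictionaries mapping a data table name to its verification result, and the function name compare_check_tables is hypothetical. Treating a result that is missing on one side as an inconsistency is an assumption of this sketch.

```python
def compare_check_tables(first_check_table, second_check_table):
    """Return the names of data tables whose verification results differ."""
    inconsistent = []
    for table in sorted(first_check_table.keys() | second_check_table.keys()):
        # Assumption: a result missing on either side counts as inconsistent.
        if first_check_table.get(table) != second_check_table.get(table):
            inconsistent.append(table)
    return inconsistent


first = {"table_1": 0x1A2B, "table_2": 0x3C4D, "table_3": 0x5E6F}
second = {"table_1": 0x1A2B, "table_2": 0xFFFF}

inconsistent = compare_check_tables(first, second)
print(inconsistent)
```

An empty list indicates that every verified data table produced the same result on both sides.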

In one embodiment, the first check table and the second check table may be the same check table, that is, one check table may record both the verification result of a data table in the master database and the verification result of the corresponding data table in the slave database. When the first check table and the second check table are different check tables, each of the two may likewise record both verification results. Therefore, by comparing the two verification results in either check table (for example, two columns, one holding the verification result of the master database and the other that of the slave database), the result of the data consistency check between the master database and the slave database can be obtained.

According to the above data verification method, the first data table in the master database is verified automatically and the verification result is written into the first check table. The hook program can then verify, according to the binary log of the master database, the second data table in the slave database that corresponds to the first data table, and write the verification result into the second check table. The result of the data consistency check between the master database and the slave database is obtained by comparing the first check table with the second check table, so the application range is greatly expanded.

In one embodiment, the data verification method further includes: executing a table creation statement in the master database to create the first check table; recording the table creation statement in the binary log of the master database; and creating the second check table when the slave database replays the table creation statement in the binary log.

It is understood that SQL statements include DML (Data Manipulation Language) statements and DDL (Data Definition Language) statements. Both kinds of statements modify the database, so when the database executes them, the binary log records the relevant information. DML statements include, for example, INSERT, UPDATE, and DELETE statements. DDL statements include, for example, CREATE statements.

Binary logs in different modes may record the relevant information of a DML statement in different ways. For example, a STATEMENT-mode binary log and a ROW-mode binary log record a DML statement differently. However, binary logs in different modes record the relevant information of a DDL statement in the same way, that is, the DDL statement is recorded directly.

Specifically, after the master database executes the table creation statement to create the first check table, the table creation statement is written directly into the binary log regardless of the format of the binary log of the master database, because the table creation statement is a DDL statement. When the slave database replays the table creation statement, the statement is passed directly to the slave database and executed there, so that a second check table identical to the first check table is created in the slave database.
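The creation of the two check tables through replay of a DDL statement may be sketched as follows, by way of example only. The sketch uses two in-memory SQLite databases from the Python standard library in place of a real master and slave, a Python list in place of the binary log, and a hypothetical table name and column set; none of these is prescribed by the present application.

```python
import sqlite3

master = sqlite3.connect(":memory:")
slave = sqlite3.connect(":memory:")
binlog = []  # stands in for the master's binary log

# Hypothetical check-table definition, assumed for this sketch.
CREATE_CHECK_TABLE = "CREATE TABLE checksums (tbl TEXT, chunk INTEGER, crc INTEGER)"


def execute_ddl_on_master(sql):
    """Run the DDL on the master and record the statement text verbatim."""
    master.execute(sql)
    binlog.append(sql)


def replay_on_slave():
    """Execute every recorded DDL statement on the slave, in log order."""
    for sql in binlog:
        slave.execute(sql)


def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [name for (name,) in rows]


execute_ddl_on_master(CREATE_CHECK_TABLE)
replay_on_slave()
print(table_names(master) == table_names(slave))
```

Because the statement text itself is replayed, the slave-side table has the same definition as the master-side table without any separate schema transfer.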

For example, after the verification server connects to the master database, it may take over the binary log (Binlog) of the master database through a replication protocol supported by the master database, and then execute a table creation statement on the master database to create a check table cdb.checksums for storing temporary data-comparison data (that is, the verification results obtained by verifying the data tables). The table creation statement is then replayed on the slave database through the Binlog and executed there, so as to create a check table cdb.

FIG. 3 illustrates a schematic diagram of data consistency checking in one embodiment. Referring to fig. 3, in the present embodiment, a table building statement is executed in the master database first, and a check table cdb.

It will be appreciated that the table creation operation that creates the first check table in the master database occurs before the table creation operation that creates the second check table in the slave database, and that the master database's verification of the first data table occurs before the slave database's verification of the corresponding second data table. However, there is no fixed order between the creation of the second check table in the slave database and the verification of the first data table in the master database.

In the present embodiment, after the check table is created by the table creation statement on the master database, the table creation statement is replayed in the slave database to create the same check table as the master database on the slave database. Therefore, when the data is verified subsequently to obtain the verification result, the verification result can be recorded in the verification table, and the data consistency verification can be carried out by comparing the verification table.

In one embodiment, when the data consistency check of the master database and the slave database is carried out, the real-time synchronization operation between the master database and the slave database and the read-write operation of the master database are allowed. Then, in the binary log in the master database, there are not only log records related to the first data table verification operation, but also log records related to the real-time synchronization operation and/or the read-write operation. And the two log records may be interleaved. That is, there may be a scenario where the real-time synchronization operation is performed several times, then the first data table verification operation is performed several times, and then the read/write operation is performed several times.

Specifically, while the verification server replays the binary log of the master database in the slave database, if the replayed log record relates to a real-time synchronization operation and/or a read-write operation, the verification server performs the synchronization operation on the data table in the slave database; if the replayed log record relates to the verification operation on the first data table, the verification operation is performed by the hook program.

For example, the log records recorded in the binary log are: the data in line 3 of Table A was modified to XXX. Then, when the log record is replayed from the slave database, the data of the 3 rd row of the table A in the slave database is modified to XXX, so that the data change of the master database is synchronized to the slave database.

For another example, the log records recorded in the binary log are: adding data into the checking table, wherein the data is the checking result of the table A; then, when the log record is replayed from the slave database, the data is added into the check table of the slave database, and the added data is the check result of the table A in the slave database, so that the data change of the master database is synchronized into the slave database.

FIG. 4 is a schematic diagram illustrating data consistency checking during real-time synchronization in one embodiment. Referring to FIG. 4, the master database includes data tables A1, A2, A3, …, and the slave database includes data tables B1, B2, B3, …; A1 and B1 are corresponding data tables, as are A2 and B2. In this embodiment, the data consistency check verifies the consistency of A1 with B1 and of A2 with B2. Assume that at a certain point in time the data block ai in data table A1 is verified, the first verification result Ji is recorded in the check table of the master database, and a corresponding log record Ck is generated in the binary log of the master database. Before data block ai+1 in data table A1 is verified, another user performs one operation on data table A2, which generates a corresponding log record Ck+1 in the binary log of the master database; data block ai+1 in data table A1 is then verified, which generates a corresponding log record Ck+2. When the slave database replays the log records of the binary log of the master database, it proceeds as follows: on replaying log record Ck, it verifies data block bi of data table B1 according to Ck, where bi and ai are corresponding blocks; on replaying log record Ck+1, it synchronizes data table B2 according to Ck+1; on replaying log record Ck+2, it verifies data block bi+1 of data table B1 according to Ck+2, where bi+1 and ai+1 are corresponding blocks; replay then continues with the subsequent log records.
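The interleaved replay of FIG. 4 may be modeled, as a non-limiting sketch, by the following Python code. The record layout with a "kind" field, the dictionary-based tables, and the crc helper are hypothetical; the three records stand for Ck, Ck+1, and Ck+2 respectively.

```python
import zlib


def crc(rows):
    """Hypothetical verification manner used by the hook in this sketch."""
    return zlib.crc32(repr(rows).encode("utf-8"))


# Slave-side data: B1 is split into blocks 0 and 1; B2 is an ordinary table.
slave = {"B1": {0: ["r0", "r1"], 1: ["r2", "r3"]}, "B2": {"x": 1}}
slave_check_table = {}

binlog = [
    {"kind": "check", "table": "B1", "block": 0},              # Ck
    {"kind": "sync", "table": "B2", "key": "x", "value": 2},   # Ck+1
    {"kind": "check", "table": "B1", "block": 1},              # Ck+2
]


def replay(log):
    """Replay records in order: check records go to the hook, others are applied."""
    trace = []
    for record in log:
        if record["kind"] == "check":
            # Hook: verify the corresponding slave block and store the result.
            block = slave[record["table"]][record["block"]]
            slave_check_table[(record["table"], record["block"])] = crc(block)
            trace.append("check")
        else:
            # Ordinary synchronization of a user write.
            slave[record["table"]][record["key"]] = record["value"]
            trace.append("sync")
    return trace


trace = replay(binlog)
print(trace)
```

The trace shows that the user write is applied between the two block verifications, in the same order in which the master generated the records.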

In this embodiment, data differences between two databases in a master-slave relationship may be compared online, even if there is still real-time data synchronization between the master and slave, or there is a read-write operation on the master database.

In one embodiment, S202 includes: dividing a first data table in a main database into a plurality of first data blocks; and sequentially checking each first data block, and writing the checking result into a first checking table. S204 comprises the following steps: and executing a hook program, sequentially checking second data blocks corresponding to the first data blocks in the slave database according to the log recording sequence of the binary log of the master database, and writing the checking result into a second checking table.

It can be understood that, in general, the data size of a data table in the database is large. When the whole data table with large data quantity is checked at a time, the calculation time is usually long. In this embodiment, a data table is divided into a plurality of data blocks, and then different data blocks are verified respectively, so that the time consumption in calculation can be reduced, and the verification efficiency can be improved.

Specifically, the verification server may divide the first data table in the master database into a plurality of first data blocks, verify each first data block in sequence, and write the verification results into the first check table. In this way, the log records recorded in time order in the binary log of the master database reflect the order in which the first data blocks were verified in the master database. When the hook program is executed in the slave database, the second data blocks corresponding to the first data blocks are verified in sequence according to the order of the log records in the binary log of the master database, and the verification results are written into the second check table. For example, if the first data block N in the master database is the M-th of the first data blocks to be verified, then the second data block N' corresponding to it in the slave database is also the M-th of the second data blocks to be verified.
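Block-wise verification may be sketched in Python as follows, for illustration only. The fixed block size, the CRC-32 over each block's repr, and the function names split_into_blocks and check_blocks are assumptions of this sketch; the present application does not prescribe how blocks are delimited.

```python
import zlib


def split_into_blocks(rows, block_size):
    """Divide a table's rows into consecutive blocks of at most block_size rows."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def check_blocks(rows, block_size):
    """Verify each block in sequence; return (block index, row count, checksum)."""
    results = []
    for index, block in enumerate(split_into_blocks(rows, block_size)):
        checksum = zlib.crc32(repr(block).encode("utf-8"))
        results.append((index, len(block), checksum))
    return results


master_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
slave_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "x"), (5, "e")]  # one row differs

master_results = check_blocks(master_rows, block_size=2)
slave_results = check_blocks(slave_rows, block_size=2)

# Blocks are compared pairwise in verification order.
mismatched = [m[0] for m, s in zip(master_results, slave_results) if m != s]
print(mismatched)
```

Checking per block narrows an inconsistency down to a small range of rows instead of flagging the whole table.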

For example, FIG. 5 shows a diagram of corresponding data blocks in a master-slave database in one embodiment. Referring to FIG. 5, in this embodiment, data table A1 in the master database and data table B1 in the slave database are corresponding data tables (i.e., A1 and B1 are synchronized data tables), and data block an in data table A1 and data block bn in data table B1 are corresponding data blocks. When the check server or the server where the master database is located checks an, obtains a first check result, and writes the first check result into the check table of the master database, a log record m is correspondingly generated in the database log of the master database. When the slave database replays log record m, the check server or the server where the slave database is located first determines that the data block from which log record m originates is an and that the data block corresponding to an in the slave database is bn, then checks bn to obtain a second check result, and writes the second check result into the check table of the slave database.

It can be understood that the data blocks in the master database are checked one after another, in time order; the log records in the database log of the master database are therefore also generated one after another, in time order. When the slave database replays the database log of the master database, the log records are likewise replayed in sequence. The results of the operations on the check table are thus also processed in sequence when they are replayed in the slave database.
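
The ordering argument above can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory `log` list merely stands in for the binary log, and the names `check_master`, `replay_on_slave`, and `check_rows`, as well as the use of `zlib.crc32` over `repr(rows)`, are assumptions made for illustration only.

```python
import zlib


def check_rows(rows):
    """Illustrative check: CRC32 of the rows' repr plus the row count."""
    return zlib.crc32(repr(rows).encode("utf-8")), len(rows)


def check_master(master_chunks):
    """Check master chunks in order and return (check_table, log).

    Each log entry stands in for the binary-log record that is produced
    when a result row is written into the master's check table.
    """
    check_table, log = {}, []
    for chunk_id, rows in master_chunks:
        check_table[chunk_id] = check_rows(rows)
        log.append(chunk_id)  # log order equals check order
    return check_table, log


def replay_on_slave(log, slave_chunks):
    """Replay log entries in order; the hook checks the matching slave chunk."""
    slave_by_id = dict(slave_chunks)
    check_table, order = {}, []
    for chunk_id in log:
        check_table[chunk_id] = check_rows(slave_by_id[chunk_id])
        order.append(chunk_id)
    return check_table, order
```

Because the slave side is driven purely by the log, the chunk that was checked M-th on the master is also checked M-th on the slave, regardless of how the slave's chunks happen to be stored.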

In a specific embodiment, the fields of the check table may include a database name (db), a data table name (tbl), a data block name (chunk), a data block left boundary (lower_boundary), a data block right boundary (upper_boundary), a check result of the data block in the master database (master_crc), a number of rows of the data block checked in the master database (master_count), a check result of the data block in the slave database (this_crc), a number of rows of the data block checked in the slave database (this_count), and so on. Therefore, when a log record is replayed, the check server or the server where the slave database is located can quickly determine the data block from which the log record originates according to the data, recorded in the log record, that was written into the check table.
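
As a rough illustration, one row of such a check table could be modelled as follows. The field names are taken from the text above; the Python types, the class name `CheckRow`, and the helper `source_chunk` are assumptions and not part of the described scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckRow:
    """One row of the check table (types are illustrative assumptions)."""
    db: str
    tbl: str
    chunk: int
    lower_boundary: Optional[int]
    upper_boundary: Optional[int]
    master_crc: Optional[int] = None
    master_count: Optional[int] = None
    this_crc: Optional[int] = None
    this_count: Optional[int] = None


def source_chunk(row: CheckRow):
    """Identify the data block that a replayed check-table row originates from."""
    return (row.db, row.tbl, row.chunk, row.lower_boundary, row.upper_boundary)
```

Since the replayed row already carries the database, table, chunk, and boundary values, the slave side needs no additional lookup to locate the block it has to check.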

With continued reference to FIG. 3, when the check server or the server where the master database resides checks the first chunk of data table A, it may determine the left boundary of the chunk and mark it as X (for example, the left boundary of a data block covering 1-1000 is 1), and determine whether a left boundary of a next chunk exists. When the left boundary of the next chunk exists, the current chunk is not the last chunk of data table A, and a cyclic redundancy check is performed on the current chunk to obtain the master_crc of the current chunk and the row number master_count of the current chunk. When the left boundary of the next chunk does not exist, the current chunk is the last chunk of data table A, the maximum value of the current chunk is its right boundary Y, and a cyclic redundancy check is performed on the current chunk to obtain the master_crc of the current chunk and the row number master_count of the current chunk.
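
The boundary logic described above might be sketched as follows. This is a simplified illustration that assumes the unique key values are already available as an ascending Python list; the function name `chunk_bounds` and the `chunk_size` parameter are hypothetical.

```python
def chunk_bounds(sorted_keys, chunk_size):
    """Yield (left, right, is_last) for each chunk of an ascending key list."""
    start = 0
    while start < len(sorted_keys):
        left = sorted_keys[start]
        next_start = start + chunk_size
        if next_start < len(sorted_keys):
            # A next chunk has a left boundary, so this chunk is not the last.
            right = sorted_keys[next_start - 1]
            yield left, right, False
        else:
            # No next left boundary: last chunk, right boundary is the max key.
            right = sorted_keys[-1]
            yield left, right, True
        start = next_start
```

For keys 1 to 2500 and a chunk size of 1000, the generator yields the chunks 1-1000 and 1001-2000 as non-final chunks and 2001-2500 as the last chunk.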

In this embodiment, the data table to be verified is divided into data blocks and the calculation is performed block by block, so that locking the entire data table is avoided and the impact on the normal read-write service of the master database is reduced. Moreover, the second data blocks corresponding to the first data blocks in the master database are checked in the same order as the first data blocks, so that corresponding data blocks in the master and slave databases are checked in the same sequence, which avoids check errors caused by other database operations during the checking process.

In one embodiment, sequentially checking each first data block and writing the check result into the first check table includes: sequentially taking each first data block obtained by the division as the current data block; checking all columns of the current data block and counting its rows; and writing the check result and the row count into the first check table, then taking the next first data block as the current data block and performing the check and row count on it, until the last first data block.

The first data block may comprise a plurality of rows of data. Specifically, the verification server may sequentially take each of the divided first data blocks as the current data block, verify all columns of the current data block, count the rows of the current data block, and write the obtained verification result and row count into the first verification table; it then takes the next first data block as the current data block and performs the foregoing operations on it, until the last first data block.
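
A minimal sketch of checking all columns of a block and counting its rows is given below. The byte encoding of column values (a length prefix and a one-byte NULL marker) is an assumption chosen for illustration; the text does not specify how column values are fed into the check, and the names `check_chunk` and `check_all` are hypothetical.

```python
import zlib


def check_chunk(rows):
    """Return (crc, row_count) computed over every column of every row."""
    crc = 0
    count = 0
    for row in rows:
        for value in row:
            # One marker byte distinguishes NULL from any non-NULL value.
            data = b"\x00" if value is None else b"\x01" + str(value).encode("utf-8")
            # The length prefix keeps ("ab", "c") distinct from ("a", "bc").
            crc = zlib.crc32(len(data).to_bytes(4, "big") + data, crc)
        count += 1
    return crc, count


def check_all(chunks):
    """Check chunks in order; the returned list stands in for the first check table."""
    return [(chunk_id, *check_chunk(rows)) for chunk_id, rows in chunks]
```

Each entry of the returned list holds the chunk identifier, the check result, and the row count, which mirrors the result and row number written into the check table for every block.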

In one embodiment, each first data block has the same number of rows as the corresponding second data block. Executing the hook program, sequentially verifying the second data blocks corresponding to the first data blocks in the slave database according to the log recording order of the binary log of the master database, and writing the verification result into the second verification table includes: executing the hook program, and sequentially determining the second data blocks corresponding to the first data blocks in the slave database according to the log recording order of the binary log of the master database; checking all columns of each second data block so determined and counting its rows; and writing the check result and the row count into the second check table.

It can be understood that, when the verification server executes the hook program on the slave database, the second data block determined to correspond to a first data block has the same number of rows as that first data block. If the row numbers differ, the data of the master and slave databases differ. In the conventional data consistency checking method, when the row numbers are consistent and a data block in the slave database is checked, only the data in the columns that correspond to the columns of the data block in the master database is checked. Consequently, when the data columns of the corresponding data tables in the master and slave databases are inconsistent, the conventional method may produce a wrong comparison result. In this embodiment, when the hook program is used to verify a data block in the slave database, the data in all columns of the data block is verified, so that this error is effectively avoided.
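
The difference between checking only the corresponding columns and checking all columns can be shown with a small Python sketch. The function `crc_columns`, its `ncols` parameter, and the sample rows are hypothetical and only illustrate the argument; they are not taken from the described implementation.

```python
import zlib


def crc_columns(rows, ncols=None):
    """CRC32 over the first `ncols` columns of each row (all columns if None)."""
    crc = 0
    for row in rows:
        for value in row[:ncols]:
            data = str(value).encode("utf-8")
            crc = zlib.crc32(len(data).to_bytes(4, "big") + data, crc)
    return crc


# The slave table carries one extra trailing column that the master lacks.
master_rows = [(1, "alice"), (2, "bob")]
slave_rows = [(1, "alice", "x"), (2, "bob", "y")]

# Checking only the master's two columns on the slave hides the difference.
prefix_check_matches = crc_columns(master_rows, 2) == crc_columns(slave_rows, 2)

# Checking every column feeds different bytes into the CRC on the slave side.
all_column_check_matches = crc_columns(master_rows) == crc_columns(slave_rows)
```

Here the two-column check reports a match even though the tables differ, whereas the all-column check covers the extra column and, barring a CRC collision, reports a mismatch.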

It should be noted that the data blocks may be checked continuously, that is, the next data block is processed immediately after the current one. Alternatively, when the load on the master database or the slave database is high, there may be a time interval between the processing of different data blocks, such as waiting 100 ms after processing one data block before processing the next.
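
A pause between blocks could be sketched as follows. The function name `check_with_pause` and its parameters are assumptions; the `sleep` argument is injectable only so that the behaviour can be observed without actually waiting.

```python
import time


def check_with_pause(chunks, check, pause_seconds=0.0, sleep=time.sleep):
    """Apply `check` to each chunk in order, pausing between chunks if requested."""
    results = []
    for index, chunk in enumerate(chunks):
        if index > 0 and pause_seconds > 0:
            sleep(pause_seconds)  # give the databases time to serve other load
        results.append(check(chunk))
    return results
```

With `pause_seconds=0.1` the sketch waits 100 ms after each block before starting the next one; with the default of 0 the blocks are processed back to back.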

In addition, when the verification of a data table is interrupted and needs to be redone, or when the currently processed data block needs to be reprocessed, the master database may delete the corresponding historical verification result from the first verification table. The delete operation is synchronized to the slave database through the binary log, so that the slave database also deletes the historical verification result of the corresponding data block from the second verification table.

In the above embodiment, when a data block in the master database is verified, the data of all data columns of the data block is verified; when the corresponding data block in the slave database is verified by the hook program, the data of all data columns of that data block is also verified, which effectively solves the problem of verification errors when the columns of corresponding data tables in the master and slave databases differ.

In one embodiment, partitioning a first data table in a master database into a plurality of first data blocks comprises: acquiring a unique index of a first data table of a main database; the first data table is divided into a plurality of first data blocks by rows according to key values included by the unique index.

The unique index of a data table is the set of key values of the rows in the data table. The key values included in the unique index differ from one another. Typically, each data table in the database has a unique index.

Specifically, after the data table that needs to be subjected to data verification is determined, the verification server may obtain the unique index of the data table, sort the rows of data in the data table according to the key values included in the unique index, and divide the data table of the master database by key value, that is, by rows, into a plurality of data blocks, where each data block includes a plurality of rows of data.

For example, for a table t1 among the data tables to be subjected to data verification, the unique index of the table is found, and the data of the table is divided into small data blocks (chunks) according to the key values included in the unique index, such as 1-1000, 1001-1999.
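
Dividing a table into chunks by unique key can be sketched as follows. The sketch keeps the whole table in memory, which a real checker would not do; the function name `split_by_unique_key` and its parameters are hypothetical.

```python
def split_by_unique_key(rows, key, chunk_size):
    """Sort rows by their unique key and split them into chunks of at most chunk_size rows."""
    keys = [key(row) for row in rows]
    if len(set(keys)) != len(keys):
        raise ValueError("key values must be unique")
    ordered = sorted(rows, key=key)
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]
```

For instance, `split_by_unique_key(rows, lambda r: r[0], 1000)` groups rows by ascending first-column key into blocks of up to 1000 rows, independently of the physical order in which the rows were supplied.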

It can be understood that the multiple rows of data included in a data block may be adjacent data rows or non-adjacent data rows. This is because the row number of a row of data in the data table is not necessarily the same as the position of that row's key value in the unique index. For example, the third row of data in data table A has row position 3 in data table A, but the key value of that row is not necessarily in position 3 in the unique index. Moreover, since rows may be added to or deleted from the data table, a row number cannot uniquely identify a row of data; a row is instead uniquely identified by its key value.

When the check server does not find a unique index for the data table, it may trigger the display of a prompt message. The prompt message asks the user whether to add a unique index to the data table and describes the risk of not adding one, so that the user can choose. Not adding a unique index increases the time consumed by data verification, has little effect on master-slave synchronization, and does not affect the correctness of the result.

Referring to FIG. 3 again, in this embodiment, the data tables to be subjected to data verification include data table A; the unique index of data table A is obtained, and data table A is divided by rows into a plurality of data blocks (chunks) according to the key values included in the unique index.

In this embodiment, the table to be compared is divided into blocks according to the unique index and the calculation is performed block by block, so that locking the entire data table is avoided and less time is consumed; the impact of the verification on other read-write operations in the master and slave databases is also reduced.

In one embodiment, sequentially checking each first data block and writing the checking result into the first checking table includes: sequentially taking each first data block obtained by division as a current data block; when the current data block is verified, starting transaction operation, and calculating the current data block according to a preset verification mode through the transaction operation to obtain a corresponding first verification result; and after the first verification result is written into the first verification table through the transaction operation, ending the transaction operation and using the next first data block as the current data block to verify until the last first data block.

It should be noted that operations on a database are of different types, such as ordinary operations and transaction operations. Ordinary operations include single add, delete, query, or modify operations on the database, whereas a transaction operation performs add, delete, query, modify, and similar operations on the database as a batch. A transaction is an atomic operation: it either succeeds or fails as a whole, and no intermediate result occurs.

Specifically, the verification server or the server where the master database is located may sequentially take each divided data block as the current data block. When the current data block is verified, a transaction operation is started, the current data block is processed within the transaction operation according to a preset verification method to obtain a corresponding first verification result, and the first verification result of the current data block is recorded in the first verification table within the same transaction operation, after which the transaction operation ends. The next data block is then taken as the current data block and the same operations are performed on it, until the last data block. It can be understood that, for each data block, the verification of the data block and the recording of its verification result are performed by one transaction operation.
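
The idea of verifying a block and recording its result inside one transaction can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for the master database; the table names `t1` and `checksums`, their columns, and the function names are assumptions for illustration.

```python
import sqlite3
import zlib


def make_demo_db():
    """Create an in-memory database with a data table and a check table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
    conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE checksums (chunk INTEGER PRIMARY KEY, lower_boundary INTEGER, "
        "upper_boundary INTEGER, master_crc INTEGER, master_count INTEGER)"
    )
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(i, f"n{i}") for i in range(1, 6)])
    return conn


def check_chunk_in_txn(conn, chunk_id, lower, upper):
    """Checksum rows with lower <= id <= upper and record the result in one transaction."""
    conn.execute("BEGIN")
    try:
        rows = conn.execute(
            "SELECT id, name FROM t1 WHERE id BETWEEN ? AND ? ORDER BY id",
            (lower, upper),
        ).fetchall()
        crc = 0
        for row in rows:
            crc = zlib.crc32(repr(row).encode("utf-8"), crc)
        conn.execute(
            "INSERT INTO checksums VALUES (?, ?, ?, ?, ?)",
            (chunk_id, lower, upper, crc, len(rows)),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # neither the read nor the write takes effect
        raise
    return crc, len(rows)
```

If writing the result fails, the whole transaction is rolled back, so the check table never contains a result for a block whose verification did not complete.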

In one embodiment, when the verification server or the server where the master database is located determines the data tables to be subjected to data verification, it may detect whether the engines corresponding to these data tables in the master database support transaction operations. When the engines support transaction operations, the verification server or the server where the master database is located continues to execute the subsequent steps; when an engine does not support transaction operations, the verification server or the server where the master database is located triggers a reminder message. The reminder message reminds the user to modify the engine of each data table that does not support transaction operations; after the user has changed the engine of such a data table to one that supports transaction operations, the subsequent steps continue to be executed. Engines supporting transaction operations include, for example, the InnoDB engine or the TokuDB engine supported by MySQL.
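
The engine check can be sketched as a simple filter. The mapping from table name to engine name is assumed to have been obtained beforehand (in MySQL it could, for example, be read from the `ENGINE` column of `information_schema.TABLES`); the set `TRANSACTIONAL_ENGINES` contains only the two engines named in the text, and the function name is hypothetical.

```python
TRANSACTIONAL_ENGINES = {"innodb", "tokudb"}  # the engines named in the text


def tables_needing_engine_change(table_engines):
    """Return the sorted names of tables whose engine is not in TRANSACTIONAL_ENGINES."""
    return sorted(
        name
        for name, engine in table_engines.items()
        if engine.lower() not in TRANSACTIONAL_ENGINES
    )
```

An empty result means that the subsequent steps can proceed; otherwise the returned table names are the ones for which a reminder message would be triggered.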

In one embodiment, the isolation level of the master database is the repeatable read level when the transaction operation is started. Starting the transaction operation and processing the current data block according to a preset verification method through the transaction operation to obtain a corresponding first verification result includes: starting the transaction operation; generating, based on concurrency control, a database snapshot as of the start of the transaction operation; and processing the current data block in the database snapshot according to the preset verification method through the transaction operation to obtain the corresponding first verification result.

It can be understood that when the data tables between the master database and the slave database are verified and the master database supports the read-write operations of other users, if the read-write operations of other users and the verification operations are performed on the same data table, the verification result may be affected, resulting in a verification error.

The master database in the embodiments of the present application supports MVCC (Multi-Version Concurrency Control). It should be noted that, in general, if one user reads data from the database while another user writes data, the reading user may see "half-written" or inconsistent data. The simplest way to solve this problem is to lock the database and make all reading users wait until the writing user has finished, but this is inefficient. MVCC takes a different approach: each reading user connected to the database sees a snapshot of the database at a certain instant, and changes caused by a write operation of a writing user are not visible to reading users until the write operation is complete (or until the database transaction commits).

Typically, MVCC only works at two isolation levels, REPEATABLE READ and READ COMMITTED. In this embodiment, when a transaction operation is performed to verify the database, the isolation level of the master database is the repeatable read level; at this isolation level, any number of repeated reads within the transaction operation return the same data, so that the correctness of the verification result is ensured regardless of whether the data table being read is updated.

In this embodiment, through concurrency control and the setting of the isolation level of the database, any number of repeated reads within the transaction operation that performs database verification return the same data, and the correctness of the verification result is ensured regardless of whether the data table being read is updated.
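
The snapshot behaviour relied on here can be imitated with a toy model in Python. The class `VersionedTable` is purely hypothetical: it copies the whole committed state for every snapshot, whereas a real MVCC engine keeps row versions, so the sketch only shows what a reader observes, not how a database implements it.

```python
class VersionedTable:
    """Toy model of snapshot reads: readers never see later committed writes."""

    def __init__(self, rows):
        self._committed = dict(rows)

    def begin_snapshot(self):
        """Return a private copy of the committed state as of this instant."""
        return dict(self._committed)

    def commit_write(self, key, value):
        """Commit a write by installing a new committed state."""
        new_state = dict(self._committed)
        new_state[key] = value
        self._committed = new_state
```

A verification transaction that works on the value returned by `begin_snapshot()` reads the same data however often it reads, even if other writers commit changes in the meantime; those changes only become visible to snapshots taken afterwards.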

With continued reference to FIG. 3, when the check server or the server where the master database resides checks the current data block, the cyclic redundancy check result master_crc and the row number master_count of the current chunk are calculated within one transaction operation and written into the checksums table of the master database. The result of this operation is recorded in the database log of the master database. When that result is replayed from the database log in the slave database, a hook function is started; the hook function calculates the cyclic redundancy check result this_crc and the row number this_count of the chunk in the slave database that corresponds to the current chunk, and writes them into the checksums table of the slave database. The master database can obtain the this_crc and this_count calculated by the slave database and write them into the checksums table of the master database; likewise, the slave database can obtain the master_crc and master_count calculated by the master database and write them into the checksums table of the slave database. The check server or the server where the master database is located then obtains the next chunk and repeats the operations performed on the current chunk. This continues until the current chunk is the last chunk to be verified, namely (−∞, X) or (Y, +∞); at this point, the verification of data table A is finished, and the verification result can be obtained from the check table of the master database or the check table of the slave database.

It will be appreciated that rows may also be added to data table A while it is being checked; the range of the unique index of data table A is therefore set to (−∞, +∞).
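
The two open-ended ranges can be sketched as follows, under the assumption that X is the smallest left boundary and Y the largest right boundary of the regular chunks and that keys are numeric. The function names are hypothetical.

```python
import math


def open_end_ranges(first_left, last_right):
    """Return the ranges (-inf, X) and (Y, +inf) outside the chunked key span [X, Y]."""
    return [(-math.inf, first_left), (last_right, math.inf)]


def keys_outside_span(keys, first_left, last_right):
    """Return the keys that fall into either open-ended range (bounds exclusive)."""
    ranges = open_end_ranges(first_left, last_right)
    return [k for k in keys if any(lo < k < hi for lo, hi in ranges)]
```

Rows inserted with keys below X or above Y while the check is running fall into one of these two ranges, so together with the regular chunks the whole key range (−∞, +∞) is covered.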

In the embodiment, the data verification of one data block and the recording of the verification result are completed through one transaction operation, so that the correctness of the verification result is ensured.

In general, existing data consistency checks require that the database log of the master database be a binary log in STATEMENT mode, and they are restricted for binary logs in other modes (such as ROW mode). Moreover, master-slave synchronization in the industry today largely avoids STATEMENT-mode binary logs, because STATEMENT mode has a number of known problems, which in serious cases can make master and slave data inconsistent. For master-slave synchronization that uses a non-STATEMENT binary log, the binary log mode of the database connection session has to be changed to STATEMENT, and this change requires the user to provide an account with the SUPER privilege. Users are generally wary of providing highly privileged database accounts. Furthermore, in some special scenarios, existing data consistency checks cannot detect data differences between the master and slave databases. For example, the first several fields of a data table in the master database and of the corresponding data table in the slave database are defined identically, but the data table in the slave database has one or more additional fields; that is, the data in the two tables is the same except for the last column or columns. In such a case, an existing data consistency check may reach a wrong conclusion when comparing the two tables, possibly with catastrophic consequences.

In conjunction with the foregoing embodiments, it can be appreciated that the embodiments of the present application do not require the SUPER privilege to force the database log mode to be changed to STATEMENT mode. Moreover, in the embodiments of the present application, the data consistency check is completed on the basis of the unique index of the data table to be checked and the hook program (also called a hook operation) executed when a log record is replayed, without relying on replaying SQL statements recorded in a STATEMENT-mode binary log. Meanwhile, in the embodiments of the present application, executing the hook program makes all data of the data block in the slave database that corresponds to the data block in the master database participate in the CRC, which avoids the misjudgment that may otherwise occur when one or more columns have been added to the data table of the slave database.

Moreover, between databases with normal master-slave synchronization, if the data of every data block is consistent at the same snapshot time point of the master and slave databases, the data of the master and slave databases is necessarily consistent. Since the binary log of the master database is strictly linearly ordered in the time dimension, when the slave database is about to execute the hook program, the data snapshot time point of the slave database coincides with the time at which the master database checked the corresponding data block. This inference is easily verified by contradiction: between databases with normal master-slave synchronization, the data of any data block is consistent at the same snapshot time point; if the data of the master and slave databases were inconsistent, the data could not have been consistent at the same snapshot point in the earlier comparison.

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise herein, the execution of the steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are the sub-steps or stages necessarily performed in sequence, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

As shown in FIG. 6, in one embodiment, a data verification apparatus 600 is provided. Referring to fig. 6, the data verification apparatus 600 includes: a first check module 601, a second check module 602, and a comparison module 603.

The first checking module 601 is configured to check the first data table in the master database and write a checking result into the first checking table.

The second check module 602 is configured to execute the hook program, check a second data table in the slave database that corresponds to the first data table according to the binary log of the master database, and write a check result into the second check table.

The comparison module 603 is configured to compare the first check table with the second check table.

The data verification apparatus 600 automatically verifies the first data table in the master database and writes the verification result into the first verification table. In this way, the hook program can verify, according to the binary log of the master database, the second data table in the slave database that corresponds to the first data table in the master database, and write the verification result into the second verification table, so that the data consistency result for the master and slave databases can be obtained by comparing the first verification table with the second verification table, and the application range is greatly expanded.
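
The comparison performed by the comparison module 603 can be sketched as follows. Each check table is assumed to be available as a mapping from a chunk identifier to a `(crc, row_count)` pair; this representation and the function name `compare_check_tables` are illustrative assumptions.

```python
def compare_check_tables(first, second):
    """Return the sorted chunk ids whose (crc, count) differ or exist on one side only."""
    mismatched = []
    for chunk in sorted(set(first) | set(second)):
        if first.get(chunk) != second.get(chunk):
            mismatched.append(chunk)
    return mismatched
```

An empty list indicates that every data block produced the same check result and row count on both sides; any listed chunk marks a block whose data differs between the master and slave databases or whose result is missing from one of the tables.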

In one embodiment, the first check module 601 is further configured to divide the first data table in the master database into a plurality of first data blocks; and sequentially checking each first data block, and writing the checking result into a first checking table. The second check module 602 is further configured to execute a hook program, sequentially check second data blocks in the slave database corresponding to the first data blocks according to a log recording sequence of the binary log in the master database, and write a check result into the second check table.

In an embodiment, the first checking module 601 is further configured to take each of the divided first data blocks as a current data block in sequence; checking all columns of the current data block and carrying out row number statistics; and writing the verification result and the line number into a first verification table, and performing verification and line number statistics on the next first data block serving as the current data block until the last first data block.

In one embodiment, each first data block has the same number of rows as the corresponding second data block. The second check module 602 is further configured to execute a hook program, and sequentially determine, according to a log recording sequence of a binary log of the master database, second data blocks in the slave database, where the second data blocks correspond to the first data blocks, respectively; checking all the columns of each second data block which is determined in sequence and carrying out row number statistics; the check result and the number of rows are written into a second check table.

In one embodiment, the first check module 601 is further configured to obtain a unique index of a first data table of the primary database; the first data table is divided into a plurality of first data blocks by rows according to key values included by the unique index.

In an embodiment, the first checking module 601 is further configured to take each of the divided first data blocks as a current data block in sequence; when the current data block is verified, starting transaction operation, and calculating the current data block according to a preset verification mode through the transaction operation to obtain a corresponding first verification result; and after the first verification result is written into the first verification table through the transaction operation, ending the transaction operation and using the next first data block as the current data block to verify until the last first data block.

In one embodiment, the isolation level of the master database is the repeatable read level when the transaction operation is started. The first check module 601 is further configured to start the transaction operation; generate, based on concurrency control, a database snapshot as of the start of the transaction operation; and process the current data block in the database snapshot according to a preset verification method through the transaction operation to obtain a corresponding first verification result.

As shown in fig. 7, in an embodiment, the data verification apparatus 600 further includes: a first table building module 604 and a second table building module 605.

A first table building module 604, configured to execute a table building statement in the master database to create a first check table; and recording the table building statement into a binary log of the master database.

A second table building module 605, configured to create the second check table when the slave database replays the table building statement in the binary log.
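
How the second check table comes into existence through replay can be sketched with two in-memory SQLite databases standing in for the master and slave databases. The Python list `log` merely stands in for the binary log, and the table definition and function names are assumptions for illustration (the column names follow the check-table fields described earlier).

```python
import sqlite3

CREATE_CHECK_TABLE = """
CREATE TABLE IF NOT EXISTS checksums (
    db TEXT, tbl TEXT, chunk INTEGER,
    lower_boundary TEXT, upper_boundary TEXT,
    master_crc INTEGER, master_count INTEGER,
    this_crc INTEGER, this_count INTEGER,
    PRIMARY KEY (db, tbl, chunk)
)
"""


def create_on_master(master, log):
    """Execute the table building statement on the master and record it in the log."""
    master.execute(CREATE_CHECK_TABLE)
    log.append(CREATE_CHECK_TABLE)  # stands in for the binary-log record


def replay_on_slave(slave, log):
    """Replay every logged statement on the slave, in log order."""
    for statement in log:
        slave.execute(statement)


def has_table(conn, name):
    """Report whether a table with the given name exists in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None
```

The slave never runs a separate creation step: its check table exists only because the logged statement was replayed, which is the relationship between the first and second table building modules described above.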

In one embodiment, the binary log of the master database is a binary log based on the row replication mode.

FIG. 8 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the verification server 130 in FIG. 1. As shown in FIG. 8, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the data verification method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the data verification method. Those skilled in the art will appreciate that the architecture shown in FIG. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, the data verification apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 8. The memory of the computer device may store therein various program modules constituting the data checking apparatus, such as a first checking module 601, a second checking module 602, and a comparison module 603 shown in fig. 6. The respective program modules constitute computer programs that cause the processors to execute the steps in the data verification methods of the embodiments of the present application described in the present specification.

For example, the computer device shown in FIG. 8 may verify the first data table in the master database through the first check module 601 in the data verification apparatus 600 shown in FIG. 6 and write the verification result into the first verification table. The computer device may execute the hook program through the second check module 602, check the second data table in the slave database that corresponds to the first data table according to the binary log of the master database, and write the check result into the second check table. The computer device may compare the first check table with the second check table through the comparison module 603.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described data verification method. The steps of the data verification method herein may be steps in the data verification methods of the various embodiments described above.

In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of the above-described data verification method. The steps of the data verification method herein may be steps in the data verification methods of the various embodiments described above.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of these technical features should be considered within the scope of the present specification as long as the combination contains no contradiction.

The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent shall be subject to the appended claims.

Claims

1. A method for data verification, the method comprising:

when a modification operation on a data table in a master database occurs, recording relevant information about the execution of the modification operation in a binary log in ROW mode;

determining a first data table to be subjected to data consistency check in the master database;

performing data table verification on the data in the first data table in the master database in a preset verification mode and writing a verification result into a first check table;

replaying the binary log of the master database in a slave database;

when a log record related to the checking operation on the first data table is replayed, executing a hook program; determining, through the hook program and according to the log record related to the checking operation on the first data table in the binary log of the master database, a second data table in the slave database corresponding to the first data table from which the currently replayed log record originates; determining the preset verification mode adopted by the master database for the data table verification of the first data table; performing data table verification on the data in the second data table in the slave database according to the preset verification mode; and writing a verification result into a second check table;

and comparing the first check table with the second check table.

2. The method according to claim 1, wherein the performing, for the first data table in the master database, a data table check on the data in the first data table by using a preset check method and writing a check result into the first check table comprises:

dividing the first data table in the master database into a plurality of first data blocks;

sequentially verifying each first data block, and writing verification results into the first verification table;

and the performing data table verification on the data in the second data table in the slave database according to the preset verification mode and writing a verification result into a second check table comprises:

and according to the log recording sequence of the binary log of the master database, sequentially verifying second data blocks corresponding to the first data blocks in the slave database, and writing verification results into the second verification table.

3. The method of claim 2, wherein said sequentially verifying each of the first data blocks and writing the verification result into the first verification table comprises:

sequentially taking each first data block obtained by division as a current data block;

checking all columns of the current data block and counting its rows;

and writing the verification result and the row count into the first verification table, and performing the verification and row counting on the next first data block as the current data block, until the last first data block.

4. The method of claim 2, wherein each of the first data blocks has the same number of rows as the corresponding second data block; the sequentially verifying the second data blocks corresponding to the first data blocks in the slave database according to the log recording sequence of the binary log of the master database, and writing the verification result into the second verification table includes:

sequentially determining second data blocks corresponding to the first data blocks in the slave database according to the log recording sequence of the binary log of the master database;

checking all columns of each second data block determined in sequence and counting its rows;

and writing the verification result and the row count into the second check table.

5. The method of claim 2, wherein the dividing the first data table in the master database into a plurality of first data blocks comprises:

obtaining a unique index of the first data table of the master database;

and dividing the first data table into a plurality of first data blocks according to the key values included by the unique index.

6. The method of claim 2, wherein said sequentially verifying each of the first data blocks and writing the verification result into the first verification table comprises:

when the current data block is verified, starting transaction operation, and calculating the current data block according to a preset verification mode through the transaction operation to obtain a corresponding first verification result;

and after the first verification result is written into the first verification table through the transaction operation, ending the transaction operation, and verifying the next first data block as the current data block until the last first data block.

7. The method of claim 6, wherein when the transaction operation is started, the isolation level of the master database is the repeatable read level; the starting a transaction operation and calculating the current data block according to a preset verification mode through the transaction operation to obtain a corresponding first verification result comprises:

initiating the transaction operation;

generating, based on concurrency control, a database snapshot as of the time the transaction operation is started;

and operating the current data block in the database snapshot according to a preset verification mode through the transaction operation to obtain a corresponding first verification result.

8. The method of claim 1, further comprising:

executing a table building statement in the master database to create the first check table;

recording the table building statement into the binary log of the master database;

and replaying the table building statement in the binary log in the slave database to create the second check table.

9. A data verification apparatus, the apparatus comprising:

a log recording module, configured to record, when a modification operation on a data table in a master database occurs, relevant information about the execution of the modification operation in a binary log in ROW mode;

a first checking module, configured to determine a first data table to be subjected to data consistency check in the master database, perform data table verification on the data in the first data table in the master database in a preset verification mode, and write a verification result into a first check table;

a second checking module, configured to replay the binary log of the master database in a slave database; when a log record related to the checking operation on the first data table is replayed, execute a hook program; determine, through the hook program and according to the log record related to the checking operation on the first data table in the binary log of the master database, a second data table in the slave database corresponding to the first data table from which the currently replayed log record originates; determine the preset verification mode adopted by the master database for the data table verification of the first data table; perform data table verification on the data in the second data table in the slave database according to the preset verification mode; and write a verification result into a second check table;

and a comparison module, configured to compare the first check table with the second check table.

10. The apparatus of claim 9, wherein the first checking module is further configured to divide the first data table in the master database into a plurality of first data blocks; sequentially verifying each first data block, and writing verification results into the first verification table;

the second check module is further configured to check, in sequence, second data blocks in the slave database, which correspond to the first data blocks, according to the log recording order of the binary log in the master database, and write a check result into the second check table.

11. The apparatus according to claim 10, wherein the first checking module is further configured to take each of the divided first data blocks as a current data block in turn; check all columns of the current data block and count its rows; and write the verification result and the row count into the first verification table, and perform the verification and row counting on the next first data block as the current data block, until the last first data block.

12. The apparatus of claim 10, wherein each of the first data blocks has a same number of rows as a corresponding second data block; the second check module is further configured to sequentially determine, according to a log recording order of the binary log of the master database, second data blocks in the slave database, which correspond to the first data blocks, respectively; checking all the columns of the second data blocks which are determined in sequence and carrying out row number statistics; writing a check result and a number of rows to the second check table.

13. The apparatus of claim 10, wherein the first check module is further configured to obtain a unique index of the first data table of the master database; and dividing the first data table into a plurality of first data blocks according to the key values included by the unique index.

14. The apparatus according to claim 10, wherein the first checking module is further configured to take each of the divided first data blocks as a current data block in turn; when the current data block is verified, starting transaction operation, and calculating the current data block according to a preset verification mode through the transaction operation to obtain a corresponding first verification result; and after the first verification result is written into the first verification table through the transaction operation, ending the transaction operation, and verifying the next first data block as the current data block until the last first data block.

15. The apparatus of claim 14, wherein when the transaction operation is started, the isolation level of the master database is the repeatable read level; the first checking module is further configured to start the transaction operation; generate, based on concurrency control, a database snapshot as of the time the transaction operation is started; and operate on the current data block in the database snapshot according to a preset verification mode through the transaction operation to obtain a corresponding first verification result.

16. The apparatus of claim 9, further comprising:

a first table building module for executing a table building statement in the master database to create the first check table; recording the table building statement into the binary log of the master database;

a second table building module to replay the table building statements in the binary log in the slave database to create the second check table.

17. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 8.

18. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
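By way of illustration only, the block-wise verification recited in claims 2 to 6 can be sketched as follows. The sketch makes several assumptions that are not part of the claims: SQLite (through Python's `sqlite3` module) stands in for the master and slave databases, CRC32 plus a row count stands in for the preset verification mode, and the function names, the check table layout, and the chunk size are hypothetical. Table and column names are interpolated directly into the SQL text, so they are assumed to be trusted identifiers.

```python
import sqlite3
import zlib


def chunk_bounds(conn, table, key, chunk_size):
    """Divide the table into (low, high) key ranges of at most chunk_size rows each."""
    keys = [r[0] for r in conn.execute(f"SELECT {key} FROM {table} ORDER BY {key}")]
    return [(keys[i], keys[min(i + chunk_size, len(keys)) - 1])
            for i in range(0, len(keys), chunk_size)]


def check_chunks(conn, table, key, bounds, result_table):
    """Check each chunk in its own transaction and store (row count, CRC32) per chunk.

    Assumes conn was opened with isolation_level=None so BEGIN/COMMIT are explicit.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {result_table} "
        "(tbl TEXT, chunk INTEGER, row_count INTEGER, crc INTEGER, "
        "PRIMARY KEY (tbl, chunk))"
    )
    for number, (low, high) in enumerate(bounds):
        conn.execute("BEGIN")  # start the transaction for the current chunk
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}",
            (low, high),
        ).fetchall()
        crc = 0
        for row in rows:  # every column of every row feeds the checksum
            crc = zlib.crc32(repr(row).encode("utf-8"), crc)
        conn.execute(
            f"INSERT OR REPLACE INTO {result_table} VALUES (?, ?, ?, ?)",
            (table, number, len(rows), crc),
        )
        conn.execute("COMMIT")  # end the transaction before the next chunk


def differing_chunks(master, slave, table, result_table):
    """Compare the two check tables and return the chunk numbers that differ."""
    query = (f"SELECT chunk, row_count, crc FROM {result_table} "
             "WHERE tbl = ? ORDER BY chunk")
    m = master.execute(query, (table,)).fetchall()
    s = slave.execute(query, (table,)).fetchall()
    return [a[0] for a, b in zip(m, s) if a != b]
```

In this sketch the chunk boundaries are computed once on the master with `chunk_bounds` and then passed unchanged to `check_chunks` on the slave, mirroring the way the second data blocks are determined from the master's log records rather than recomputed independently.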