CN116150264A

CN116150264A - Database replica data master-slave consistency verification method, device, equipment and medium

Info

Publication number: CN116150264A
Application number: CN202211463714.7A
Authority: CN
Inventors: 金官丁
Original assignee: Shanghai Hotpu Network Technology Co ltd
Current assignee: Shanghai Hotpu Network Technology Co ltd
Priority date: 2022-11-22
Filing date: 2022-11-22
Publication date: 2023-05-23

Abstract

The application discloses a method, a device, equipment and a medium for checking the primary-standby consistency of database replica data. The method comprises the following steps. Split index selection: an index field with low repetition is selected as the split index, so that the data can be conveniently divided into intervals. Split block size confirmation: the table is split into blocks of suitable size using the split index, so that the number of split intervals is closest to a preset value. Data block consistency comparison: the consistency of each data block is judged by calculating the sum of the crc32 values of its rows. Deep comparison of data consistency within a data block: for a data block whose data is inconsistent, the split index range of the block is recorded, the block is further split according to its size, and the crc32 of each sub data block is calculated. The method can efficiently and accurately check the consistency of primary and standby data without locking tables or row data and without affecting normal operation of the service.

Description

Database replica data master-slave consistency verification method, device, equipment and medium

Technical Field

The present invention relates to the field of distributed relational databases, and in particular to a method, an apparatus, a device, and a medium for checking the primary-standby consistency of database replica data.

Background

To improve the safety and reliability of data, a relational database is generally deployed with one or more standby machines that synchronize data in real time. Keeping multiple replicas protects the data, and even if the primary database goes down, the system can automatically switch to a standby database. To ensure that the primary and standby data are consistent at the time of such a switch, a method for detecting primary-standby consistency is needed, and it is important that the check be fast and accurate even at TB-scale data volumes.

Many existing primary-standby consistency checks need to lock row data during detection. When each sub data block is verified, the rows in the block are first locked and DML operations on them are prohibited. After the lock is obtained, the execution position of the primary database is recorded. The replication thread of the standby database then runs until it reaches the recorded position, at which point the SQL thread is suspended. The data block is compared only after the primary and the standby are synchronized, which avoids false detections caused by replication delay.

Disclosure of Invention

The invention aims to provide a method, a device, equipment and a medium for checking the primary-standby consistency of database replica data, so as to solve the problems described in the Background.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

In a first aspect, the present invention provides a method for checking the primary-standby consistency of database replica data, including the following steps:

performing index de-duplication on the data table according to a preset splitting rule to obtain a split index;

determining the split block size according to the number of rows of the split index and a preset target interval number, so that the number of split blocks is closest to the target interval number, and dividing the data table into intervals by the split index according to the split block size to generate at least one split block, wherein each split block corresponds to one data block;

for each row of a data block, concatenating all column values to form a row string and calculating the crc32 of the row string; accumulating the crc32 values of all row strings in the data block to obtain the crc32 of the whole data block, together with the split index value that uniquely identifies the data block; and judging the consistency of the primary and standby data by comparing the crc32 values of the corresponding data blocks on the primary and the standby;

and for a data block whose primary and standby data are inconsistent, recording the split index range of the data block, further splitting the data block into at least one sub data block according to its size, calculating the crc32 of each sub data block, and judging the consistency of the primary and standby data by comparing the crc32 values of the corresponding sub data blocks on the primary and the standby.

Preferably, the splitting rule includes: a primary key or unique key is selected first; next, an index field with low repetition in the index statistics is selected; and where the repetition is the same, a numeric index is preferred.

In the above, each split block corresponds to an interval (minimum to maximum) of the primary key or unique key. For example, the data table has 1,000,000 rows and the number of split blocks is 100,000, which equals the preset target interval number.

Preferably, the determining the size of the split block according to the number of rows of the split index and the preset target interval number includes the following steps:

if the split field is of a numeric type, the number of split intervals M1 = (number of rows of the data table after de-duplication) / 10^N, where N is an integer greater than or equal to 1; the values of M1 calculated for different values of N are compared, and the N for which M1 is closest to the preset target interval number is taken; the M1 so calculated is the number of split blocks, from which the split block size is determined;

if the split field is of a string type, the number of split intervals M2 = the number of distinct values of left(column, N), where N is an integer greater than or equal to 0 and left(column, N) denotes the leftmost N characters of the split field; the values of M2 calculated for different values of N are compared, and the N for which M2 is closest to the preset target interval number is taken; the M2 so calculated is the number of split blocks, from which the split block size is determined.

In the above description, the split field is a primary key or a unique key of the data table, and the split field interval (minimum-maximum value) can be used to quickly scan the data of the whole split block in the data table or calculate the crc32 value of the split block.

Preferably, the preset target interval number is 100000.

Preferably, the sum of the crc32 values of each data block is calculated directly at the database level using the relational database grouping syntax (GROUP BY), with each group corresponding to one data block.

Preferably, if the data of corresponding sub data blocks on the primary and the standby are inconsistent, the inconsistent sub data blocks are marked.

In a second aspect, the present invention provides a database replica data master/slave consistency check device, including:

the splitting index selection module is used for carrying out index de-duplication on the data table according to a preset splitting rule to obtain a splitting index;

the splitting block size confirming module is used for determining the sizes of the splitting blocks according to the number of lines of the splitting index and the preset target interval number, so that the number of the splitting blocks is closest to the target interval number, splitting the data table into intervals according to the sizes of the splitting blocks by using the splitting index, and generating at least one splitting block, wherein each splitting block corresponds to one data block;

the data block consistency comparison module is used for judging the consistency of the data blocks by calculating the sum of the crc32 values of each data block; that is, for each row of a data block, all column values are concatenated to form a row string and the crc32 of the row string is calculated, the crc32 values of all row strings in the data block are accumulated to obtain the crc32 of the whole data block together with the split index value that uniquely identifies the data block, and the consistency of the primary and standby data is judged by comparing the crc32 values of the corresponding data blocks on the primary and the standby;

the data block internal data consistency deep comparison module is used for recording the range of splitting indexes of the data blocks for the data blocks with inconsistent main and standby data, splitting the data blocks into at least one sub data block according to the size of the data blocks, calculating the crc32 of the sub data blocks, and judging the main and standby data consistency by judging the crc32 value of the corresponding sub data block on the main and standby.

Preferably, the preset target interval number is 100000.

In a third aspect, the present invention also discloses an electronic device, including:

a processor; and

a memory having executable code stored thereon, which when executed by a processor causes the processor to perform the method according to the first aspect of the present application.

In a fourth aspect, the present invention also provides a computer readable storage medium having stored thereon executable code which when executed by a processor of an electronic device causes the processor to perform the method of the first aspect of the present application.

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

The invention provides a method, a device, equipment and a medium for checking the primary-standby consistency of database replica data, which can efficiently and accurately check the consistency of primary and standby data without locking tables or row data and without affecting normal operation of the service.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:

FIG. 1 shows a flow chart of a primary and backup data consistency detection algorithm;

FIG. 2 is a schematic diagram of a database replica data master-slave consistency check device;

FIG. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and effects of the present invention clearer and more obvious, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

It is noted that the terms "first," "second," and the like in the description and claims of the present invention and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order, and it is to be understood that the data so used may be interchanged where appropriate. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

FIG. 1 is a flowchart of a primary and backup data consistency detection algorithm according to an embodiment of the present invention.

As shown in FIG. 1, the method for checking the primary and secondary consistency of the data of the database copy specifically comprises the following steps:

Step 101: split index selection.

Index de-duplication is performed on the data table according to a preset splitting rule to obtain a split index.

The split index should, as far as possible, be an index field with low repetition, so that the data can be conveniently divided into intervals. In this embodiment, the splitting rule is: a primary key or unique key is selected first; next, an index with low repetition according to the index statistics; and where the repetition is the same, a numeric type is preferred. If all values of the split column are identical, splitting may not be possible. The repetition of a primary key or unique key is 0, while that of an ordinary index is generally greater than or equal to 0. The lower the repetition, the more distinct values remain after the index column is de-duplicated, which makes it easier to cut the table into data blocks and avoids the data being concentrated on one or two values, where some data blocks would be too large and others would contain no data. An oversized data block takes longer to check, and because DML is ongoing, the probability that the primary and standby data within the block are momentarily out of sync increases, which causes repeated detection and greatly reduces detection efficiency. A numeric index is preferred because numeric values make it easier to divide interval ranges: the interval size is calculated from the total number of rows (the number of rows after the index values are de-duplicated) and the target number of intervals, and all interval ranges can then be calculated from the maximum and minimum values.
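The selection rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `IndexStat` record, its field names, and the definition of repetition as `1 - distinct_values / total_rows` are hypothetical and are not taken from the application, which does not specify how repetition is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexStat:
    """Hypothetical summary of one candidate index, e.g. read from index statistics."""
    name: str
    is_unique: bool        # primary key or unique key
    distinct_values: int   # number of distinct index values
    total_rows: int
    is_numeric: bool


def repetition_degree(stat):
    # Assumed definition: share of rows that repeat an already seen index value.
    if stat.total_rows == 0:
        return 0.0
    return 1.0 - stat.distinct_values / stat.total_rows


def choose_split_index(candidates):
    """Prefer unique keys, then low repetition, then numeric types."""
    # A column whose values are all identical cannot be used for splitting.
    usable = [c for c in candidates if c.distinct_values > 1]
    if not usable:
        return None
    return min(
        usable,
        key=lambda c: (not c.is_unique, repetition_degree(c), not c.is_numeric),
    )
```

With this ordering, a primary key always wins over an ordinary index, and a numeric index wins over a string index only when both have the same repetition.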

Step 102: confirming the split block size.

The data table is divided into intervals of suitable size using the split index. That is, the split block size is determined so that the number of split blocks is closest to the preset target interval number, and the data table is divided into intervals by the split index according to that size, generating at least one split block, each of which corresponds to one data block. The more rows the data table has, the larger the split block.

The split block size is determined as follows:

The target interval number is set to 100000.

If the split field is of a numeric type, the number of split intervals is the number of distinct index values divided by 10^N (N is an integer greater than or equal to 1). The interval numbers calculated for different values of N are compared, and the value of N for which the interval number is closest to 100000 is taken; the interval number so calculated is the number of split blocks, from which the split block size is determined.

For example, assume that the total number of rows of the original table is M1 and that de-duplicating the index column of the selected index yields M2 rows (M2 <= M1; when the index is a primary key or unique key, M2 = M1). According to the above calculation, if M2 is 100,000, then N = 0 makes the interval number closest to 100000, the number of split blocks is 100,000, and the split block size is 1; if M2 is 1,000,000, then N = 1, the number of split blocks is 100,000, and the split block size is 10; if M2 is 500,000, then N = 1, the number of split blocks is 50,000, and the split block size is 10.
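The worked example can be reproduced with a short Python sketch. The function name, the `max_exponent` bound, the ceiling division for row counts that are not multiples of 10^N, and the tie-break in favour of the smaller N are all assumptions made for illustration. The search starts at N = 0 so that the first case of the example (block size 1) is reachable.

```python
def choose_numeric_split(distinct_rows, target_intervals=100_000, max_exponent=12):
    """Pick N so that distinct_rows / 10**N is closest to target_intervals.

    Returns (N, number_of_intervals, block_size).
    """
    best = None
    for n in range(max_exponent + 1):  # N = 0 is included to match the example
        block_size = 10 ** n
        intervals = -(-distinct_rows // block_size)  # ceiling division
        distance = abs(intervals - target_intervals)
        # Strict comparison keeps the smallest N when two candidates tie.
        if best is None or distance < best[0]:
            best = (distance, n, intervals, block_size)
    return best[1:]
```

For the three cases in the example, `choose_numeric_split(100_000)`, `choose_numeric_split(1_000_000)` and `choose_numeric_split(500_000)` give N = 0, 1 and 1 respectively.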

For a split index of a string type, the number of distinct values of left(column, N) (N is an integer greater than or equal to 0) is counted as the number of split intervals, where left(column, N) denotes the leftmost N characters of the split field. The interval numbers calculated for different values of N are compared, and the N for which the interval number is closest to 100000 is taken; the interval number so calculated is the number of split blocks, from which the split block size is determined.
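A possible Python sketch of the string case follows. It counts distinct prefixes in memory, whereas a real check would more likely issue a distinct-count query per candidate N; the function name and the tie-break toward the shorter prefix are assumptions.

```python
def choose_prefix_length(values, target_intervals=100_000):
    """Pick the prefix length N whose distinct-prefix count is closest to the target.

    Returns (N, number_of_intervals). Slicing value[:n] plays the role of left(column, N).
    """
    distinct = set(values)
    max_length = max((len(v) for v in distinct), default=0)
    best_n, best_count = 0, len({v[:0] for v in distinct})
    for n in range(1, max_length + 1):
        count = len({v[:n] for v in distinct})
        # Strict comparison keeps the shorter prefix when two candidates tie.
        if abs(count - target_intervals) < abs(best_count - target_intervals):
            best_n, best_count = n, count
    return best_n, best_count
```

Each distinct prefix then identifies one split block, so a longer prefix yields more and smaller blocks.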

In the above description, the split field is a primary key or a unique key of the data table, and the split field interval (minimum-maximum value) can be used to quickly scan the data of the whole split block in the data block or calculate the crc32 value of the split block.
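As an illustration of how interval ranges might be derived from the minimum and maximum values of a numeric split field once the block size is known, consider the following sketch. The function name and the choice of inclusive bounds are assumptions.

```python
def split_intervals(min_key, max_key, block_size):
    """Return inclusive (low, high) key ranges covering [min_key, max_key]."""
    return [
        (start, min(start + block_size - 1, max_key))
        for start in range(min_key, max_key + 1, block_size)
    ]
```

For example, keys 1 to 25 with a block size of 10 give three ranges, the last of which is shorter than the others. Each range can then be used as a bounded scan on the split field.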

Step 103: comparing data block consistency.

The data block consistency is determined by calculating the sum of the crc32 for each data block.

The specific steps are as follows: for each row of data, all column values are concatenated to form a row string, and the crc32 of the row string is calculated; the crc32 values of all row strings in the data block are accumulated to obtain the crc32 of the whole data block, together with the split index value that uniquely identifies the data block; and the consistency of the primary and standby data is judged by comparing the crc32 values of the corresponding data blocks on the primary and the standby.
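The per-block checksum described here can be sketched in Python with the standard `zlib.crc32` function. Rows are modelled as tuples, the block identifier is assumed to be the integer quotient of a numeric split key by the block size, and the function name is hypothetical.

```python
import zlib
from collections import defaultdict


def block_checksums(rows, key_index, block_size):
    """Map block id -> sum of crc32 over the concatenated column values of each row."""
    sums = defaultdict(int)
    for row in rows:
        # Concatenate all column values into one row string, as in the description.
        row_string = "".join(str(value) for value in row)
        block_id = row[key_index] // block_size
        sums[block_id] += zlib.crc32(row_string.encode("utf-8"))
    return dict(sums)
```

Because the block value is a sum, it does not depend on the order in which rows are read, so the primary and the standby may scan a block in different orders. Note that plain concatenation without a separator maps the column pairs ("ab", "c") and ("a", "bc") to the same row string; an implementation may want to account for that.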

Specifically, using the relational database grouping syntax (GROUP BY), with each group corresponding to one data block, the crc32 of each group can be calculated directly at the database level. Because the calculation is local to the database, a large amount of network IO is saved: the consistency detection program does not need to pull all row data, and only needs to query about 100000 data block split index values and the crc32 values of the corresponding data blocks. Since the result set is small, the comparison of data block consistency can be carried out entirely in memory.
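The in-memory comparison could look like the following sketch, assuming each side has returned a mapping from block split index value to block crc32 sum. The function name is hypothetical.

```python
def diff_block_checksums(primary, standby):
    """Return sorted block ids whose checksums differ or that exist on one side only."""
    mismatched = []
    for block_id in sorted(primary.keys() | standby.keys()):
        if primary.get(block_id) != standby.get(block_id):
            mismatched.append(block_id)
    return mismatched
```

A block that is present on only one side is reported as mismatched, since `dict.get` returns `None` for the missing side.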

The calculation syntax is as follows:

sum(crc32(concat(col1, col2, ..., colN))) from <table> group by index_col/N.

where sum is the summing aggregate function of the distributed database, crc32 calculates the crc32 value of a string, and concat concatenates all columns into one string.
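To make the grouping query concrete, the sketch below runs an equivalent statement against an in-memory SQLite database from Python. SQLite is only a stand-in for the distributed database: it has no built-in crc32, so one is registered from `zlib`, and its `||` operator replaces concat. The `orders` table, its columns, and both function names are hypothetical.

```python
import sqlite3
import zlib


def make_demo_connection(rows):
    """Create an in-memory table with a numeric primary key (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def block_checksums_sql(conn, block_size):
    """Compute block id -> sum of row crc32 inside the database with GROUP BY."""
    # Register a crc32 SQL function, since SQLite does not ship one.
    conn.create_function("crc32", 1, lambda s: zlib.crc32(s.encode("utf-8")))
    query = (
        "SELECT id / ? AS block_id, SUM(crc32(id || name || amount)) "
        "FROM orders GROUP BY block_id ORDER BY block_id"
    )
    return dict(conn.execute(query, (block_size,)).fetchall())
```

Only one row per block crosses the connection, which is the point made in the description: the row data itself stays in the database.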

Step 104: deep comparison of data consistency within data blocks.

For a data block whose data is inconsistent, the split index range of the data block needs to be recorded. According to the size of the data block, it is further split into at least one sub data block, and the crc32 of each sub data block, or of each row (for a data block of size 1), is calculated. This is equivalent to repeating steps 101-103, except that the interval is narrowed from the whole data table to the inconsistent interval, so the maximum and minimum values change. Inconsistencies are recorded. Inconsistent rows are compared repeatedly so as to eliminate the influence of primary-standby replication delay.
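The drill-down can be sketched as a recursion over key ranges. In this illustration the primary and the standby are plain dictionaries from integer key to row tuple, standing in for range queries against the two databases; the fan-out of 10 and all names are assumptions, and the repeated re-check of inconsistent rows to rule out replication delay is left out.

```python
import zlib


def row_crc(row):
    """crc32 of the concatenated column values of one row."""
    return zlib.crc32("".join(str(value) for value in row).encode("utf-8"))


def range_checksum(table, low, high):
    """Sum of row crc32 values for keys in the half-open range [low, high)."""
    return sum(row_crc(row) for key, row in table.items() if low <= key < high)


def find_inconsistent_keys(primary, standby, low, high, fanout=10):
    """Narrow a mismatching key range down to the individual keys that differ."""
    if range_checksum(primary, low, high) == range_checksum(standby, low, high):
        return []
    if high - low <= 1:
        return [low]  # block of size 1: this is a row-level mismatch
    step = -(-(high - low) // fanout)  # ceiling division, always >= 1 here
    inconsistent = []
    for start in range(low, high, step):
        end = min(start + step, high)
        inconsistent.extend(find_inconsistent_keys(primary, standby, start, end, fanout))
    return inconsistent
```

Only ranges whose sums differ are split again, so consistent parts of the table are never scanned at row level.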

On the other hand, as shown in fig. 2, the invention also provides a database copy data primary and backup consistency verification device, which specifically comprises a split index selection module, a split block size confirmation module, a data block consistency comparison module and a data block internal data consistency deep comparison module.

The splitting index selection module is used for carrying out index de-duplication on the data table according to a preset splitting rule to obtain a splitting index.

The splitting block size confirming module is used for determining the sizes of the splitting blocks according to the number of lines of the splitting index and the preset target interval number, so that the number of the splitting blocks is closest to the target interval number, the splitting index is used for carrying out interval splitting on the data table according to the sizes of the splitting blocks, at least one splitting block is generated, and each splitting block corresponds to one data block.

The data block consistency comparison module is used for judging the consistency of the data blocks by calculating the sum of the crc32 values of each data block. That is, for each row of a data block, all column values are concatenated to form a row string and the crc32 of the row string is calculated; the crc32 values of all row strings in the data block are accumulated to obtain the crc32 of the whole data block, together with the split index value that uniquely identifies the data block; and the consistency of the primary and standby data is judged by comparing the crc32 values of the corresponding data blocks on the primary and the standby.

The data block internal data consistency deep comparison module is used for recording the range of splitting indexes of the data block for the data block with inconsistent main and standby data, splitting the data block into at least one sub data block according to the size of the data block, calculating the crc32 of the sub data block, and judging the main and standby data consistency by judging the crc32 value of the corresponding sub data block on the main and standby.

FIG. 3 illustrates a schematic diagram of a computing device according to an embodiment of the present disclosure.

As shown in FIG. 3, the computing device 200 disclosed herein may include a processor 210 and a memory 220. The memory 220 stores a computer program, and the processor 210 executes the computer program in the memory 220 to implement the methods provided by the method embodiments described above. For the specific implementation process, refer to the relevant description above, which is not repeated here.

In an embodiment, an electronic device is used for illustrating a database replica data master-slave consistency check device provided by the application. The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions.

The memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and a processor may execute the program instructions to implement the methods and/or other desired functions in the various embodiments of the present application above. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.

Furthermore, the present invention provides a computer-readable storage medium having stored therein a computer program for implementing the method provided by the method embodiments described above when executed by a processor.

In practice, the program code of the computer program in this embodiment, which performs the operations of the embodiments of the present application, may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

In practice, a computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Those of skill would further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus and methods according to embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In summary, the invention provides a method, a device, equipment and a medium for checking the primary-standby consistency of database replica data, which realize consistency detection without table locks or row-level locks. With this software detection algorithm, the consistency of primary and standby data can be checked efficiently and accurately without locking tables or row data and without affecting normal operation of the service.

The above description of the specific embodiments of the present invention has been given by way of example only, and the present invention is not limited to the above described specific embodiments. Any equivalent modifications and substitutions for the present invention will occur to those skilled in the art, and are also within the scope of the present invention. Accordingly, equivalent changes and modifications are intended to be included within the scope of the present invention without departing from the spirit and scope thereof.

Claims

1. The method for checking the consistency of the database duplicate data is characterized by comprising the following steps:

2. The database replica data master-slave consistency check method according to claim 1, wherein the splitting rule comprises: a primary key or unique key is selected first; next, an index field with low repetition in the index statistics is selected; and where the repetition is the same, a numeric index is preferred.

3. The method for checking the consistency of the primary and secondary data of the database replica according to claim 1, wherein the determining the size of the split block according to the number of rows of the split index and the preset target interval number comprises the following steps:

4. The database replica data master-slave consistency check method according to claim 1, wherein the sum of the crc32 values of each data block is calculated directly at the database level using the relational database grouping syntax (GROUP BY), with each group corresponding to one data block.

5. The database replica data master-slave consistency check method according to claim 1, wherein if the data of corresponding sub data blocks on the primary and the standby are inconsistent, the inconsistent sub data blocks are marked.

6. The method for checking the consistency of the database replica data master and slave according to claim 1, wherein the preset target interval number is 100000.

7. The database replica data master-slave consistency check system is characterized by comprising:

8. The database replica data master-slave consistency check system according to claim 7, wherein the splitting rule comprises: a primary key or unique key is selected first; next, an index field with low repetition in the index statistics is selected; and where the repetition is the same, a numeric index is preferred.

9. A computing device, comprising:

a processor; and

a memory having executable code stored thereon which, when executed by a processor, causes the processor to perform the database replica data master-slave consistency check method of any of claims 1-6.

10. A computer readable storage medium having executable code stored thereon, which when executed by a processor of an electronic device causes the processor to perform the database replica data master-slave consistency check method of any of claims 1-6.