CN117762996A - Method and equipment for grouping comparison database data - Google Patents

Method and equipment for grouping comparison database data Download PDF

Info

Publication number
CN117762996A
CN117762996A CN202311826398.XA CN202311826398A CN117762996A CN 117762996 A CN117762996 A CN 117762996A CN 202311826398 A CN202311826398 A CN 202311826398A CN 117762996 A CN117762996 A CN 117762996A
Authority
CN
China
Prior art keywords
data
rowid
packet
comparison
mapping relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311826398.XA
Other languages
Chinese (zh)
Inventor
孙峰
谭畅
彭青松
刘启春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dream Database Co ltd
Original Assignee
Wuhan Dream Database Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Dream Database Co ltd filed Critical Wuhan Dream Database Co ltd
Priority to CN202311826398.XA priority Critical patent/CN117762996A/en
Publication of CN117762996A publication Critical patent/CN117762996A/en
Pending legal-status Critical Current

Links

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and equipment for grouping comparison database data. The method mainly comprises the following steps: constructing ROWID mapping relation tables of a left table and a right table; calculating the number of the groups according to the number of rows of the left table and the right table and the size of a memory which can be provided by the current system; acquiring the ROWID mapping relation of each group from the ROWID mapping relation table through the number of rows of the left table and the number of groups, and constructing the data query condition of each group based on the ROWID mapping relation; according to the data query conditions of the groups, extracting the preset number of group data in parallel for comparison; and after the data of a certain group is compared, sequentially extracting the data of the next group for comparison until the comparison of all the groups is completed, and generating a comparison report according to the residual data in the memory. The invention can reduce the memory resource requirement in the comparison process, reduce or avoid the exchange times of the data pages and the disk caused by insufficient memory, and achieve the purpose of improving the comparison performance.

Description

Method and equipment for grouping comparison database data
Technical Field
The present invention relates to the field of database technologies, and in particular, to a method and apparatus for packet comparison of database data.
Background
With the development of database technology, data migration technology between databases is applied to various industries, after two databases complete data migration, in order to check the correctness of data migration, data comparison is performed on tables in the two databases to ensure the consistency of data in the two databases, or because data differences need to be found out when faults occur in the migration process, an efficient and rapid data comparison solution needs to be found. At present, data migration and data comparison between two databases are two completely independent processes, after the data migration is completed, a data comparison tool is used for comparing the data of two tables (a left table and a right table), in order to improve the data comparison performance, the tables on two sides are generally divided into N groups independently, then the databases on two sides are connected by G pieces and query to obtain data, MD5 values are calculated for each queried data, algorithms such as HASH or B trees are adopted to perform opposite flushing on the MD5 with the same left and right, and finally, a comparison report is generated by the MD5 values which are not flushed and records of the MD5 values, so that comparison is completed. The existing method has better contrast performance on the tables with smaller data scale under the condition of sufficient memory, but when the two tables have massive data and insufficient memory to accommodate all contrast records, two adjacent records in the database can be recorded in different data pages of a HASH or B tree due to the characteristic of MD5 value comparison dispersion, which can cause frequent exchange of data between the memory and the disk during contrast, thereby reducing contrast performance. Therefore, how to solve the comparison efficiency of mass data in the scene becomes an important technical problem to be solved in the industry.
In view of this, how to overcome the defects existing in the prior art and solve the technical problem of the comparison performance of the mass data is a difficult problem to be solved in the technical field.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art: when there is mass data in both tables and there is insufficient memory to accommodate all of the comparison records, due to the relatively decentralized nature of the MD5 values, two adjacent records in the database may be recorded in different data pages of the HASH or B tree, which may result in frequent exchange of data between memory and disk during comparison, thereby degrading performance of the comparison. In order to solve the above-mentioned short-board problem encountered in the process of comparing mass data, the invention provides a method and equipment for grouping and comparing database data, which integrates two processes of data migration and data comparison, when migrating data, the ROWID needs to be added as the query item of the source library, and when the line number is fixed, the ROWID of the data of the line in the target library needs to be obtained, and the mapping relation is stored. After the migration of data is completed, a plurality of ROWID mapping relations between a source end and a target end are obtained, the data scale is not required to be queried from the source end and the target end during data comparison, and the mapping relations stored by the previous data migration are directly compared according to the required grouping, so that the grouping based on the mapping relations can ensure the maximum possible collision elimination of the data in each grouping of the left table and the right table, reduce the requirement on memory resources in the comparison process, reduce or avoid the exchange times of data pages and magnetic discs caused by insufficient memory, and achieve the purpose of improving the comparison performance.
The embodiment of the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method of packet contrast database data, comprising:
constructing ROWID mapping relation tables of a left table and a right table;
calculating the number of the groups according to the number of rows of the left table and the right table and the size of a memory which can be provided by the current system;
acquiring the ROWID mapping relation of each group from the ROWID mapping relation table through the number of rows of the left table and the number of groups, and constructing the data query condition of each group based on the ROWID mapping relation;
according to the data query conditions of the groups, extracting the preset number of group data in parallel for comparison;
and after the data of a certain group is compared, sequentially extracting the data of the next group for comparison until the comparison of all the groups is completed, and generating a comparison report according to the residual data in the memory.
Further, the step of constructing the ROWID mapping relation table of the left table and the right table specifically includes:
acquiring source library data by taking the ROWID as an additional query item of the source library data, and transmitting the source library data carrying the ROWID to a target library to perform data migration; wherein, the table of the source library is a left table, and the table of the target library is a right table;
when X lines of data are arranged, constructing a ROWID mapping relation by using a left table line number currently extracted by a left table, a left table ROWID corresponding to the left table line number and a right table ROWID returned by the line data in a target library;
after the data migration is completed, forming the ROWID mapping relation of the K left tables and the K right tables, namely constructing the ROWID mapping relation tables of the left tables and the right tables.
Further, when data migration is performed, counting the number of the migrated rows of the left table and the number of the successfully-stored rows of the right table, wherein the number of the migrated rows of the left table is the number of the left table, and the number of the successfully-stored rows of the right table is the number of the right table.
Further, the calculating the number of the packets according to the number of rows of the left table and the right table and the memory size available to the current system specifically includes:
the number of packets n= ((l+r) ×z)/((M- (L-R) ×z)/G); the number of the groups N is rounded upwards, L is the number of rows of the left table, R is the number of rows of the right table, M is the memory size used for comparison, G is the preset number of the groups which can be extracted simultaneously in parallel, and Z is the memory size occupied by single-row data in a comparison algorithm.
Further, when the value of (L-R) Z approaches, equals or exceeds the value of M, the comparison is terminated.
Further, the obtaining the ROWID mapping relation of each packet from the ROWID mapping relation table through the number of rows and the number of packets in the left table, and constructing the data query condition of each packet based on the ROWID mapping relation specifically includes:
splitting the line number L of the left table into N groups based on the number N of the groups;
acquiring the initial line number of each packet;
searching the nearest left table row number in the ROWID mapping relation table through the initial row number, thereby obtaining the ROWID mapping relation corresponding to each group;
and each packet combines the ROWID mapping relation of the packet and the ROWID mapping relation of the upper and lower adjacent packets to construct a left table condition and a right table condition of the data query condition of each packet.
Further, the constructing the left table condition and the right table condition of the data query condition of each packet by combining the ROWID mapping relationship of each packet and the ROWID mapping relationship of the upper and lower adjacent packets specifically includes:
when a packet has only a packet adjacent downward, the left table condition of the data query condition of the packet is: the ROWID is smaller than or equal to the left table ROWID in the ROWID mapping relation of the self-grouping; the right table conditions of the data query conditions of the packet are: the ROWID is smaller than or equal to the right table ROWID in the ROWID mapping relation of the self-grouping;
when a packet has an upwardly adjacent packet and a downwardly adjacent packet, the left table condition of the data query condition of the packet is: the ROWID is larger than the left table ROWID in the ROWID mapping relation of the upward adjacent packet, and the ROWID is smaller than or equal to the left table ROWID in the ROWID mapping relation of the self packet; the right table conditions of the data query conditions of the packet are: the ROWID is larger than the right table ROWID in the ROWID mapping relation of the upward adjacent packet, and the ROWID is smaller than or equal to the right table ROWID in the ROWID mapping relation of the self packet;
when a packet has only an upwardly adjacent packet, the left table condition of the data query condition of the packet is: the ROWID is larger than the left table ROWID in the ROWID mapping relation of the upward adjacent packet; the right table conditions of the data query conditions of the packet are: the ROWID is greater than the right table ROWID in the ROWID mapping of the upward neighbor packet.
Further, the parallel extraction of the preset number of packet data according to the packet data query condition for comparison specifically includes:
according to the left table condition and the right table condition of the data query conditions of the groups, G groups of connections are created at the same time, the data of the left table and the right table in the G groups are respectively extracted according to the grouping sequence for comparison, the same data collision is eliminated, and different data are reserved in a memory; wherein G is the preset number of packets which can be extracted simultaneously in parallel.
Further, for the created G group of connections, each group of connections includes two connections, which connect the source library and the target library, respectively, for data extraction, the grouping is marked as contrast complete when both connections complete data extraction.
In another aspect, the present invention provides a device for grouping comparison database data, specifically: the method comprises at least one processor and a memory, wherein the at least one processor and the memory are connected through a data bus, and the memory stores instructions executed by the at least one processor, and the instructions are used for completing the method for comparing data in the database by packets in the first aspect after being executed by the processor.
Compared with the prior art, the invention has the beneficial effects that: the invention provides a method and equipment for grouping and comparing database data, which are used for grouping and comparing data after massive data are migrated among databases so as to check the correctness of data migration or find out the data difference caused by faults in the migration process.
In the conventional scheme, data migration and data comparison are two independent processes, which results in a condition of lacking in grouping decision when the comparison performance is improved by adopting a grouping comparison optimization means in the data comparison process, the left table and the right table are respectively divided into N groups based on respective estimated data amounts, and data between the N groups of the two tables have no one-to-one mapping relationship, for example: when the data extraction and comparison of the first groups on two sides are completed, a large amount of data of the first group of the left table is left in the memory, and the data need to be compared and eliminated by the data acquired by the query of the second or third group on the right. The phenomenon causes a large amount of inconsistent data to occupy limited memory resources in the comparison process, so that the memory can be burst in the subsequent comparison process, and the data pages and the disk are required to be exchanged to free the memory for comparison, thereby reducing the performance of data comparison.
The invention integrates two processes of data migration and data comparison, when data is migrated, ROWID is needed to be added as query item of source library, and every X rows, the data of the row is needed to be obtained in target library ROWID of target library, and the mapping relation is stored. After the migration of data is completed, the ROWID mapping relations of K source ends and target ends are obtained, the data scale is not required to be queried from the source ends and the target ends during data comparison, and the mapping relation stored by the previous data migration is directly compared according to the required grouping, so that the grouping based on the mapping relation can ensure the maximum possible collision elimination of the data in each grouping of the left table and the right table, reduce the requirement on memory resources in the comparison process, reduce or avoid the exchange times of data pages and magnetic discs caused by insufficient memory, and achieve the purpose of improving the comparison performance.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the embodiments of the present invention will be briefly described below. It is evident that the drawings described below are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flow chart of a method for packet-comparison database data according to embodiment 1 of the present invention;
FIG. 2 is an expanded flowchart of step 100 provided in embodiment 1 of the present invention;
FIG. 3 is a flowchart showing a specific step 300 according to embodiment 1 of the present invention;
fig. 4 is a schematic diagram of an apparatus structure of packet comparison database data according to embodiment 3 of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that, if not conflicting, the various features of the embodiments of the present invention may be combined with each other, which are all within the protection scope of the present application. In addition, while functional block division is performed in a device diagram and logical order is shown in a flowchart, in some cases, the steps shown or described may be performed differently than block division in a device, or order in a flowchart.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The invention will be described in detail below with reference to the drawings and examples.
Example 1:
aiming at the data comparison scheme after massive data migration among databases, the embodiment of the invention provides a method for comparing database data in a grouping way, which is shown in figure 1 and comprises the following specific steps.
Step 100: and constructing ROWID mapping relation tables of the left table and the right table. The step takes the ROWID as an additional query item of the source library to acquire source library data, sends the source library data to the target library, and sequentially executes the target library according to the acquisition sequence of the data to realize the data migration from the source library to the target library. And after the migration of the data is completed, obtaining the ROWID mapping relation of the K source terminals and the target terminals.
Step 200: the number of packets is calculated by the number of rows in the left and right tables and the size of memory available to the current system. In this step, in order to prevent the limited memory from being exploded to cause the loss of contrast performance, the rate of acquiring the data from the database needs to be increased by taking into account the packet query, which requires calculating the number N of packets before the comparison.
Step 300: and acquiring the ROWID mapping relation of each packet from the ROWID mapping relation table through the number of rows of the left table and the number of packets, and constructing the data query condition of each packet based on the ROWID mapping relation. In the step, the left table is split into N groups by using the line number L of the left table, the initial line number of each group is obtained, and the nearest left table line number is searched in the ROWID mapping relation table through the initial line number, so that the ROWID mapping relation corresponding to each group is obtained. In addition, each group can respectively construct the condition of the group inquiry scope by combining the groups adjacent to each other up and down, and the group without the groups adjacent to each other up and down does not construct the closed zone, and only one open zone is used.
Step 400: and extracting the preset number of packet data in parallel for comparison according to the packet data query conditions. In the comparison, the same data collision is eliminated, and the different data collision is reserved in the memory.
Step 500: and after the data of a certain group is compared, sequentially extracting the data of the next group for comparison until the comparison of all the groups is completed, and generating a comparison report according to the residual data in the memory. After all the grouping comparison is completed, namely data comparison is completed, and finally a comparison report is generated according to the residual data in the memory, namely the required comparison report.
By the method, the embodiment integrates two processes of data migration and data comparison, when data is migrated, ROWID is required to be added as a query item of a source library, and every X rows, the ROWID of the data of the row in a target library is required to be acquired, and the mapping relation is stored. After the migration of data is completed, the ROWID mapping relations of K source ends and target ends are obtained, the data scale is not required to be queried from the source ends and the target ends during data comparison, and the mapping relation stored by the previous data migration is directly compared according to the required grouping, so that the grouping based on the mapping relation can ensure the maximum possible collision elimination of the data in each grouping of the left table and the right table, reduce the requirement on memory resources in the comparison process, reduce or avoid the exchange times of data pages and magnetic discs caused by insufficient memory, and achieve the purpose of improving the comparison performance.
Referring to fig. 2, in a specific embodiment, the step 100 of constructing the ROWID mapping tables of the left table and the right table specifically includes the following steps.
Step 101: acquiring source library data by taking the ROWID as an additional query item of the source library data, and transmitting the source library data carrying the ROWID to a target library to perform data migration; wherein, the table of the source library is a left table, and the table of the target library is a right table.
Step 102: and when data of every X rows is obtained, constructing a ROWID mapping relation by using the left table row number currently extracted by the left table, the left table ROWID corresponding to the left table row number and the right table ROWID returned by the data in the target library.
Step 103: after the data migration is completed, forming the ROWID mapping relation of the K left tables and the K right tables, namely constructing the ROWID mapping relation tables of the left tables and the right tables.
In the above process, assuming that the table of the source library is a left table and the table of the target library is a right table, the target library needs to count the number of rows of migration data when executing, and at every interval X rows, and constructing a ROWID mapping relation by using the currently extracted line number of the left table, the line ROWID and the right table ROWID returned by the line in the target library, and forming the ROWID mapping relation of K left tables and right tables after data migration is completed. The ROWID mapping table is exemplified as follows:
left list row number Left-hand table ROWID Right watch ROWID
N1 L_RID1 R_RID1
N2 L_RID2 R_RID2
N3 L_RID3 R_RID3
In the data migration process, various errors (data interception, type conversion failure, target library failure and the like) may occur in the target library during storage, so that the data of the left table cannot be successfully migrated to the right table, and therefore the number of the rows migrated from the left table and the number of the rows successfully stored in the right table are respectively counted during migration, the number of the rows migrated from the left table is the number L of the rows of the left table, and the number of the rows successfully stored in the right table is the number R of the rows of the right table.
The number of lines X of the interval needs to be set in combination with the size M of the currently available memory when setting, to ensure that the memory M can accommodate 2 times of X line records (left table and right table), since the ROWID of the target library data is to be acquired, too close interval will affect the performance of data migration, and X is set as one thousandth of the number of lines by default, it should be noted that this embodiment is for comparison of mass data, one thousandth of mass data is not a decimal, and X may be a dynamic value, and this value is to be evaluated according to the data amount of mass data.
In a specific embodiment, the calculating the number of packets according to the number of rows of the left table and the right table and the memory size available in the current system in step 200 specifically includes: the number of packets n= ((l+r) ×z)/((M- (L-R) ×z)/G); the number of the packets N is rounded upwards, L is the number of rows of the left table, R is the number of rows of the right table, M is the memory size (i.e. the size of the current available memory) used for comparison, G is the preset number of the packets which can be extracted simultaneously in parallel, and Z is the memory size occupied by single-row data in a comparison algorithm. In a specific embodiment, the comparison is terminated when the value of (L-R) Z approaches, equals or exceeds the value of M. When the value of (L-R) Z approaches (e.g., reaches more than 90% of the value of M) and equals or exceeds the value of M, a large number of inconsistent rows are indicated, and no comparison can be made by adopting the scheme.
For the above calculation formula, if an error occurs in the migration process, the number R of rows in the right table is definitely smaller than the number L of rows in the left table, and this part of differential data is definitely left in the memory to be eliminated by collision during comparison, so when the packet is calculated, this part of memory is reserved, and when the size (L-R) Z of the memory to be reserved is close to or exceeds the memory M used for comparison, the comparison is terminated.
The main function of the formula is to divide mass data into N groups, and simultaneously extract data of G groups for comparison according to the existing available memory each time, so that the memory can be ensured to be utilized to the maximum extent, and the performance loss caused by memory explosion can be prevented; and the speed of acquiring the data from the database can be improved by adopting a parallel query mode, so that the data of each group can be completed in the memory as much as possible during comparison, the frequency of memory data and disk exchange during comparison is reduced, and the aim of improving the performance is fulfilled.
Referring to fig. 3, in a specific embodiment, for the number of rows and the number of packets in the left table in step 300, the ROWID mapping relationship of each packet is obtained from the ROWID mapping relationship table, and the data query condition of each packet is constructed based on the ROWID mapping relationship specifically includes the following steps.
Step 301: the number of rows L of the left table is split into N packets based on the number of packets N.
Step 302: the starting line number of each packet is obtained.
Step 303: and searching the nearest left table row number in the ROWID mapping relation table through the initial row number, thereby acquiring the ROWID mapping relation corresponding to each group.
Step 304: and each packet combines the ROWID mapping relation of the packet and the ROWID mapping relation of the upper and lower adjacent packets to construct a left table condition and a right table condition of the data query condition of each packet.
In a specific embodiment, when a packet has only a downward adjacent packet, the left table condition of the data query condition of the packet is: the ROWID is smaller than or equal to the left table ROWID in the ROWID mapping relation of the self-grouping; the right table conditions of the data query conditions of the packet are: the ROWID is smaller than or equal to the right table ROWID in the ROWID mapping relation of the self-grouping; when a packet has an upwardly adjacent packet and a downwardly adjacent packet, the left table condition of the data query condition of the packet is: the ROWID is larger than the left table ROWID in the ROWID mapping relation of the upward adjacent packet, and the ROWID is smaller than or equal to the left table ROWID in the ROWID mapping relation of the self packet; the right table conditions of the data query conditions of the packet are: the ROWID is larger than the right table ROWID in the ROWID mapping relation of the upward adjacent packet, and the ROWID is smaller than or equal to the right table ROWID in the ROWID mapping relation of the self packet; when a packet has only an upwardly adjacent packet, the left table condition of the data query condition of the packet is: the ROWID is larger than the left table ROWID in the ROWID mapping relation of the upward adjacent packet; the right table conditions of the data query conditions of the packet are: the ROWID is greater than the right table ROWID in the ROWID mapping of the upward neighbor packet.
For the above process, the left table is split into N groups by using the line number L of the left table, and the start line number of each group is obtained, and the nearest left table line number is searched in the ROWID mapping relation table by the start line number, so as to obtain the ROWID mapping relation corresponding to each group. Examples are as follows:
grouping Left list row number Left-hand table ROWID Right watch ROWID
1 N1 L_RID1 R_RID1
X N2 L_RID2 R_RID2
N N3 L_RID3 R_RID3
Each group combines the adjacent groups to construct the condition of the group inquiry scope, and the group without the adjacent groups does not construct the closed zone, and only uses one open zone. Examples are as follows:
grouping Left-hand table condition Right Table condition
1 ROWID<=L_RID1 ROWID<=R_RID1
X L_RID1<ROWID<=L_RID2 R_RID1<ROWID<=R_RID2
N L_RID2<ROWID R_RID2<ROWID
In a specific embodiment, for the data query condition according to the packet in step 400, extracting the preset number of packet data in parallel for comparison specifically includes: according to the left table condition and the right table condition of the data query conditions of the groups, G groups of connections are created at the same time, the data of the left table and the right table in the G groups are respectively extracted according to the grouping sequence for comparison, the same data collision is eliminated, and different data are reserved in a memory; wherein G is the preset number of packets which can be extracted simultaneously in parallel. In a specific embodiment, for a created G group of connections, where each group of connections includes two connections that connect the source library and the target library, respectively, for data extraction, due to speed differences, the packet needs to be marked as contrast complete when both connections complete data extraction. In this embodiment, the preset of G needs to be determined according to the current hardware configuration, and is a dynamic value, if the number of CPU cores is large enough, the G may be set to be larger, otherwise, it is set to be smaller.
In a specific embodiment, for step 500, after a certain packet comparison is completed, the comparison of the next packet is started in sequence until all packet comparisons are completed. And finally, generating a comparison report according to the data remained in the memory, and completing data comparison.
The scheme is mainly used for verifying the consistency of data after mass data migration is carried out on two databases, and finding out different data. The traditional method divides data migration and data comparison into two independent steps, so that no basis exists for data grouping during data comparison, the left table and the right table are not mapped in a targeted mode, more data can remain in the memory after each grouping comparison is finished, and finally the memory is exploded, so that the overall comparison performance is affected. The above steps of the embodiment of the present invention can be explained as follows:
firstly, mass data migration and data comparison are carried out on two databases, one-to-one ROWID mapping relation is generated on data of a left table and a right table at each interval X action when data are migrated, basis is provided for data comparison and grouping, the same data row can be eliminated by the most possible collision after each group of comparison is completed, and the phenomenon that the same data on two sides are left in a limited memory for a long time to directly support the memory to cause the reduction of comparison performance is avoided.
Secondly, when the data comparison is performed on the calculated packet N, the quantity of memory occupied by the known inconsistent data is required to be reserved in advance, then the residual memory is split into G pairs of parallel packets, so that the comparison performance is accelerated by adopting a mode of multi-connection concurrent data extraction during the data comparison, and finally the packet number N is calculated according to the total quantity of the left table and the right table data.
Example 2:
based on the method for grouping the comparison database data provided in embodiment 1, this embodiment 2 describes the present invention in more detail through a specific application scenario.
For convenience of illustration, the data size of the left table is simplified, and the above scheme is illustrated as follows:
the source database has a table T1 (ID INT) and has 4 rows of data, distributed as follows:
the data migration process is as follows:
the T1 of the left table is migrated to the T1 of the right table, X is set to be 1, and a pair of ROWID mapping relations needs to be generated every 1 row.
When the data of the left table is inserted into the right table, RETURN ROWID INTO:RID grammar can be added behind the INSERT statement to obtain the ROWID value of the inserted row in the target library, and the following mapping relation can be generated by combining the row number of the row in the left table and the ROWID of the left table, wherein B represents the ROWID of the target library.
Left list row number Left-hand table ROWID Right watch ROWID
1 A1 B1
In the process of data migration, the number of lines L extracted from the left table and the number of lines R successfully put in the right table are required to be counted, and if the target database is restarted due to failure when the 4 th line data of the left table is migrated, so that the line data migration fails (the failed line is mapped only when the successful line is mapped and is just separated by X lines), after the data migration is completed, the number of lines L extracted from the left table is 4, the number of lines R successfully put in the right table is 3, one line is different, and the final ROWID relation mapping table is as follows.
Assuming that the current memory M is 6 bytes, each record needs to occupy 1 byte of memory during comparison, and the number G of packets for parallel comparison is set to 2.
The number of packets can be obtained by the formula n= ((l+r) ×z)/((M- (L-R) ×z)/G):
N=((4+3)*1)/((6-(4-3)*1)/2)=3。
and respectively positioning corresponding ROWID mapping relations in the ROWID relation mapping table according to the calculated comparison group number N (3) and the left table number L (4).
And constructing query conditions of each group through the groups adjacent to the mapping relation, and not constructing corresponding closed intervals without the groups adjacent to each other.
Grouping Left-hand table condition Right Table condition
1 ROWID<=A1 ROWID<=B1
2 A1<ROWID<=A2 B1<ROWID<=B2
3 A2<ROWID B2<ROWID
And simultaneously, the 1 st and the 2 nd grouping queries are run for data comparison, and at the moment, the memory with 6 bytes can meet the memory requirement of a comparison algorithm, because the memory required by the data quantity of the two groupings for comparison is 4 bytes.
When the 1 st packet has been compared and the 2 nd packet is still in progress, the consistent data of the first packet is crashed out, 2 bytes of memory of the packet are emptied, and 4 bytes are available for the next packet in addition to the 2 bytes needed by the 2 nd packet currently running, at which time the memory is sufficient to enable the comparison of the 3 rd packet.
After all the packet comparison is completed, a record with the left table ROWID of A4 remains in the memory, and the record is generated into a comparison report to complete the comparison.
In summary, the present embodiment provides a method for comparing database data in a packet, which performs packet comparison on data transferred between databases of massive data, so as to check the correctness of data transfer or find out the data difference caused by faults in the transfer process. After the scheme is adopted, the comparison of each group can be collided and the consistent data can be eliminated, so that the memory is saved for the comparison of the next group, the situation that the data of the previous group in the traditional scheme is remained in the memory after the comparison is completed and the next group is required to be collided and eliminated is prevented, and the defect that the comparison performance is reduced due to the fact that the memory is burst under extreme conditions is avoided.
In the conventional scheme, data migration and data comparison are two independent processes, which results in a condition of lacking in grouping decision when the comparison performance is improved by adopting a grouping comparison optimization means in the data comparison process, the left table and the right table are respectively divided into N groups based on respective estimated data amounts, and data between the N groups of the two tables have no one-to-one mapping relationship, for example: when the data extraction and comparison of the first groups on two sides are completed, a large amount of data of the first group of the left table is left in the memory, and the data need to be compared and eliminated by the data acquired by the query of the second or third group on the right. The phenomenon causes a large amount of inconsistent data to occupy limited memory resources in the comparison process, so that the memory can be burst in the subsequent comparison process, and the data pages and the disk are required to be exchanged to free the memory for comparison, thereby reducing the performance of data comparison.
The invention integrates two processes of data migration and data comparison, when data is migrated, ROWID is needed to be added as query item of source library, and every X rows, the data of the row is needed to be obtained in target library ROWID of target library, and the mapping relation is stored. After the migration of data is completed, the ROWID mapping relations of K source ends and target ends are obtained, the data scale is not required to be queried from the source ends and the target ends during data comparison, and the mapping relation stored by the previous data migration is directly compared according to the required grouping, so that the grouping based on the mapping relation can ensure the maximum possible collision elimination of the data in each grouping of the left table and the right table, reduce the requirement on memory resources in the comparison process, reduce or avoid the exchange times of data pages and magnetic discs caused by insufficient memory, and achieve the purpose of improving the comparison performance.
Example 3:
on the basis of the method for packet comparison database data provided in the foregoing embodiments 1 to 2, the present invention further provides an apparatus for packet comparison database data that can be used to implement the foregoing method, as shown in fig. 4, which is a schematic diagram of an apparatus architecture according to an embodiment of the present invention. The apparatus for packet contrast database data of the present embodiment includes one or more processors 21 and a memory 22. In fig. 4, a processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or otherwise, for example in fig. 4.
The memory 22 is used as a non-volatile computer readable storage medium for storing non-volatile software programs, non-volatile computer executable programs, and modules, such as the method and system of packet contrast database data in embodiments 1-2. The processor 21 performs various functional applications and data processing of the apparatus for packet comparison database data, that is, implements the methods of packet comparison database data of embodiments 1 to 2, by running nonvolatile software programs, instructions, and modules stored in the memory 22.
The memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 22 may optionally include memory located remotely from processor 21, which may be connected to processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and when executed by the one or more processors 21 perform the method of packet contrast database data in embodiments 1-2 described above, for example, performing the various steps shown in fig. 1-3 described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. Technical details not described in detail in this embodiment may be found in the methods provided in the embodiments of the present application.
It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, or may be implemented by hardware. Those skilled in the art will appreciate that all or part of the processes implementing the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the program may include processes of the embodiments of the methods described above when executed. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in details for the sake of brevity; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method of packet comparison database data, comprising:
constructing ROWID mapping relation tables of a left table and a right table;
calculating the number of the groups according to the number of rows of the left table and the right table and the size of a memory which can be provided by the current system;
acquiring the ROWID mapping relation of each group from the ROWID mapping relation table through the number of rows of the left table and the number of groups, and constructing the data query condition of each group based on the ROWID mapping relation;
according to the data query conditions of the groups, extracting the preset number of group data in parallel for comparison;
and after the data of a certain group is compared, sequentially extracting the data of the next group for comparison until the comparison of all the groups is completed, and generating a comparison report according to the residual data in the memory.
2. The method for grouping comparison database data according to claim 1, wherein the constructing the ROWID mapping tables of the left table and the right table specifically comprises:
acquiring source library data by taking the ROWID as an additional query item of the source library data, and transmitting the source library data carrying the ROWID to a target library to perform data migration; wherein, the table of the source library is a left table, and the table of the target library is a right table;
when X lines of data are arranged, constructing a ROWID mapping relation by using a left table line number currently extracted by a left table, a left table ROWID corresponding to the left table line number and a right table ROWID returned by the line data in a target library;
after the data migration is completed, forming the ROWID mapping relation of the K left tables and the K right tables, namely constructing the ROWID mapping relation tables of the left tables and the right tables.
3. The method for grouping and comparing database data according to claim 2, wherein when data migration is performed, counting the number of rows migrated from the left table and the number of rows successfully stored in the right table, wherein the number of rows migrated from the left table is the number of rows of the left table, and the number of rows successfully stored in the right table is the number of rows of the right table.
4. The method for comparing database data according to claim 1, wherein calculating the number of packets by the number of rows of the left table and the right table and the memory size available to the current system comprises:
the number of packets n= ((l+r) ×z)/((M- (L-R) ×z)/G); the number of the groups N is rounded upwards, L is the number of rows of the left table, R is the number of rows of the right table, M is the memory size used for comparison, G is the preset number of the groups which can be extracted simultaneously in parallel, and Z is the memory size occupied by single-row data in a comparison algorithm.
5. The method of packet contrast database data of claim 4, wherein the comparison is terminated when the value of (L-R) x Z approaches, equals or exceeds the value of M.
6. The method for comparing database data with packets according to claim 1, wherein the step of obtaining the ROWID mapping relation of each packet from the ROWID mapping relation table by the number of rows of the left table and the number of packets, and constructing the data query condition of each packet based on the ROWID mapping relation specifically comprises:
splitting the line number L of the left table into N groups based on the number N of the groups;
acquiring the initial line number of each packet;
searching the nearest left table row number in the ROWID mapping relation table through the initial row number, thereby obtaining the ROWID mapping relation corresponding to each group;
and each packet combines the ROWID mapping relation of the packet and the ROWID mapping relation of the upper and lower adjacent packets to construct a left table condition and a right table condition of the data query condition of each packet.
7. The method for comparing database data according to claim 5, wherein the constructing the left table condition and the right table condition of the data query condition of each packet by combining the respective ROWID mapping relationship and the ROWID mapping relationship of the upper and lower adjacent packets comprises:
when a packet has only a packet adjacent downward, the left table condition of the data query condition of the packet is: the ROWID is smaller than or equal to the left table ROWID in the ROWID mapping relation of the self-grouping; the right table conditions of the data query conditions of the packet are: the ROWID is smaller than or equal to the right table ROWID in the ROWID mapping relation of the self-grouping;
when a packet has an upwardly adjacent packet and a downwardly adjacent packet, the left table condition of the data query condition of the packet is: the ROWID is larger than the left table ROWID in the ROWID mapping relation of the upward adjacent packet, and the ROWID is smaller than or equal to the left table ROWID in the ROWID mapping relation of the self packet; the right table conditions of the data query conditions of the packet are: the ROWID is larger than the right table ROWID in the ROWID mapping relation of the upward adjacent packet, and the ROWID is smaller than or equal to the right table ROWID in the ROWID mapping relation of the self packet;
when a packet has only an upwardly adjacent packet, the left table condition of the data query condition of the packet is: the ROWID is larger than the left table ROWID in the ROWID mapping relation of the upward adjacent packet; the right table conditions of the data query conditions of the packet are: the ROWID is greater than the right table ROWID in the ROWID mapping of the upward neighbor packet.
8. The method for comparing database data according to claim 1, wherein the parallel extraction of the preset number of packet data according to the data query condition of the packet for comparison specifically comprises:
according to the left table condition and the right table condition of the data query conditions of the groups, G groups of connections are created at the same time, the data of the left table and the right table in the G groups are respectively extracted according to the grouping sequence for comparison, the same data collision is eliminated, and different data are reserved in a memory; wherein G is the preset number of packets which can be extracted simultaneously in parallel.
9. The method of grouping contrast database data according to claim 8, wherein for a created G group of connections, each group of connections includes two connections that connect the source and target libraries, respectively, for data extraction, the grouping is marked as contrast complete when both connections complete data extraction.
10. An apparatus for packet comparison of database data, characterized by:
comprising at least one processor and a memory connected by a data bus, the memory storing instructions for execution by the at least one processor, the instructions, when executed by the processor, for performing the method of packet contrast database data of any of claims 1-9.
CN202311826398.XA 2023-12-27 2023-12-27 Method and equipment for grouping comparison database data Pending CN117762996A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311826398.XA CN117762996A (en) 2023-12-27 2023-12-27 Method and equipment for grouping comparison database data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311826398.XA CN117762996A (en) 2023-12-27 2023-12-27 Method and equipment for grouping comparison database data

Publications (1)

Publication Number Publication Date
CN117762996A true CN117762996A (en) 2024-03-26

Family

ID=90325565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311826398.XA Pending CN117762996A (en) 2023-12-27 2023-12-27 Method and equipment for grouping comparison database data

Country Status (1)

Country Link
CN (1) CN117762996A (en)

Similar Documents

Publication Publication Date Title
CN107688999B (en) Block chain-based parallel transaction execution method
US10348827B2 (en) Distributed storage system, cluster node and range management method thereof
CN108564470B (en) Transaction distribution method for parallel building blocks in block chain
CN108984789B (en) Distributed accounting method and device, storage medium and electronic equipment
CN110147407B (en) Data processing method and device and database management server
CN105988862A (en) Distributed transaction processing method and device
CN111563093A (en) Detection and avoidance system and method for union block chain conflict transaction
CN109063005B (en) Data migration method and system, storage medium and electronic device
CN110570311B (en) Block chain consensus method, device and equipment
CN114116665B (en) Method for writing transaction log in parallel in database to promote processing efficiency
CN112256656A (en) Transaction rollback method and device, database, system and computer storage medium
CN114328133A (en) Single-mechanism distributed conflict detection method and system and deposit separation framework
CN113064764A (en) Method and apparatus for performing blocks in a blockchain system
CN112559629B (en) Large object initialization method and device based on log analysis synchronization
CN115988001A (en) Consensus voting processing method, device, equipment and medium for block chain system
CN111522648B (en) Transaction processing method and device for block chain and electronic equipment
CN117762996A (en) Method and equipment for grouping comparison database data
CN115665174B (en) Gradient data synchronization method, system, equipment and storage medium
US11151157B2 (en) Database management method
CN110515939A (en) A kind of multi-column data sort method based on GPU
CN107203550B (en) Data processing method and database server
CN106599326B (en) Recorded data duplication eliminating processing method and system under cloud architecture
CN111522873B (en) Block generation method, device, computer equipment and storage medium
US20180101543A1 (en) Distributed storage server, server device included therein, and method of operating server device
RU2490702C1 (en) Method of accelerating processing of multiple select-type request to rdf database using graphics processor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination