CN115952172A - Data matching method and device based on temporary table of database

Info

Publication number: CN115952172A
Application number: CN202310215543.4A
Authority: CN
Inventors: 王龙强
Original assignee: CHANJET INFORMATION TECHNOLOGY CO LTD
Current assignee: CHANJET INFORMATION TECHNOLOGY CO LTD
Priority date: 2023-03-08
Filing date: 2023-03-08
Publication date: 2023-04-11
Anticipated expiration: 2043-03-08
Also published as: CN115952172B

Abstract

The invention provides a data matching method and device based on a database temporary table, relating to the technical field of database processing. The method comprises: an acquisition step of acquiring a first data table of a source database, a second data table of a target database, and a data matching range; and a judging step of judging whether the data volumes of the first data table and the second data table within the data matching range are both larger than a first threshold. If not, the data of the first data table within the data matching range is matched directly against the data of the second data table within the data matching range to obtain a matching result. If so, a first temporary data table is established in the source database, a second temporary data table and a third temporary data table are established in the target database, and matching is performed based on the first, second and third temporary data tables to obtain the matching result. The invention improves data matching efficiency and data security.

Description

Data matching method and device based on temporary table of database

Technical Field

The invention relates to the technical field of database processing, in particular to a data matching method and device based on a database temporary table.

Background

In the prior art, tables in two databases are compared one by one, generally by looping over the data (a circular comparison). This approach is slow and occupies a large amount of memory.

In addition, many current service scenarios depend on data synchronization and subscription. The synchronization links vary in length, and if any stage of a link loses part of the data, the service function may become unavailable. There is also no means of verifying the accuracy of the data content. The main problems at present are therefore:

1. Observability of synchronization tasks: there is no uniform, traceable log information, and because the volume of changed data is large, not all data change logs can be stored.

2. Data loss is usually discovered only when a problem occurs and its cause is traced back; discovery is passive, and monitoring and early warning are lacking.

3. Link complexity: the number of synchronized data tables is large and the data volume is large, and no existing scheme can compare data differences between tables efficiently and accurately.

Therefore, how to match data tables efficiently and safely, while keeping the impact on system performance during matching to a minimum, is a technical challenge.

Disclosure of Invention

The present invention proposes the following technical solutions to address one or more technical defects in the prior art.

A data matching method based on a temporary table of a database comprises the following steps:

an acquisition step: acquiring a first data table of a source database, a second data table of a target database, and a data matching range;

a judging step: judging whether the data volumes of the first data table and the second data table within the data matching range are both larger than a first threshold; if not, directly matching the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result; if so, performing temporary table matching; and

a temporary table matching step: establishing a first temporary data table in the source database, establishing a second temporary data table and a third temporary data table in the target database, and matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result.

Further, the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain the matching result is as follows: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join or an inner join on the second temporary data table and the third temporary data table to obtain the matching result.

Still further, the matching result includes at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that is extra in the second data table relative to the first data table.

Furthermore, after the matching is completed, a diff thread, a missing thread, an extra thread and a match thread are initialized. The diff thread outputs data that differs between the first data table and the second data table, the missing thread outputs data that is missing from the second data table relative to the first data table, the extra thread outputs data that is extra in the second data table relative to the first data table, and the match thread outputs data that is the same in the first data table and the second data table.

Furthermore, for the diff thread, the missing thread, the extra thread and the match thread, a corresponding diff queue, missing queue, extra queue and match queue are initialized in memory to implement a producer-consumer relationship pool. The memory size occupied by the relationship pool is defined piecewise: if one quantity is greater than or equal to another, a first expression applies; otherwise, a second expression applies. [Formula images not reproduced in the source text.]

Here the index values 1, 2, 3 and 4 denote the diff queue, the missing queue, the extra queue and the match queue respectively. For the corresponding queue, the symbols of the formula represent: the memory size occupied by its producer-consumer relationship pool; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data to be output; and the total time required to output that total amount of data.

The invention also provides a data matching device based on the temporary table of the database, which comprises:

an acquisition unit, configured to acquire a first data table of a source database, a second data table of a target database, and a data matching range;

a judging unit, configured to judge whether the data volumes of the first data table and the second data table within the data matching range are both larger than a first threshold; if not, to directly match the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result; and if so, to perform temporary table matching; and

a temporary table matching unit, configured to establish a first temporary data table in the source database, establish a second temporary data table and a third temporary data table in the target database, and perform matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result.

Further, the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain the matching result is as follows: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join or an inner join on the second temporary data table and the third temporary data table to obtain the matching result.

Still further, the matching result includes at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that is extra in the second data table relative to the first data table.

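Further, for the diff thread, the missing thread, the extra thread and the match thread, a corresponding diff queue, missing queue, extra queue and match queue are initialized in memory to implement a producer-consumer relationship pool. The memory size occupied by the relationship pool is defined piecewise: if one quantity is greater than or equal to another, a first expression applies; otherwise, a second expression applies. [Formula images not reproduced in the source text.]

Here the index values 1, 2, 3 and 4 denote the diff queue, the missing queue, the extra queue and the match queue respectively. For the corresponding queue, the symbols of the formula represent: the memory size occupied by its producer-consumer relationship pool; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data to be output; and the total time required to output that total amount of data.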

The present invention also proposes a computer-readable storage medium having stored thereon computer program code which, when executed by a computer, performs any of the methods described above.

The technical effects of the invention are as follows. The invention discloses a data matching method, device and storage medium based on a database temporary table. The method comprises: an acquisition step of acquiring a first data table of a source database, a second data table of a target database, and a data matching range; a judging step of judging whether the data volumes of the first data table and the second data table within the data matching range are both larger than a first threshold, and, if not, directly matching the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result, or, if so, performing temporary table matching; and a temporary table matching step of establishing a first temporary data table in the source database, establishing a second temporary data table and a third temporary data table in the target database, and matching based on the three temporary data tables to obtain a matching result. In the invention, the data matching range is specified by the user, which avoids matching all data in all data tables, reduces the amount of calculation during data matching, and improves data matching efficiency.

Temporary table matching brings the following effects. Space saving: the temporary table is dropped automatically after the client exits the session, so no data is left occupying database space. Privacy: a temporary table established by the client serves only a specific transaction; the table is dedicated and private and does not need to be shared with other transactions. High efficiency: a temporary table established by the client is operated on, read and written independently, so processing speed and efficiency are higher. In the invention, the MD5 values of the data within the data matching range of the first and second data tables are calculated and written into the first and second temporary data tables respectively; the MD5 values in the first temporary data table of the source database (the source end) are inserted into the third temporary data table in the target database (the target end); and the MD5 values in the second and third temporary data tables are matched on the target end, thereby completing the matching of the data of the first and second data tables within the data matching range.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.

Fig. 1 is a flowchart of a data matching method based on a temporary table of a database according to an embodiment of the present invention.

Fig. 2 is a block diagram of a data matching apparatus based on a temporary table of a database according to an embodiment of the present invention.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 1 shows a database temporary table-based data matching method of the present invention, which includes:

An obtaining step S101: obtaining a first data table of a source database, a second data table of a target database, and a data matching range. The data matching range may be two data tables with the same table name specified by a user, two data tables with different table names, or rows or columns within the two data tables; for example, the 5th row of the first data table is matched against the 7th row of the second data table, or the 3rd column of the first data table is matched against the 7th column of the second data table.

A judging step S102: judging whether the data volumes of the first data table and the second data table within the data matching range are both larger than a first threshold; if not, directly matching the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result; if so, performing temporary table matching.

A temporary table matching step S103: establishing a first temporary data table in the source database, establishing a second temporary data table and a third temporary data table in the target database, and matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result.
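The branching between direct matching and temporary table matching in steps S101 to S103 can be illustrated with a minimal Python sketch. The function name `match_tables`, its parameters and the two strategy callables are hypothetical names introduced here for illustration; the source does not prescribe an interface or a value for the first threshold.

```python
from typing import Callable, Sequence

Row = tuple
Matcher = Callable[[Sequence[Row], Sequence[Row]], dict]


def match_tables(
    source_rows: Sequence[Row],
    target_rows: Sequence[Row],
    first_threshold: int,
    direct_match: Matcher,
    temp_table_match: Matcher,
) -> dict:
    """Pick a matching strategy from the row counts inside the matching range.

    `source_rows` and `target_rows` are assumed to be already restricted to
    the user-specified data matching range (step S101).
    """
    # Step S102: temporary table matching is chosen only when BOTH data
    # volumes are larger than the first threshold.
    if len(source_rows) > first_threshold and len(target_rows) > first_threshold:
        return temp_table_match(source_rows, target_rows)  # step S103
    return direct_match(source_rows, target_rows)
```

Passing the two strategies in as callables keeps the threshold decision separate from how each kind of matching is carried out.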

The method first obtains a first data table of a source database, a second data table of a target database, and a data matching range, and then judges whether the data volumes of the first data table and the second data table within the data matching range are both larger than a first threshold. If not, the data of the first data table within the data matching range is matched directly against the data of the second data table within the data matching range to obtain a matching result. If so, a first temporary data table is established in the source database, a second temporary data table and a third temporary data table are established in the target database, and matching is performed based on the three temporary data tables to obtain the matching result. In the invention, the data matching range is specified by the user, which avoids matching all data in all data tables, reduces the amount of calculation during data matching, and improves data matching efficiency. The user can set the data matching range through a GUI (graphical user interface), a command line, or the like.

In the invention, matching is performed directly when the data volume is small and based on temporary tables when the data volume is large, and temporary table matching is computed on MD5 values. Temporary table matching achieves the following technical effects. Space saving: the temporary table is dropped automatically after the client exits the session, so no data is left occupying database space. Privacy: a temporary table established by the client serves only a specific transaction; the table is dedicated and private and does not need to be shared with other transactions. High efficiency: a temporary table established by the client is operated on, read and written independently, so processing speed and efficiency are higher. This is another important inventive point of the invention.

In a further embodiment, the first temporary data table, the second temporary data table and the third temporary data table are set in memory so that they can be accessed only by the process that created them; other processes cannot access them. That is, a temporary table created by a client (the client of the source database or of the target database) serves only a specific transaction; the table is dedicated and private and does not need to be shared with other transactions, which improves data security. This is another important inventive point of the invention.
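The session-private behaviour of a temporary table can be observed with a short Python sketch. SQLite is used here purely as a stand-in, since the source does not name a database product; the table name `tmp_md5` and the file name `target.db` are hypothetical. In SQLite, a table created with `CREATE TEMP TABLE` is visible only to the connection that created it and is dropped when that connection closes.

```python
import os
import sqlite3
import tempfile


def temp_table_visible_elsewhere() -> bool:
    """Return True if a second connection can read the first one's temp table."""
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "target.db")
        owner = sqlite3.connect(db_path)   # session that creates the temp table
        other = sqlite3.connect(db_path)   # unrelated session on the same database
        try:
            owner.execute("CREATE TEMP TABLE tmp_md5 (row_key TEXT PRIMARY KEY, md5 TEXT)")
            owner.execute("INSERT INTO tmp_md5 VALUES ('1', 'abc')")
            owner.commit()
            try:
                other.execute("SELECT COUNT(*) FROM tmp_md5")
                return True
            except sqlite3.OperationalError:
                # "no such table": the temp table exists only in the owner's session.
                return False
        finally:
            owner.close()  # closing the session also drops the temp table
            other.close()
```

Other database systems implement temporary tables differently, so the visibility and lifetime rules should be checked against the product actually used.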

In a further embodiment, the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain the matching result is as follows: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join, an inner join or a right join on the second temporary data table and the third temporary data table to obtain the matching result.

In the invention, the MD5 values of the data within the data matching range of the first and second data tables are calculated and written into the first and second temporary data tables respectively; the MD5 values in the first temporary data table of the source database (the source end) are inserted into the third temporary data table in the target database (the target end); and the MD5 values in the second and third temporary data tables are matched on the target end, thereby completing the matching of the data of the first and second data tables within the data matching range.
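The three-temporary-table flow can be sketched in Python with two in-memory SQLite databases standing in for the source and target databases. Everything concrete here is an assumption made for illustration: the `orders` schema, the use of the `id` column both as the row key and as the matching range, the `'|'` separator used before hashing, and the temp table names. The source states only that MD5 values are computed, stored, copied and joined.

```python
import hashlib
import sqlite3


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def open_db(rows):
    """In-memory database with one hypothetical table `orders` (no NULL values)."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("md5", 1, md5_hex)  # SQLite has no built-in md5()
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def fill_md5_temp_table(conn, temp_name, lo, hi):
    """Hash each row whose id lies in the matching range [lo, hi] into a temp table."""
    conn.execute(f"CREATE TEMP TABLE {temp_name} (id INTEGER PRIMARY KEY, md5 TEXT)")
    conn.execute(
        f"INSERT INTO {temp_name} "
        "SELECT id, md5(name || '|' || amount) FROM orders WHERE id BETWEEN ? AND ?",
        (lo, hi),
    )


def match_with_temp_tables(source, target, lo, hi):
    fill_md5_temp_table(source, "tmp_first", lo, hi)   # first temp table (source)
    fill_md5_temp_table(target, "tmp_second", lo, hi)  # second temp table (target)
    # Third temp table: the source hashes copied into the target database.
    target.execute("CREATE TEMP TABLE tmp_third (id INTEGER PRIMARY KEY, md5 TEXT)")
    target.executemany(
        "INSERT INTO tmp_third VALUES (?, ?)",
        source.execute("SELECT id, md5 FROM tmp_first").fetchall(),
    )
    result = {"match": [], "diff": [], "missing": [], "extra": []}
    # Left join from the source hashes classifies every source row.
    for row_id, src_md5, tgt_md5 in target.execute(
        "SELECT s.id, s.md5, t.md5 FROM tmp_third AS s "
        "LEFT JOIN tmp_second AS t ON t.id = s.id ORDER BY s.id"
    ):
        if tgt_md5 is None:
            result["missing"].append(row_id)  # absent from the target
        elif tgt_md5 == src_md5:
            result["match"].append(row_id)
        else:
            result["diff"].append(row_id)
    # Left join in the other direction finds target rows with no source row.
    result["extra"] = [
        row_id
        for (row_id,) in target.execute(
            "SELECT t.id FROM tmp_second AS t "
            "LEFT JOIN tmp_third AS s ON s.id = t.id WHERE s.id IS NULL ORDER BY t.id"
        )
    ]
    return result
```

Only the hashes cross from the source database to the target database in this sketch; the row data itself stays where it is, and the comparison runs entirely on the target side.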

In a further embodiment, the matching result comprises at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that is extra in the second data table relative to the first data table. Based on these matching results, data synchronization between the source end and the target end can be performed.
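One possible way to derive synchronization actions from the four result categories is sketched below in Python. The mapping is an assumption, not something the source specifies: missing rows are inserted into the target, differing rows are updated, extra rows are deleted, and matching rows need no action. The function name `plan_sync` and the result dictionary keys are hypothetical.

```python
def plan_sync(result: dict) -> list:
    """Turn a matching result into (action, row_key) steps for the target end.

    Assumed policy: the source end is authoritative, so the target is
    brought in line with it. Rows under "match" require no action.
    """
    plan = []
    plan += [("insert", key) for key in result.get("missing", [])]
    plan += [("update", key) for key in result.get("diff", [])]
    plan += [("delete", key) for key in result.get("extra", [])]
    return plan
```

Whether extra rows should really be deleted depends on the synchronization direction chosen for a given deployment.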

In a further embodiment, after the matching is completed, a diff thread, a missing thread, an extra thread and a match thread are initialized. The diff thread outputs data that differs between the first data table and the second data table, the missing thread outputs data that is missing from the second data table relative to the first data table, the extra thread outputs data that is extra in the second data table relative to the first data table, and the match thread outputs data that is the same in the first data table and the second data table. Because the threads are initialized separately, they can run in parallel, so that the different categories of matching results are output concurrently and data output efficiency is improved. This is another important inventive point of the invention.
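A minimal Python sketch of the four output threads follows. The function name `output_in_parallel`, the dictionary-shaped result and the in-memory lists used as output sinks are assumptions for illustration; a real implementation would write to files, logs or another destination.

```python
import threading


def output_in_parallel(result: dict) -> dict:
    """Start one thread per result category; each writes only to its own sink."""
    sinks = {name: [] for name in ("diff", "missing", "extra", "match")}

    def emit(name: str) -> None:
        for row_key in result.get(name, []):
            # Stand-in for writing a record to a file or log.
            sinks[name].append(f"{name}:{row_key}")

    threads = [
        threading.Thread(target=emit, args=(name,), name=f"{name}-thread")
        for name in sinks
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until all four categories are fully written
    return sinks
```

Each thread touches only its own list, so no locking is needed in this sketch.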

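In a further embodiment, for the diff thread, the missing thread, the extra thread and the match thread, a corresponding diff queue, missing queue, extra queue and match queue are initialized in memory to implement a producer-consumer relationship pool. The memory size occupied by the relationship pool is defined piecewise: if one quantity is greater than or equal to another, a first expression applies; otherwise, a second expression applies. [Formula images not reproduced in the source text.]

Here the index values 1, 2, 3 and 4 denote the diff queue, the missing queue, the extra queue and the match queue respectively. For the corresponding queue, the symbols of the formula represent: the memory size occupied by its producer-consumer relationship pool; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data to be output; and the total time required to output that total amount of data.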

In order to prevent data loss during output, the invention uses the initialized queues to implement a producer-consumer working mode, which achieves the technical effects of peak clipping, valley filling and decoupling of the data flow.
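The producer-consumer mode can be sketched in Python with a bounded queue. The `capacity` parameter is a hypothetical stand-in for the pool size: the source defines that size by a formula that is not reproduced in the text, so no attempt is made to compute it here.

```python
import queue
import threading


def drain_through_queue(items, capacity: int) -> list:
    """Pass `items` from a producer thread to a consumer thread via a bounded queue."""
    buffer = queue.Queue(maxsize=capacity)  # bounded: a full queue blocks the producer
    done = object()                         # sentinel marking the end of the stream
    consumed = []

    def producer() -> None:
        for item in items:
            buffer.put(item)  # blocks while the queue is full (peak clipping)
        buffer.put(done)

    def consumer() -> None:
        while True:
            item = buffer.get()  # blocks while the queue is empty (valley filling)
            if item is done:
                break
            consumed.append(item)

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return consumed
```

Because `put` blocks instead of discarding when the queue is full, a burst of results slows the producer down rather than losing records, and the producer and consumer are coupled only through the queue.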

Fig. 2 shows a database temporary table-based data matching apparatus according to the present invention, which includes:

An obtaining unit 201, configured to obtain a first data table of a source database, a second data table of a target database, and a data matching range. The data matching range may be two data tables with the same table name specified by a user, two data tables with different table names, or rows or columns within the two data tables; for example, the 5th row of the first data table is matched against the 7th row of the second data table, or the 3rd column of the first data table is matched against the 7th column of the second data table.

A judging unit 202, configured to judge whether the data volumes of the first data table and the second data table within the data matching range are both larger than a first threshold; if not, to directly match the data of the first data table within the data matching range against the data of the second data table within the data matching range to obtain a matching result; and if so, to perform temporary table matching.

A temporary table matching unit 203, configured to establish a first temporary data table in the source database, establish a second temporary data table and a third temporary data table in the target database, and perform matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result.

The apparatus first obtains a first data table of a source database, a second data table of a target database, and a data matching range, and then judges whether the data volumes of the first data table and the second data table within the data matching range are both larger than a first threshold. If not, the data of the first data table within the data matching range is matched directly against the data of the second data table within the data matching range to obtain a matching result. If so, a first temporary data table is established in the source database, a second temporary data table and a third temporary data table are established in the target database, and matching is performed based on the three temporary data tables to obtain the matching result. In the invention, the data matching range is specified by the user, which avoids matching all data in all data tables, reduces the amount of calculation during data matching, and improves data matching efficiency. The user can set the data matching range through a GUI (graphical user interface), a command line, or the like.

In the invention, matching is performed directly when the data volume is small and based on temporary tables when the data volume is large, and temporary table matching is computed on MD5 values. Temporary table matching achieves the following technical effects. Space saving: the temporary table is dropped automatically after the client exits the session, so no data is left occupying database space. Privacy: a temporary table established by the client serves only a specific transaction; the table is dedicated and private and does not need to be shared with other transactions. High efficiency: a temporary table established by the client is operated on, read and written independently, so processing speed and efficiency are higher. This is another important inventive point of the invention.

In a further embodiment, the matching result comprises at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that is extra in the second data table relative to the first data table. Based on these matching results, data synchronization between the source end and the target end can be performed.

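In a further embodiment, as in the method, a diff queue, a missing queue, an extra queue and a match queue are initialized in memory for the diff thread, the missing thread, the extra thread and the match thread to implement a producer-consumer relationship pool. The memory size occupied by the relationship pool is defined piecewise: if one quantity is greater than or equal to another, a first expression applies; otherwise, a second expression applies. [Formula images not reproduced in the source text.]

For the corresponding queue, the symbols of the formula represent, among others, the memory size occupied by its producer-consumer relationship pool and the total amount of data to be output.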

In order to prevent data loss during output, the invention uses the initialized queues to implement a producer-consumer working mode, which achieves the technical effects of peak clipping, valley filling and decoupling of the data flow.

An embodiment of the present invention provides a computer storage medium, on which a computer program is stored, which when executed by a processor implements the above-mentioned method, and the computer storage medium can be a hard disk, a DVD, a CD, a flash memory, or the like.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present application, or the portions thereof that contribute over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments, or some parts of the embodiments, of the present application.

Finally, it should be noted that although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalents may be made thereto without departing from the spirit and scope of the invention, and all such modifications and equivalents are intended to fall within the scope of the appended claims.

Claims

1. A data matching method based on a temporary table of a database is characterized by comprising the following steps:

an acquisition step: acquiring a first data table of a source database, a second data table of a target database, and a data matching range;

2. The method according to claim 1, wherein the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result is: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join or an inner join on the second temporary data table and the third temporary data table to obtain the matching result.

3. The method of claim 2, wherein the matching result comprises at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that is extra in the second data table relative to the first data table.

4. The method of claim 3, wherein after the matching is completed, a diff thread, a missing thread, an extra thread and a match thread are initialized, the diff thread being used for outputting data that differs between the first data table and the second data table, the missing thread being used for outputting data that is missing from the second data table relative to the first data table, the extra thread being used for outputting data that is extra in the second data table relative to the first data table, and the match thread being used for outputting data that is the same in the first data table and the second data table.

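5. The method of claim 4, wherein for the diff thread, the missing thread, the extra thread and the match thread, a corresponding diff queue, missing queue, extra queue and match queue are initialized in memory to implement a producer-consumer relationship pool, and the memory size occupied by the relationship pool is defined piecewise: if one quantity is greater than or equal to another, a first expression applies; otherwise, a second expression applies [formula images not reproduced in the source text]; wherein the index values 1, 2, 3 and 4 denote the diff queue, the missing queue, the extra queue and the match queue respectively, and, for the corresponding queue, the symbols of the formula represent: the memory size occupied by its producer-consumer relationship pool; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data to be output; and the total time required to output that total amount of data.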

6. A database temporary table based data matching device, comprising:

the judging unit is used for judging whether the data quantity of the first data table and the data quantity of the second data table in the data matching range are both larger than a first threshold value, if not, the data of the first data table in the data matching range are directly matched with the data of the second data table in the data matching range to obtain a matching result, and if so, temporary table matching is carried out;

7. The apparatus of claim 6, wherein the operation of matching based on the first temporary data table, the second temporary data table and the third temporary data table to obtain a matching result is: calculating the MD5 values of the data of the first data table within the data matching range and storing them in the first temporary data table; calculating the MD5 values of the data of the second data table within the data matching range and storing them in the second temporary data table; inserting the MD5 values in the first temporary data table into the third temporary data table; and performing a left join or an inner join on the second temporary data table and the third temporary data table to obtain the matching result.

8. The apparatus of claim 7, wherein the matching result comprises at least one of: data that is the same in the first data table and the second data table; data that differs between the first data table and the second data table; data that is missing from the second data table relative to the first data table; and data that is extra in the second data table relative to the first data table.

9. The apparatus of claim 8, wherein after the matching is completed, a diff thread, a missing thread, an extra thread and a match thread are initialized, the diff thread being used for outputting data that differs between the first data table and the second data table, the missing thread being used for outputting data that is missing from the second data table relative to the first data table, the extra thread being used for outputting data that is extra in the second data table relative to the first data table, and the match thread being used for outputting data that is the same in the first data table and the second data table.

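10. The apparatus of claim 9, wherein for the diff thread, the missing thread, the extra thread and the match thread, a corresponding diff queue, missing queue, extra queue and match queue are initialized in memory to implement a producer-consumer relationship pool, and the memory size occupied by the relationship pool is defined piecewise: if one quantity is greater than or equal to another, a first expression applies; otherwise, a second expression applies [formula images not reproduced in the source text]; wherein the index values 1, 2, 3 and 4 denote the diff queue, the missing queue, the extra queue and the match queue respectively, and, for the corresponding queue, the symbols of the formula represent: the memory size occupied by its producer-consumer relationship pool; the amount of data produced per unit time; the amount of data consumed per unit time; the total amount of data to be output; and the total time required to output that total amount of data.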