CN114116724A - Data verification method, device and equipment and readable storage medium - Google Patents

Data verification method, device and equipment and readable storage medium

Info

Publication number
CN114116724A
Authority
CN
China
Prior art keywords
source
partition
data
target
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111446609.8A
Other languages
Chinese (zh)
Inventor
陈双琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202111446609.8A priority Critical patent/CN114116724A/en
Publication of CN114116724A publication Critical patent/CN114116724A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiment of the application discloses a data verification method, a device, equipment and a readable storage medium, and relates to the field of artificial intelligence and medical treatment, wherein the method comprises the following steps: acquiring a source data table and a target data table, and determining the table types of the source data table and the target data table; if the table type of the source data table and the table type of the target data table are both full tables, respectively determining whether the data volume in the source data table and the data volume in the target data table are greater than a data volume threshold value; if so, performing partition processing on the source data table to obtain a source partition table, and performing partition processing on the target data table to obtain a target partition table; checking based on the source partition field in the source partition table and the target partition field in the target partition table; and if the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table. By adopting the embodiment of the application, the data verification efficiency can be improved.

Description

Data verification method, device and equipment and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data verification method, apparatus, device, and readable storage medium.
Background
Data verification is an important quality assurance measure in the big data field. Given massive data scale, rapid data transfer, and diverse data types, data must remain valid during cleaning and processing, and the data passed to downstream systems must be accurate. A data verification method can quickly test the accuracy of data; it is widely used in the big data industry, helps a data warehouse system ensure the validity of massive data, and improves the reliability of the data analysis results of downstream systems. In the medical field, for example, a large amount of medical data requires verification.
In the prior art, when data verification faces massive and complex data resources, each field in different tables needs to be compared and verified manually, so that the data verification efficiency is low.
Disclosure of Invention
The embodiment of the application provides a data verification method, a data verification device, data verification equipment and a readable storage medium, and the data verification efficiency can be improved.
In a first aspect, the present application provides a data verification method, including:
acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table;
if the table type of the source data table and the table type of the target data table are both full tables, respectively determining whether the data volume in the source data table and the data volume in the target data table are greater than a data volume threshold value;
if the data volume in the source data table and the data volume in the target data table are both larger than the data volume threshold value, performing partition processing on the source data table to obtain at least one source partition table, and performing partition processing on the target data table to obtain at least one target partition table;
performing data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
and if the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table, wherein the corresponding target partition table is the partition table corresponding to any source partition table in the at least one target partition table.
In a second aspect, the present application provides a data verification apparatus, including:
the data acquisition module is used for acquiring a source data table and a target data table and determining the table type of the source data table and the table type of the target data table;
the quantity determining module is used for respectively determining whether the data quantity in the source data table and the data quantity in the target data table are larger than a data quantity threshold value if the table type of the source data table and the table type of the target data table are both full tables;
the partition processing module is used for performing partition processing on the source data table to obtain at least one source partition table and performing partition processing on the target data table to obtain at least one target partition table if the data volume in the source data table and the data volume in the target data table are both greater than the data volume threshold;
the data checking module is used for checking data based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
and the result determining module is used for determining that the source data table is inconsistent with the target data table if the data check between the source partition field in any one source partition table and the target partition field in the corresponding target partition table is not passed, and the corresponding target partition table is the partition table corresponding to any one source partition table in the at least one target partition table.
With reference to the second aspect, in a possible implementation manner, the partition processing module is specifically configured to:
determining an equal division rule for the source data table based on the data volume in the source data table, and dividing the source data table by adopting the equal division rule to obtain the at least one source partition table; or,
determining a field division rule for the source data table based on the field type of the field in the source data table, and dividing the source data table by adopting the field division rule to obtain the at least one source partition table, wherein the field division rule indicates that the preset field type is included.
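The two division rules above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation claimed by the application: a table is modeled as a list of dictionaries, the equal division rule is assumed to cut the table into consecutive chunks of a fixed size, and the field division rule is assumed to group rows by the value of one chosen field (for example a date field). The names `split_equally` and `split_by_field` are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def split_equally(rows: List[Row], partition_size: int) -> List[List[Row]]:
    """Assumed equal division rule: consecutive chunks of at most partition_size rows."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    return [rows[i:i + partition_size] for i in range(0, len(rows), partition_size)]


def split_by_field(rows: List[Row], field: str) -> Dict[object, List[Row]]:
    """Assumed field division rule: one partition per distinct value of `field`."""
    partitions: Dict[object, List[Row]] = {}
    for row in rows:
        partitions.setdefault(row[field], []).append(row)
    return partitions


# Hypothetical example data: five rows with an id and a day field.
rows = [{"id": i, "day": "d%d" % (i % 2)} for i in range(5)]
equal_parts = split_equally(rows, 2)
field_parts = split_by_field(rows, "day")
```

Applying the same rule to the source data table and the target data table gives partitions that can be paired one to one, which the later per-partition check relies on.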
With reference to the second aspect, in a possible implementation manner, the data check includes a field check and a field format check; the data verification module includes:
the first sampling unit is used for sampling the source partition field to obtain at least one source sampling field;
the second sampling unit is used for sampling the target partition field to obtain at least one target sampling field;
a field check unit for performing field check on the at least one source sample field and the at least one target sample field;
the format checking unit is used for carrying out field format checking on the field format of the at least one source sampling field and the field format of the at least one target sampling field if the field checking passes;
and the area checking unit is used for performing data checking on the residual source partition fields in the at least one source partition table and the residual target partition fields in the at least one target partition table if the field format checking passes.
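The staged check performed by these units (sample, field check, field format check, then the remaining fields) might be sketched as follows. The text does not define the two checks precisely, so this sketch makes assumptions: rows of the two partitions are aligned by position, the field check compares field names and values, and the field format check compares the Python type of each value. All function names are hypothetical.

```python
import random
from typing import Dict, List

Row = Dict[str, object]


def field_check(src: Row, tgt: Row) -> bool:
    """Assumed field check: same field names and equal values."""
    return src == tgt


def format_check(src: Row, tgt: Row) -> bool:
    """Assumed format check: each field has the same Python type on both sides."""
    return all(type(src[name]) is type(tgt[name]) for name in src)


def check_partition(source: List[Row], target: List[Row],
                    sample_size: int, seed: int = 0) -> bool:
    """Return True only if the sampled rows and then the remaining rows all pass."""
    if len(source) != len(target):
        return False
    indices = list(range(len(source)))
    sampled = random.Random(seed).sample(indices, min(sample_size, len(indices)))
    pairs = [(source[i], target[i]) for i in sampled]
    # Stage 1: field check on the sampled rows.
    if not all(field_check(s, t) for s, t in pairs):
        return False
    # Stage 2: field format check on the sampled rows.
    if not all(format_check(s, t) for s, t in pairs):
        return False
    # Stage 3: both checks on the rows that were not sampled.
    sampled_set = set(sampled)
    return all(
        field_check(source[i], target[i]) and format_check(source[i], target[i])
        for i in indices if i not in sampled_set
    )
```

Under these assumptions, a value stored as `4` in the source and `4.0` in the target passes the field check (the values compare equal) but fails the format check, which is the kind of mismatch the second stage is meant to catch.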
With reference to the second aspect, in a possible implementation manner, the result determining module is specifically configured to:
if the source partition field in any source partition table is not matched with the target partition field in the corresponding target partition table, determining that the source data table is inconsistent with the target data table; or,
and if each source partition field in the at least one source partition table is matched with a target partition field in the at least one target partition table, and the field format of one or more source partition fields in the at least one source partition table is not matched with the target partition field format in the at least one target partition table, determining that the source data table is inconsistent with the target data table.
With reference to the second aspect, in a possible implementation manner, the data checking apparatus further includes: a second check module, configured to perform field check on each source field in the source data table and each target field in the target data table if both the data amount in the source data table and the data amount in the target data table are smaller than or equal to the data amount threshold;
if the field check is passed, carrying out field format check on the field format of each source field in the source data table and the field format of each target field in the target data table;
and if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
With reference to the second aspect, in a possible implementation manner, the data checking apparatus further includes: a third verification module to:
if the table type of the source data table and the table type of the target data table are both incremental tables, performing field check on each source field in the source data table and each target field in the target data table;
if the field check is passed, carrying out field format check on the field format of each source field in the source data table and the field format of each target field in the target data table;
and if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
With reference to the second aspect, in a possible implementation manner, the data checking apparatus further includes: a hierarchy acquisition module to:
acquiring at least one intermediate level data table between the source data table and the target data table, wherein the at least one intermediate level data table comprises a first level data table and a second level data table, the first level data table is obtained by performing data extraction processing on the source data table, the second level data table is obtained by performing data cleaning processing on the first level data table, and the target data table is obtained by performing logic processing on the second level data table;
partitioning the first-level fields in the first-level data table to obtain at least one first partition table;
partitioning a second level field in the second level data table to obtain at least one second partition table;
the result determination module is specifically configured to:
if the data check between the source partition field and the first partition field is not passed, determining that the source data table is inconsistent with the target data table; or,
if the data check between the source partition field and the first partition field passes, performing data check on the first partition field in the at least one first partition table and the second partition field in the at least one second partition table;
if the data check between the first partition field and the second partition field is not passed, determining that the source data table is inconsistent with the target data table; or,
and if the data check between the first partition field and the second partition field passes, performing data check on the second partition field in the at least one second partition table and the target partition field in the at least one target partition table, and if the data check between the second partition field and the target partition field does not pass, determining that the source data table and the target data table are inconsistent, wherein the data check comprises field check and field format check.
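The level-by-level check described above (source, first level, second level, target) stops at the first adjacent pair that fails. A minimal sketch of that control flow follows; the pairwise check is passed in as a function, and plain equality is used here only as a stand-in for the field check and field format check. The name `first_failed_stage` and the example data are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Level = Tuple[str, object]  # (level name, partitioned data of that level)


def first_failed_stage(
    levels: Sequence[Level],
    check: Callable[[object, object], bool],
) -> Optional[Tuple[str, str]]:
    """Check adjacent levels in order; return the first pair that fails, else None."""
    for (up_name, up_data), (down_name, down_data) in zip(levels, levels[1:]):
        if not check(up_data, down_data):
            # Later levels are not checked: the tables are already inconsistent.
            return (up_name, down_name)
    return None


# Hypothetical example: the cleaning step between the two intermediate levels altered a value.
levels = [
    ("source", [1, 2]),
    ("first_level", [1, 2]),
    ("second_level", [1, 3]),
    ("target", [1, 3]),
]
failed = first_failed_stage(levels, lambda a, b: a == b)
```

Returning the failing pair rather than a bare boolean also indicates which processing step (extraction, cleaning, or logic processing) introduced the inconsistency.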
In a third aspect, the present application provides a computer device comprising: a processor, a memory, a network interface;
the processor is connected with a memory and a network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing a computer program, and the processor is used for calling the computer program so as to enable a computer device comprising the processor to execute the data verification method.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein a computer program adapted to be loaded and executed by a processor, so as to cause a computer device having the processor to execute the above-mentioned data verification method.
In a fifth aspect, the present application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the data verification method provided in the various alternatives in the first aspect of the present application.
In the embodiment of the application, the table type of a source data table and the table type of a target data table are determined by acquiring the source data table and the target data table; if the table type of the source data table and the table type of the target data table are both full tables, respectively determining whether the data volume in the source data table and the data volume in the target data table are greater than a data volume threshold value; if the data volume in the source data table and the data volume in the target data table are both larger than the data volume threshold value, performing partition processing on the source data table to obtain at least one source partition table, and performing partition processing on the target data table to obtain at least one target partition table; and performing data verification based on the source partition field in at least one source partition table and the target partition field in at least one target partition table, and if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table. The table type of the data table is predetermined, and when the table type of the data table is determined to be a full table and the data amount in the data table is greater than the data amount threshold value, the data in the data table is partitioned to obtain a plurality of partition tables, the data in each partition table can be verified, and whether the data in the source partition table and the data in the target partition table are consistent or not can be determined. 
If the data in a certain source partition table is determined to be inconsistent with the data in the target partition table, the inconsistency of the source data table and the target data table can be determined, the whole source data table and the whole target data table do not need to be checked, and the data checking efficiency can be improved. In addition, by automatically checking the data in the source data table and the target data table, the fields in the source data table and the target data table do not need to be manually compared one by one, so that the data checking efficiency can be improved, and the data checking accuracy can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data verification method provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of another data verification method provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data verification apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solution is suitable for checking the consistency of data in a source data table and a target data table, thereby determining whether the source data table and the target data table are consistent. The source data table and the target data table may be data tables related to the medical field, such as doctor-patient data tables, or data tables related to other fields. A source data table and a target data table are acquired, and the table type of the source data table and the table type of the target data table are determined; if the table type of the source data table and the table type of the target data table are both full tables, it is respectively determined whether the data volume in the source data table and the data volume in the target data table are greater than a data volume threshold; if the data volume in the source data table and the data volume in the target data table are both greater than the data volume threshold, the source data table is partitioned to obtain at least one source partition table, and the target data table is partitioned to obtain at least one target partition table; data verification is performed based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table, and if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table among the at least one target partition table is not passed, the source data table is determined to be inconsistent with the target data table. 
The table type of the data table is determined in advance, when the table type of the data table is determined to be a full table and the data amount in the data table is larger than the data amount threshold value, the data in the data table is partitioned to obtain a plurality of partition tables, the data in each partition table can be verified, and whether the data in the source partition table and the data in the target partition table are consistent or not can be determined. If the data in the partition table is determined to be inconsistent, the source data table and the target data table can be determined to be inconsistent, the whole source data table and the whole target data table do not need to be checked, and the data checking efficiency can be improved. In addition, by automatically checking the data in the source data table and the target data table, the fields in the source data table and the target data table do not need to be manually compared one by one, so that the data checking efficiency can be improved, and the data checking accuracy can be improved.
Referring to fig. 1, fig. 1 is a schematic flowchart of a data verification method provided in an embodiment of the present application, where the data verification method can be applied to a computer device. The computer device may be an electronic device, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palmtop computer, a vehicle-mounted device, an Augmented Reality/Virtual Reality (AR/VR) device, a head-mounted display, a wearable device, a smart speaker, a digital camera, a camera, and other Mobile Internet Devices (MID) having a network access capability; the computer device may also be an independent server, a server cluster consisting of a plurality of servers, or a cloud computing center. As shown in fig. 1, the data verification method includes, but is not limited to, the following steps:
S101, acquiring a source data table and a target data table, and determining a table type of the source data table and a table type of the target data table.
In the embodiment of the application, since the consistency between the source data table and the target data table needs to be checked, the data in the source data table and the data in the target data table need to be checked, and whether the data in the source data table and the data in the target data table are consistent or not is determined, so that whether the source data table and the target data table are consistent or not is determined. The computer device may obtain the source data table and the target data table, determine a table type of the source data table and a table type of the target data table. The target data table may be obtained by processing the source data table, for example, may be obtained by performing data cleaning on the source data table, or obtained by copying the source data table, or obtained in other manners, which is not limited in this embodiment of the present application. In the embodiment of the present application, the source data table and the target data table may refer to data tables related to a medical field, such as a doctor-patient data table, a chronic disease data table, a medical data table, and the like, or data tables related to an educational field, such as a student information data table, or data tables related to other fields.
The computer device can acquire the source data table and the target data table from a data warehouse, which is used for storing various data tables, and determine the table type of the source data table and the table type of the target data table based on the data synchronization mode of the data warehouse. The data synchronization mode of the data warehouse can include a full synchronization mode and an incremental synchronization mode: full synchronization synchronizes all data in the data table, whereas incremental synchronization synchronizes only the portion of the data table that has changed. If the data synchronization mode of the data warehouse is full synchronization, the table types of the source data table and the target data table are full tables. If the data synchronization mode of the data warehouse is incremental synchronization, the table types of the source data table and the target data table are incremental tables. It will be appreciated that the source data table obtained using full synchronization is the same as the source data table obtained using incremental synchronization, and the target data table obtained using full synchronization is the same as the target data table obtained using incremental synchronization.
Alternatively, the computer device may embed all data sources, including the source data table and the target data table, in the data warehouse in advance to form a mapping dictionary, package different data source connection methods in a class through Python, and the subsequent computer device may access the data warehouse by selecting different dictionary keys to obtain the source data table and the target data table. That is, the computer device integrates different data tables into the data warehouse in advance, and accesses the source data table and the target data table by inputting the identifier of the source data table and the identifier of the target data table when the source data table and the target data table need to be acquired, so as to obtain the source data table and the target data table. For example, the identifier of the data table includes, but is not limited to, an account number and a password corresponding to the data table, and the account number and the password are different for each data table. The computer equipment can acquire the source data table by logging in the account and the password of the source data table, and can acquire the target data table by logging in the account and the password of the target data table.
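The mapping dictionary and the Python class that packages the connection methods might look like the sketch below. It is only an illustration: the class name `DataSourceRegistry`, the dictionary keys, and the use of in-memory SQLite databases as stand-ins for real data sources are all assumptions, and account and password handling is omitted.

```python
import sqlite3
from typing import Callable, Dict


class DataSourceRegistry:
    """Maps a dictionary key to a function that opens the matching data source."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Callable[[], sqlite3.Connection]] = {}

    def register(self, key: str, connector: Callable[[], sqlite3.Connection]) -> None:
        self._connectors[key] = connector

    def connect(self, key: str) -> sqlite3.Connection:
        try:
            connector = self._connectors[key]
        except KeyError:
            raise KeyError(f"unknown data source: {key!r}") from None
        return connector()


# Hypothetical keys; in-memory SQLite stands in for the real source and target stores.
registry = DataSourceRegistry()
registry.register("source_db", lambda: sqlite3.connect(":memory:"))
registry.register("target_db", lambda: sqlite3.connect(":memory:"))

conn = registry.connect("source_db")
one = conn.execute("SELECT 1").fetchone()
conn.close()
```

Selecting a different key is then the only change needed to read from a different data source, which matches the "select different dictionary keys" usage described above.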
Alternatively, the computer device may obtain the source data table and the target data table from different file libraries. The computer device may obtain the file name or storage path of the source data table and, based on the storage path, retrieve the source data table with that file name from the corresponding file library; likewise, it may obtain the file name or storage path of the target data table and, based on the storage path, retrieve the target data table with that file name from the corresponding file library. The embodiment of the present application does not limit the manner of obtaining the source data table and the target data table; the computer device may also obtain them through data transmission from other devices, and so on.
S102, if the table type of the source data table and the table type of the target data table are both full tables, respectively determining whether the data volume in the source data table and the data volume in the target data table are larger than a data volume threshold value.
In the embodiment of the application, when the table type of the source data table and the table type of the target data table are both full tables, whether the data volume in the source data table and the data volume in the target data table are greater than the data volume threshold can be respectively determined. The data volume refers to the amount of data contained in a data table. If the table type of the source data table and the table type of the target data table are both full tables, and the data volume in the source data table and the data volume in the target data table are both greater than the data volume threshold, the data volumes in the source data table and the target data table are large. In that case, verifying all the data in the whole source data table and the whole target data table would take a long time and make data verification inefficient, so the data in the source data table and the target data table can be processed first and then verified. If the table type of the source data table is a full table and the data volume in the source data table is greater than the data volume threshold, the source data table is a large table; because the data volume in a large table is large, the data in the large table can be processed before being verified. Generally, because the target data table is obtained by processing the source data table, the difference between the data volumes of the two tables is small: if the data volume in the source data table is greater than the data volume threshold, the data volume in the target data table is also greater than the data volume threshold; and if the data volume in the source data table is smaller than the data volume threshold, the data volume in the target data table is also smaller than the data volume threshold.
Optionally, if both the data amount in the source data table and the data amount in the target data table are less than or equal to the data amount threshold, performing data check on each data in the source data table and each data in the target data table, and determining consistency of the source data table and the target data table. That is, if the table type of the source data table is a full table and the data amount in the source data table is less than or equal to the data amount threshold, it indicates that the source data table is a small table and the data amount in the small table is small, and data verification can be directly performed on all data in the data table.
Optionally, if the table type of the source data table and the table type of the target data table are both incremental tables, performing data check on each data in the source data table and each data in the target data table, and determining consistency of the source data table and the target data table. Because the table type of the source data table and the table type of the target data table are incremental tables, which indicates that the data amount in the source data table and the target data table is small, data verification can be performed on all data in the source data table and all data in the target data table, and the consistency of the source data table and the target data table is determined.
S103, if the data volume in the source data table and the data volume in the target data table are both larger than the data volume threshold value, partitioning the source data table to obtain at least one source partition table, and partitioning the target data table to obtain at least one target partition table.
In the embodiment of the application, if the data volume in the source data table and the data volume in the target data table are both larger than the data volume threshold, both tables are large. Verifying the consistency of the two tables by checking all data in the whole source data table against all data in the whole target data table would take a long time and make data verification inefficient, so the source data table and the target data table may be partitioned first and verified afterwards. Partitioning divides a data table into a plurality of partition tables, each holding less data than the whole table. Whether the data in the source data table and the data in the target data table are consistent can then be determined by verifying whether the data in the partition tables are consistent, which improves data verification efficiency.
Optionally, the computer device may divide the source data table based on a preset partition rule to obtain at least one source partition table, where the preset partition rule includes an equal division rule or a field division rule. Specifically, the computer device may determine an equal division rule for the source data table based on the data volume in the source data table, and divide the source data table using the equal division rule to obtain at least one source partition table. Alternatively, the computer device may determine a field division rule for the source data table based on the field types of the fields in the source data table, where the field division rule indicates a preset field type, and divide the source data table using the field division rule so that the fields matching the preset field type are placed in the same source partition table, thereby obtaining at least one source partition table. The equal division rule may refer to dividing a data table into N equal parts to obtain N partition tables, where N is a positive integer. For example, the larger the data volume in the source data table, the larger the value of N; the smaller the data volume, the smaller the value of N. The preset field type indicated by the field division rule may be, for example, a name field, a time field, or another field type. For example, when the preset field type is a name field, the source data table includes a plurality of name fields, where a name field may refer to the name of a user, and the source data table includes the data corresponding to each user.
For example, if the source data table includes medical data corresponding to 3 users, the computer device may divide the source data table based on the name field and place the data matching each name in the same source partition table: all medical data of the user whose name field is zhangsan is placed in one source partition table, all medical data of the user whose name field is liquan is placed in another source partition table, and so on, so that the source data table is divided into 3 source partition tables. The data table referred to above may be either the source data table or the target data table, and the same division rule may be applied to both.
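The two division rules described above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: it assumes a data table is held in memory as a list of dictionaries, and the function names `equal_partition` and `field_partition`, the `visit` field, and the third user name `wangwu` are hypothetical.

```python
from collections import defaultdict


def equal_partition(rows, n):
    """Equal division rule: split rows into n contiguous, near-equal parts."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    base, extra = divmod(len(rows), n)
    parts, start = [], 0
    for i in range(n):
        # The first `extra` parts receive one additional row each.
        end = start + base + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts


def field_partition(rows, field):
    """Field division rule: rows sharing a value of `field` go to one partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[field]].append(row)
    return dict(partitions)


# Hypothetical source data table with medical records of 3 users.
source_table = [
    {"name": "zhangsan", "visit": "2021-01-03"},
    {"name": "liquan", "visit": "2021-01-04"},
    {"name": "zhangsan", "visit": "2021-02-11"},
    {"name": "wangwu", "visit": "2021-03-01"},
]

by_name = field_partition(source_table, "name")
halves = equal_partition(source_table, 2)
print(sorted(by_name))           # ['liquan', 'wangwu', 'zhangsan']
print([len(p) for p in halves])  # [2, 2]
```

Applying the same function with the same arguments to the target data table yields partitions with the same keys or positions, which is one way the correspondence between source and target partition tables could be established.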
Optionally, the target data table may be partitioned in the same way as the source data table. Because the same partition rule is applied to both tables, a correspondence between each source partition table and each target partition table can be established once the at least one source partition table and the at least one target partition table have been obtained. When data verification is then performed, the source partition field in a source partition table is checked against the target partition field in the target partition table that corresponds to that source partition table, so as to determine whether the source data table and the target data table are consistent. In short, the source data table is divided into at least one source partition table, the target data table is divided into at least one target partition table, and the subsequent data verification is carried out on the partition tables to determine the consistency of the source data table and the target data table.
S104, performing data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table.
In this embodiment of the application, the computer device may perform data verification based on a source partition field in the at least one source partition table and a target partition field in the at least one target partition table to obtain a data verification result, where the data verification result is used to indicate whether the source data table and the target data table are consistent. And if the data check between the source partition field in the at least one source partition table and the target partition field in the at least one target partition table is passed, determining that the source data table is consistent with the target data table.
S105, if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table.
The corresponding target partition table is the partition table, among the at least one target partition table, that corresponds to the given source partition table. Optionally, the computer device may partition the source data table and the target data table in the same manner to obtain at least one source partition table and at least one target partition table, so that a correspondence between each source partition table and each target partition table can be established, and the target partition table corresponding to a source partition table can be determined based on that correspondence. That is, the computer device may perform data check on the source partition field in a source partition table and the target partition field in the corresponding target partition table to determine the consistency of the source data table and the target data table. Each check then involves only the fields in two partition tables (one source partition table and its corresponding target partition table), which improves data verification efficiency. Alternatively, the target partition field corresponding to a source partition field may refer to the target partition field in every target partition table; that is, the computer device may perform data verification based on the source partition field in each source partition table and the target partition field in each of the at least one target partition table to determine the consistency of the source data table and the target data table.
For example, if the source data table is divided into x source partition tables and the target data table is divided into y target partition tables, where x and y are positive integers, the computer device can perform data verification on the source partition field in the first of the x source partition tables against the target partition field in each of the y target partition tables, which improves the accuracy of the data verification.
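The one-to-one correspondence between partition tables and the early stop described in S105 can be sketched as follows. This is only an illustration under stated assumptions: partitions are held as dictionaries keyed by a partition key, plain equality stands in for the field check and field format check, and the name `verify_partitions` is hypothetical.

```python
def verify_partitions(source_parts, target_parts):
    """Check each source partition against its corresponding target partition.

    Both arguments map a partition key to a list of rows. Returns a pair
    (consistent, failing_key); checking stops at the first failing partition.
    """
    unmatched = source_parts.keys() ^ target_parts.keys()
    if unmatched:
        # A partition without a counterpart already shows an inconsistency.
        return False, sorted(unmatched)[0]
    for key, source_rows in source_parts.items():
        if source_rows != target_parts[key]:
            return False, key  # no need to check the remaining partitions
    return True, None


source_parts = {
    "zhangsan": [{"name": "zhangsan", "age": 30}],
    "liquan": [{"name": "liquan", "age": 41}],
}
matching_target = {
    "zhangsan": [{"name": "zhangsan", "age": 30}],
    "liquan": [{"name": "liquan", "age": 41}],
}
differing_target = {
    "zhangsan": [{"name": "zhangsan", "age": 30}],
    "liquan": [{"name": "liquan", "age": 42}],
}

print(verify_partitions(source_parts, matching_target))   # (True, None)
print(verify_partitions(source_parts, differing_target))  # (False, 'liquan')
```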
Optionally, the data check may include a field check and a field format check. The field check may refer to checking the field names in the source data table against the field names in the target data table to determine whether they are consistent. The field format check may refer to checking the data format and field content of the fields in the source data table against the data format and field content of the fields in the target data table to determine whether the source data table is consistent with the target data table. The field names may include, for example, name, gender, and age; the data format of a field may be a numeric type, a character type, or another type; and the field content may include, for example, Zhang San, male, and 30.
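The two-part data check can be sketched as follows. This is a minimal sketch, assuming that rows are dictionaries, that corresponding rows appear in the same order in both tables, and that the Python type of a value stands in for the data format of a field; the function names are hypothetical.

```python
def field_check(source_rows, target_rows):
    """Field check: both tables use the same set of field names."""
    source_names = {name for row in source_rows for name in row}
    target_names = {name for row in target_rows for name in row}
    return source_names == target_names


def field_format_check(source_rows, target_rows):
    """Field format check: same data format (Python type here) and content."""
    if len(source_rows) != len(target_rows):
        return False
    for s_row, t_row in zip(source_rows, target_rows):
        for name, s_value in s_row.items():
            if name not in t_row:
                return False
            t_value = t_row[name]
            if type(s_value) is not type(t_value) or s_value != t_value:
                return False
    return True


def data_check(source_rows, target_rows):
    """Run the field check first; skip the format check if it fails."""
    return field_check(source_rows, target_rows) and field_format_check(
        source_rows, target_rows
    )


source = [{"name": "Zhang San", "gender": "male", "age": 30}]
same = [{"name": "Zhang San", "gender": "male", "age": 30}]
wrong_format = [{"name": "Zhang San", "gender": "male", "age": "30"}]
wrong_field = [{"name": "Zhang San", "sex": "male", "age": 30}]

print(data_check(source, same))          # True
print(data_check(source, wrong_format))  # False: age is a string, not a number
print(data_check(source, wrong_field))   # False: field name differs
```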
Alternatively, the computer device may perform data verification based on the partitioned fields obtained by sampling the fields in the partitioned table, and obtain the data verification result. Specifically, the computer device may sample the source partition field to obtain at least one source sample field; sampling the target partition field to obtain at least one target sampling field; performing field check on at least one source sample field and at least one target sample field; if the field check is passed, carrying out field format check on the field format of at least one source sampling field and the field format of at least one target sampling field; and if the field format check is passed, performing data check on the residual source partition fields in the at least one source partition table and the residual target partition fields in the at least one target partition table to obtain a partition table check result, and determining the data check result based on the partition table check result.
The remaining source partition field in at least one source partition table may refer to a field other than at least one source sampling field in at least one source partition table, for example, the remaining source partition field in one source partition table refers to a source partition field in the source partition table that is not sampled. The remaining target partition field in the at least one target partition table may refer to a field of the at least one target partition table other than the at least one target sampling field, for example, the remaining target partition field in one target partition table refers to a target partition field in the target partition table that is not sampled. That is to say, in the embodiment of the present application, after the source data table is partitioned to obtain the source partition table, the fields in the source partition table may be sampled, and data verification may be performed based on the source sampling fields obtained by sampling and the target sampling fields obtained by sampling, and if the source sampling fields and the target sampling fields have consistency, data verification may be performed on the remaining source partition fields in the source partition table except for the source sampling fields and the remaining target partition fields in the target partition table except for the target sampling fields, so as to obtain a data verification result. If the data check between the source sampling field in any one source partition table and the target sampling field in the target partition table corresponding to the source partition table is not passed or the data check between the residual source partition field and the residual target partition field is not passed, the inconsistency between the source data table and the target data table is determined, the subsequent data check is not needed, and the data check efficiency is improved.
Optionally, when performing field check on at least one source sampling field and at least one target sampling field, the computer device may perform one-to-one check on the source sampling field and the target sampling field, that is, after checking one source sampling field and one target sampling field, checking the next source sampling field and the next target sampling field; or checking the source sampling field and the target sampling field together, namely checking each source sampling field and each target sampling field simultaneously; or, field check may be performed on each source sample field and each target sample field in sequence, which is not limited in this embodiment of the present application. It should be understood that, all the field checks between the fields in the source data table and the fields in the target data table and the field format check between the field formats of the fields in the source data table and the field formats of the fields in the target data table mentioned in the embodiment of the present application may refer to the above one-to-one check, all the checks together or the check in sequence, which is not limited in the embodiment of the present application.
Since the data check includes a field check and a field format check, when data check is performed on the source sampling field and the target sampling field, the field check may be performed first. If the field check between the source sampling field and the target sampling field fails, a data check result indicating that the source data table and the target data table are inconsistent is obtained directly, and the field format check does not need to be performed, which improves data check efficiency. If the field check passes, the field format check is then performed on the source sampling field and the target sampling field; the sampling fields are thus verified twice, which improves the accuracy of the data check. If the field format check between the source sampling field and the target sampling field also passes, data check (including the field check and the field format check) is performed on the remaining source partition fields in the at least one source partition table and the remaining target partition fields in the at least one target partition table to obtain a partition table check result, and the data check result is determined based on the partition table check result.
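The sample-first procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the method of the application: rows at the same position in the source and target partition tables are assumed to correspond, the same positions are sampled on both sides, and the names `row_matches` and `check_partition`, the `sample_size` and `seed` parameters, and the returned status strings are hypothetical.

```python
import random


def row_matches(s_row, t_row):
    """Field check (same names), then field format check (same type, content)."""
    if s_row.keys() != t_row.keys():
        return False
    return all(
        type(value) is type(t_row[name]) and value == t_row[name]
        for name, value in s_row.items()
    )


def check_partition(source_part, target_part, sample_size=2, seed=0):
    """Check sampled rows first; check the remaining rows only if they pass."""
    if len(source_part) != len(target_part):
        return "size_mismatch"
    positions = list(range(len(source_part)))
    rng = random.Random(seed)
    sampled = set(rng.sample(positions, min(sample_size, len(positions))))
    # Stage 1: sampled rows. A failure here ends the check early.
    for i in sampled:
        if not row_matches(source_part[i], target_part[i]):
            return "sample_failed"
    # Stage 2: remaining (unsampled) rows.
    for i in positions:
        if i not in sampled and not row_matches(source_part[i], target_part[i]):
            return "remainder_failed"
    return "consistent"


source_part = [{"id": i, "amount": i * 10} for i in range(6)]
target_part = [dict(row) for row in source_part]
print(check_partition(source_part, target_part))  # consistent
```

A mismatch in a row that happens to be sampled is reported by the first stage without touching the rest of the partition; a mismatch in an unsampled row is still caught by the second stage.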
Optionally, the method for determining the inconsistency of the source data table and the target data table based on the partition table checking result may include the following cases:
In a first case, if the source partition field in any source partition table does not match the target partition field in the target partition table that corresponds to that source partition table, it is determined that the source data table is inconsistent with the target data table. Here, field matching may mean that the field names are the same, and field format matching may mean that the field formats and the field contents are the same.
In a second case, if each source partition field in the at least one source partition table is matched with a target partition field in the at least one target partition table, and the field format of one or more source partition fields in the at least one source partition table is not matched with the target partition field format in the at least one target partition table, it is determined that the source data table is inconsistent with the target data table.
That is, when data verification is performed on the source partition field and the target partition field, if the verification between the source partition field and the target partition field fails, it is determined that the data verification result indicates that the source data table and the target data table are inconsistent. And if the verification between the source partition field and the target partition field passes and the verification between the field format of the source partition field and the field format of the target partition field does not pass, determining that the data verification result indicates that the source data table and the target data table are inconsistent.
That is to say, in the embodiment of the present application, a data table is partitioned into a plurality of partition tables, each partition table is sampled to obtain sampling fields, and the consistency of a source partition table and its target partition table is first determined from whether the source sampling fields match the target sampling fields and whether the field formats of the source sampling fields match the field formats of the target sampling fields. Processing each source partition table and each target partition table in this manner yields the consistency between each pair of partition tables, from which the final data verification result is determined. If the data verification result for any pair of partition tables is inconsistent, the source data table is determined to be inconsistent with the target data table. When any one of the above steps fails, the source data table and the target data table are already known to be inconsistent, so the subsequent verification steps are skipped, which improves data verification efficiency. When a step passes, the subsequent steps are executed, which ensures the accuracy of data verification. Accuracy is therefore maintained while efficiency is improved.
Optionally, if each source partition field in the at least one source partition table matches with a target partition field in the at least one target partition table, and a field format of each source partition field matches with a field format of a target partition field in the at least one target partition table, performing data verification on the remaining source partition fields and the remaining target partition fields, and if the data verification between the remaining source partition fields and the remaining target partition fields passes, determining that the source data table is consistent with the target data table.
When a consistency check is performed on a source data table and a target data table in the prior art, each source field in the source data table is compared one by one with each target field in the target data table. In the technical scheme of the application, the source data table and the target data table are partitioned and sampled, and data check is performed on the fields obtained by sampling. Because the volume of data checked in this way is smaller than the volume of data checked when every source field is compared one by one with every target field, the data check efficiency is higher.
Optionally, if the field format of the at least one source sampling field and the field format of the at least one target sampling field pass the verification, data verification may be performed on all source partition fields in the at least one source partition table and all target partition fields in the at least one target partition table to obtain a partition table verification result, and the data verification result is determined based on the partition table verification result. Because the fields in the partition table are sampled firstly and then subjected to data verification, if the data verification is passed, the data verification can be performed on the fields and the field formats in the whole partition table, and the accuracy of the data verification can be ensured.
Optionally, if the table type of the source data table and the table type of the target data table are full tables, and both the data volume in the source data table and the data volume in the target data table are less than or equal to the data volume threshold; or, the table type of the source data table and the table type of the target data table are both incremental tables, that is, when the data amount in the source data table and the data amount in the target data table are small, the computer device may perform data verification on all data in the source data table and all data in the target data table, and determine the consistency of the source data table and the target data table. Specifically, the computer device may perform field check on each source field in the source data table and each target field in the target data table; and if the verification between each source field in the source data table and each target field in the target data table passes, performing field format verification on the field format of each source field in the source data table and the field format of each target field in the target data table, and determining a data verification result between the source data table and the target data table based on the field format verification result.
Optionally, if the target data table is obtained by processing the source data table, an intermediate-level data table between the source data table and the target data table may be obtained, and consistency between the source data table and the target data table is determined by performing data verification on the source data table and the intermediate-level data table and performing data verification on the intermediate-level data table and the target data table. In particular, the computer device may obtain at least one intermediate-level data table between the source data table and the target data table; partitioning a first-level field in a first-level data table to obtain at least one first partition table; partitioning a second level field in a second level data table to obtain at least one second partition table; performing data check on a source partition field in at least one source partition table and a first partition field in at least one first partition table; if the data check between the source partition field and the first partition field passes, performing data check on the first partition field in at least one first partition table and the second partition field in at least one second partition table; and if the data verification between the first partition field and the second partition field passes, performing data verification on the second partition field in the at least one second partition table and the target partition field in the at least one target partition table to obtain a data verification result.
The data check comprises a field check and a field format check. The at least one intermediate-level data table comprises a first-level data table and a second-level data table: the first-level data table is obtained by performing data extraction on the source data table, the second-level data table is obtained by performing data cleaning on the first-level data table, and the target data table is obtained by performing logical processing on the second-level data table. Data extraction refers to the process of extracting data from a data source; data cleaning refers to the process of re-examining and checking data in order to delete duplicate information, correct existing errors, and ensure data consistency; and logical processing refers to performing logical operations on the cleaned data. It can be understood that if the data check between the source partition field and the first partition field fails, the source data table and the target data table are determined to be inconsistent, and the subsequent data check process does not need to be executed, which improves data check efficiency. Likewise, if the data check between the source partition field and the first partition field passes but the data check between the first partition field and the second partition field fails, the source data table and the target data table are determined to be inconsistent without executing the subsequent data check process, which also improves data check efficiency.
By obtaining the intermediate-level data tables between the source data table and the target data table, performing data verification on the source data table and the intermediate-level data tables, and performing data verification on the intermediate-level data tables and the target data table, the consistency between the source data table and the target data table is determined in a multi-layer manner. This also makes it possible to determine which processing link caused the inconsistency between the source data table and the target data table, so that abnormal data can be located quickly and the data table can be modified or otherwise handled.
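The multi-layer verification can be sketched as a walk over adjacent levels that stops at the first failing link. This is a simplified sketch: it compares whole tables rather than partition fields, and the default comparison is plain equality, which is an assumption made only for illustration, since the application does not specify how levels that differ by extraction, cleaning, or logical processing are compared. The name `first_failing_link` and the level labels are hypothetical.

```python
def first_failing_link(levels, check=lambda upstream, downstream: upstream == downstream):
    """Verify adjacent levels in order and return the first failing link.

    `levels` is an ordered list of (label, rows) pairs running from the source
    data table to the target data table. Returns None if every link passes.
    """
    for (up_label, up_rows), (down_label, down_rows) in zip(levels, levels[1:]):
        if not check(up_rows, down_rows):
            # Later links are not checked once one link has failed.
            return up_label, down_label
    return None


rows = [{"name": "zhangsan", "age": 30}]
consistent_levels = [
    ("source", rows),
    ("first_level", [dict(r) for r in rows]),
    ("second_level", [dict(r) for r in rows]),
    ("target", [dict(r) for r in rows]),
]
faulty_levels = [
    ("source", rows),
    ("first_level", [dict(r) for r in rows]),
    ("second_level", [{"name": "zhangsan", "age": 31}]),
    ("target", [{"name": "zhangsan", "age": 31}]),
]

print(first_failing_link(consistent_levels))  # None
print(first_failing_link(faulty_levels))      # ('first_level', 'second_level')
```

In the second call the reported link points at the data cleaning stage, which is the kind of localization the multi-layer approach is meant to provide.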
Optionally, if the data verification result indicates that the source data table is inconsistent with the target data table, the computer device may obtain an abnormal source field in the source data table and an abnormal target field in the target data table, and output the abnormal source field and the abnormal target field.
Specifically, the computer device can output the abnormal source field and the abnormal target field in a visual interface. For example, the data can be converted into a DataFrame with pandas and written to an Excel file, so that all abnormal source fields and abnormal target fields are displayed clearly in the visual interface in the form of a table. The displayed table can subsequently be checked manually, thereby ensuring the accuracy of data checking.
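A tabular report of abnormal fields can be sketched as follows. Note that this sketch deliberately swaps the pandas and Excel output mentioned above for the standard-library `csv` module so that it runs without third-party packages; the function name `abnormal_fields_report` and the column names are hypothetical.

```python
import csv
import io


def abnormal_fields_report(mismatches):
    """Render mismatching fields as CSV text, one row per abnormal field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "source_value", "target_value"])
    for item in mismatches:
        writer.writerow([item["field"], item["source_value"], item["target_value"]])
    return buffer.getvalue()


report = abnormal_fields_report(
    [{"field": "age", "source_value": 30, "target_value": 31}]
)
print(report)
```

The resulting text can be opened in a spreadsheet application for the manual review described above.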
In the embodiment of the application, the table type of a source data table and the table type of a target data table are determined by acquiring the source data table and the target data table; if the table type of the source data table and the table type of the target data table are both full tables, respectively determining whether the data volume in the source data table and the data volume in the target data table are greater than a data volume threshold value; if the data volume in the source data table and the data volume in the target data table are both larger than the data volume threshold value, performing partition processing on the source data table to obtain at least one source partition table, and performing partition processing on the target data table to obtain at least one target partition table; and performing data verification based on the source partition field in at least one source partition table and the target partition field in at least one target partition table, and if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table. The table type of the data table is predetermined, and when the table type of the data table is determined to be a full table and the data amount in the data table is greater than the data amount threshold value, the data in the data table is partitioned to obtain a plurality of partition tables, the data in each partition table can be verified, and whether the data in the source partition table and the data in the target partition table are consistent or not can be determined. 
If the data in a certain source partition table is determined to be inconsistent with the data in the target partition table, the inconsistency of the source data table and the target data table can be determined, the whole source data table and the whole target data table do not need to be checked, and the data checking efficiency can be improved. In addition, by automatically checking the data in the source data table and the target data table, the fields in the source data table and the target data table do not need to be manually compared one by one, so that the data checking efficiency can be improved, and the data checking accuracy can be improved.
Optionally, please refer to fig. 2, where fig. 2 is a schematic flow chart of another data verification method provided in the embodiment of the present application. The data verification method can be applied to computer equipment; as shown in fig. 2, the data verification method includes, but is not limited to, the following steps:
S201, acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table.
S202, determining whether the table type of the source data table and the table type of the target data table are both full tables.
If yes, that is, the table type of the source data table and the table type of the target data table are both full tables, step S203 is executed, that is, it is determined whether the data amount in the source data table and the data amount in the target data table are greater than the data amount threshold respectively. If not, that is, the table type of the source data table and the table type of the target data table are both incremental tables, step S206 is executed, that is, a field check is performed on each source field in the source data table and each target field in the target data table.
S203, respectively determining whether the data amount in the source data table and the data amount in the target data table are larger than the data amount threshold value.
If yes, that is, the data amount in the source data table and the data amount in the target data table are both greater than the data amount threshold, step S204 is executed. If not, that is, the data amount in the source data table and the data amount in the target data table are both less than or equal to the data amount threshold, step S206 is executed.
S204, the source data table is subjected to partition processing to obtain at least one source partition table, and the target data table is subjected to partition processing to obtain at least one target partition table.
S205, data verification is carried out based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table, and a data verification result is obtained.
S206, performing field check on each source field in the source data table and each target field in the target data table.
S207, if the field verification is passed, performing field format verification on the field format of each source field in the source data table and the field format of each target field in the target data table, and determining a data verification result between the source data table and the target data table based on the field format verification result.
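The branching in steps S202 and S203 can be sketched as a small decision function. This is a minimal sketch: the function name `choose_strategy`, the string labels, and the threshold value used in the example are hypothetical, and combinations that the flow does not describe (for example, one full table and one incremental table) are rejected rather than guessed.

```python
def choose_strategy(source_type, target_type, source_volume, target_volume, threshold):
    """Return 'partition' for S204-S205 or 'direct' for S206-S207."""
    types = (source_type, target_type)
    if types == ("incremental", "incremental"):
        return "direct"  # S202 "no" branch: check every field directly
    if types == ("full", "full"):
        if source_volume > threshold and target_volume > threshold:
            return "partition"  # S203 "yes" branch
        if source_volume <= threshold and target_volume <= threshold:
            return "direct"  # S203 "no" branch
    raise ValueError("combination not covered by the described flow")


THRESHOLD = 1_000_000  # hypothetical data volume threshold, in rows

print(choose_strategy("full", "full", 2_000_000, 1_999_000, THRESHOLD))  # partition
print(choose_strategy("full", "full", 500, 500, THRESHOLD))              # direct
print(choose_strategy("incremental", "incremental", 500, 500, THRESHOLD))  # direct
```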
In this embodiment of the application, a source data table and a target data table are acquired and their table types are determined. If both are full tables, it is determined whether the data amount in each table is greater than the data amount threshold. If both data amounts are greater than the threshold, the source data table is partitioned to obtain at least one source partition table, and the target data table is partitioned to obtain at least one target partition table. Data verification is then performed based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table; if the verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table fails, the source data table is determined to be inconsistent with the target data table. Because the table type is determined first, and a full table whose data amount exceeds the threshold is split into several partition tables, the data can be verified partition by partition to determine whether each source partition table is consistent with its target partition table.
If the data in any source partition table is found to be inconsistent with the data in the corresponding target partition table, the source data table and the target data table can be determined to be inconsistent without checking the whole of both tables, which improves verification efficiency. In addition, because the check is automatic, the fields of the two tables need not be compared manually one by one, which improves both the efficiency and the accuracy of data verification.
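The efficiency argument above rests on stopping at the first partition pair that fails. A minimal sketch of that early exit follows; `first_failing_partition`, the `check` callback, and the treatment of unequal partition counts are hypothetical choices made for illustration, not details from the embodiment.

```python
def first_failing_partition(source_partitions, target_partitions, check):
    """Return the index of the first partition pair that fails, or None."""
    pairs = zip(source_partitions, target_partitions)
    for index, (src, tgt) in enumerate(pairs):
        if not check(src, tgt):
            return index  # stop here; later partitions are never checked
    if len(source_partitions) != len(target_partitions):
        # Assumption: an unmatched partition counts as a failure.
        return min(len(source_partitions), len(target_partitions))
    return None


# Usage: record which partitions were actually compared.
compared = []


def rows_equal(src, tgt):
    compared.append(src)
    return src == tgt


source = [[1, 2], [3, 4], [5, 6]]
target = [[1, 2], [3, 9], [5, 6]]
failed_index = first_failing_partition(source, target, rows_equal)
```

In this usage the third partition is never compared, because the second pair already establishes that the tables are inconsistent.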
The method of the embodiments of the present application is described above, and the apparatus of the embodiments of the present application is described below.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a data verification apparatus provided in an embodiment of the present application. The data verification apparatus may be a computer program (including program code) running in a computer device; for example, the data verification apparatus is application software. The data verification apparatus can be used to execute the corresponding steps of the data verification method provided by the embodiments of the application. The data verification apparatus 30 includes:
a data obtaining module 31, configured to obtain a source data table and a target data table, and determine a table type of the source data table and a table type of the target data table;
a quantity determining module 32, configured to determine whether the data quantity in the source data table and the data quantity in the target data table are greater than a data quantity threshold value if both the table type of the source data table and the table type of the target data table are full tables;
a partitioning processing module 33, configured to perform partitioning processing on the source data table to obtain at least one source partitioning table and perform partitioning processing on the target data table to obtain at least one target partitioning table if both the data amount in the source data table and the data amount in the target data table are greater than the data amount threshold;
a data checking module 34, configured to perform data checking based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
and a result determining module 35, configured to determine that the source data table is inconsistent with the target data table if data verification between the source partition field in any one of the source partition tables and the target partition field in the corresponding target partition table fails, where the corresponding target partition table is the partition table, among the at least one target partition table, that corresponds to that source partition table.
Optionally, the partition processing module 33 is specifically configured to:
determining an equal division rule for the source data table based on the data volume in the source data table, and dividing the source data table by adopting the equal division rule to obtain at least one source partition table; or,
determining a field division rule for the source data table based on the field type of the field in the source data table, and dividing the source data table by adopting the field division rule to obtain the at least one source partition table, wherein the field division rule indicates the preset field type.
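The two division rules can be illustrated as follows. This is one possible reading, not the embodiment's implementation: `equal_division` assumes the rule is expressed as a maximum number of rows per partition derived from the data volume, and `field_division` assumes the table schema is a mapping from field name to field type and that fields are grouped by the preset types. Both function names and both parameters are hypothetical.

```python
import math
from collections import defaultdict


def equal_division(rows, max_rows_per_partition):
    """Split rows into equal-sized chunks (the last chunk may be shorter)."""
    if max_rows_per_partition <= 0:
        raise ValueError("max_rows_per_partition must be positive")
    if not rows:
        return []
    # The partition count follows from the data volume.
    count = math.ceil(len(rows) / max_rows_per_partition)
    size = math.ceil(len(rows) / count)
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def field_division(schema, preset_types):
    """Group field names by field type, keeping only the preset types."""
    groups = defaultdict(list)
    for name, field_type in schema.items():
        if field_type in preset_types:
            groups[field_type].append(name)
    return dict(groups)
```

For the comparison to be meaningful, the same rule would presumably be applied to the target data table, so that each source partition has a corresponding target partition.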
Optionally, the data check includes a field check and a field format check; the data verification module 34 includes:
a first sampling unit 341, configured to sample the source partition field to obtain at least one source sampling field;
a second sampling unit 342, configured to sample the target partition field to obtain at least one target sample field;
a field check unit 343, configured to perform field check on the at least one source sample field and the at least one target sample field;
a format checking unit 344, configured to perform field format checking on a field format of the at least one source sample field and a field format of the at least one target sample field if the field checking passes;
and a partition checking unit 345, configured to perform data checking on the remaining source partition fields in the at least one source partition table and the remaining target partition fields in the at least one target partition table if the field format check passes.
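The units above describe a sample-first strategy: check a few fields, and only check the rest if the sample passes. A minimal sketch, under the assumption that a partition is represented as a mapping from field name to field format, is shown below. The function names are hypothetical, and for simplicity the sample is drawn once from the union of field names rather than separately from each side as units 341 and 342 do.

```python
import random


def fields_match(source, target, names):
    """Field check: each name is present on both sides."""
    return all(name in source and name in target for name in names)


def formats_match(source, target, names):
    """Field format check: both sides declare the same format."""
    return all(source[name] == target[name] for name in names)


def sampled_partition_check(source, target, sample_size, seed=0):
    """Check a sample of fields first; check the rest only if it passes."""
    names = sorted(set(source) | set(target))
    rng = random.Random(seed)  # seeded so that runs are repeatable
    sample = rng.sample(names, min(sample_size, len(names)))
    if not fields_match(source, target, sample):
        return False
    if not formats_match(source, target, sample):
        return False
    remaining = [name for name in names if name not in sample]
    return (fields_match(source, target, remaining)
            and formats_match(source, target, remaining))
```

A failed sample ends the check immediately, so a badly mismatched partition is rejected after examining only `sample_size` fields.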
Optionally, the result determining module 35 is specifically configured to:
if the source partition field in any source partition table is not matched with the target partition field in the corresponding target partition table, determining that the source data table is inconsistent with the target data table; or,
and if each source partition field in the at least one source partition table is matched with a target partition field in the at least one target partition table, and the field format of one or more source partition fields in the at least one source partition table is not matched with the target partition field format in the at least one target partition table, determining that the source data table is inconsistent with the target data table.
Optionally, the data verification apparatus 30 further includes: a second checking module 36, configured to:
if the data amount in the source data table and the data amount in the target data table are both less than or equal to the data amount threshold, performing field check on each source field in the source data table and each target field in the target data table;
if the field check is passed, carrying out field format check on the field format of each source field in the source data table and the field format of each target field in the target data table;
and if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
Optionally, the data verification apparatus 30 further includes: a third verification module 37 for:
if the table type of the source data table and the table type of the target data table are both incremental tables, performing field check on each source field in the source data table and each target field in the target data table;
if the field check is passed, carrying out field format check on the field format of each source field in the source data table and the field format of each target field in the target data table;
and if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
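The second and third checking modules apply the same two-stage check to whole tables: a field check, then a field format check only if the field check passes. The sketch below returns a small report instead of a bare yes/no. It assumes each table is described by a mapping from field name to field format; `CheckReport` and `check_tables` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CheckReport:
    missing_fields: list = field(default_factory=list)
    format_mismatches: list = field(default_factory=list)

    @property
    def consistent(self):
        return not self.missing_fields and not self.format_mismatches


def check_tables(source_schema, target_schema):
    """Run the field check, then the format check if the field check passes."""
    report = CheckReport()
    # Field check: names that appear on one side only.
    report.missing_fields = sorted(set(source_schema) ^ set(target_schema))
    if report.missing_fields:
        return report  # the format check is skipped
    # Field format check: same name, different declared format.
    report.format_mismatches = sorted(
        name for name in source_schema
        if source_schema[name] != target_schema[name]
    )
    return report
```

Listing the offending fields is an addition of this sketch; the embodiment only requires the final determination that the tables are inconsistent.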
Optionally, the data verification apparatus 30 further includes: a hierarchy acquisition module 38, configured to:
acquiring at least one intermediate level data table between the source data table and the target data table, wherein the at least one intermediate level data table comprises a first level data table and a second level data table, the first level data table is obtained by performing data extraction processing on the source data table, the second level data table is obtained by performing data cleaning processing on the first level data table, and the target data table is obtained by performing logic processing on the second level data table;
partitioning the first-level fields in the first-level data table to obtain at least one first partition table;
partitioning a second level field in the second level data table to obtain at least one second partition table;
the result determining module 35 is specifically configured to:
if the data check between the source partition field and the first partition field is not passed, determining that the source data table is inconsistent with the target data table; or,
if the data check between the source partition field and the first partition field passes, performing data check on the first partition field in the at least one first partition table and the second partition field in the at least one second partition table;
if the data check between the first partition field and the second partition field is not passed, determining that the source data table is inconsistent with the target data table; or,
and if the data check between the first partition field and the second partition field passes, performing data check on the second partition field in the at least one second partition table and the target partition field in the at least one target partition table, and if the data check between the second partition field and the target partition field does not pass, determining that the source data table and the target data table are inconsistent, wherein the data check comprises field check and field format check.
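The layer-by-layer check (source, first level, second level, target) amounts to comparing adjacent layers in order and stopping at the first pair that fails. A minimal sketch follows; `locate_inconsistency`, the layer names, and the `check` callback are hypothetical, and the callback stands in for the field check plus field format check.

```python
def locate_inconsistency(layers, check):
    """Compare adjacent layers in order, from source to target.

    layers is an ordered list of (layer_name, partitions) pairs. Returns the
    (upstream, downstream) names of the first pair that fails, or None if
    every adjacent pair passes.
    """
    for (up_name, up), (down_name, down) in zip(layers, layers[1:]):
        if not check(up, down):
            return (up_name, down_name)
    return None


# Usage with a trivial equality check standing in for the real data check.
layers = [
    ("source", [{"id": "int"}]),
    ("first_level", [{"id": "int"}]),
    ("second_level", [{"id": "bigint"}]),
    ("target", [{"id": "bigint"}]),
]
failed_pair = locate_inconsistency(layers, lambda up, down: up == down)
```

Besides establishing that the source and target tables are inconsistent, the returned pair indicates which processing stage (extraction, cleaning, or logic processing) introduced the difference.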
It should be noted that, for the content that is not mentioned in the embodiment corresponding to fig. 3, reference may be made to the description of the method embodiment, and details are not described here again.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 4, the computer device 40 may include a processor 401, a network interface 404, and a memory 405, and may further include a user interface 403 and at least one communication bus 402. The communication bus 402 is used to enable connection and communication between these components. The user interface 403 may include a display and a keyboard; optionally, the user interface 403 may also include a standard wired interface and a standard wireless interface. The network interface 404 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 405 may be a high-speed RAM or a non-volatile memory (e.g., at least one disk memory). Optionally, the memory 405 may also be at least one storage device located remotely from the processor 401. As shown in fig. 4, the memory 405, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 40 shown in fig. 4, the network interface 404 may provide network communication functions; the user interface 403 is primarily an interface through which a user provides input; and the processor 401 may be used to invoke the device control application program stored in the memory 405 to implement:
acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table;
if the table type of the source data table and the table type of the target data table are both full tables, respectively determining whether the data volume in the source data table and the data volume in the target data table are greater than a data volume threshold value;
if the data volume in the source data table and the data volume in the target data table are both larger than the data volume threshold value, performing partition processing on the source data table to obtain at least one source partition table, and performing partition processing on the target data table to obtain at least one target partition table;
performing data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
and if the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table, wherein the corresponding target partition table is the partition table, among the at least one target partition table, that corresponds to that source partition table.
It should be understood that the computer device 40 described in this embodiment of the present application may perform the data verification method described in the embodiments corresponding to fig. 1 and fig. 2, and may also implement the data verification apparatus described in the embodiment corresponding to fig. 3, which is not described herein again. In addition, the beneficial effects of the same method are not described again.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method according to the aforementioned embodiments. The computer may be a part of the aforementioned computer device, such as the processor 401 described above. By way of example, the program instructions may be executed on one computer device, or on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network, which may comprise a blockchain network.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present application and is not to be construed as limiting the scope of the present application; equivalent variations and modifications of the present application remain within its scope.

Claims (10)

1. A method for data verification, comprising:
acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table;
if the table type of the source data table and the table type of the target data table are both full tables, respectively determining whether the data volume in the source data table and the data volume in the target data table are greater than a data volume threshold value;
if the data volume in the source data table and the data volume in the target data table are both larger than the data volume threshold value, performing partition processing on the source data table to obtain at least one source partition table, and performing partition processing on the target data table to obtain at least one target partition table;
performing data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
and if the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table, wherein the corresponding target partition table is the partition table, among the at least one target partition table, that corresponds to that source partition table.
2. The method according to claim 1, wherein the partitioning the source data table to obtain at least one source partition table comprises:
determining an equal division rule for the source data table based on the data volume in the source data table, and dividing the source data table by adopting the equal division rule to obtain the at least one source partition table; or,
determining a field division rule aiming at the source data table based on the field type of the field in the source data table, and dividing the source data table by adopting the field division rule to obtain the at least one source partition table, wherein the field division rule indicates the preset field type.
3. The method of claim 1, wherein the data check comprises a field check and a field format check;
the performing data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table includes:
sampling the source partition field to obtain at least one source sampling field;
sampling the target partition field to obtain at least one target sampling field;
performing a field check on the at least one source sample field and the at least one target sample field;
if the field check is passed, carrying out field format check on the field format of the at least one source sample field and the field format of the at least one target sample field;
and if the field format check is passed, performing data verification on the remaining source partition fields in the at least one source partition table and the remaining target partition fields in the at least one target partition table.
4. The method of claim 3, wherein determining that the source data table is inconsistent with the target data table if the data check between the source partition field in any one of the source partition tables and the target partition field in the corresponding target partition table fails comprises:
if the source partition field in any source partition table is not matched with the target partition field in the corresponding target partition table, determining that the source data table is inconsistent with the target data table; or,
and if each source partition field in the at least one source partition table is matched with a target partition field in the at least one target partition table, and the field format of one or more source partition fields in the at least one source partition table is not matched with the target partition field format in the at least one target partition table, determining that the source data table is inconsistent with the target data table.
5. The method of claim 1, wherein after determining whether the amount of data in the source data table and the amount of data in the target data table are greater than a data amount threshold, the method further comprises:
if the data volume in the source data table and the data volume in the target data table are both smaller than or equal to the data volume threshold, performing field verification on each source field in the source data table and each target field in the target data table;
if the field check is passed, carrying out field format check on the field format of each source field in the source data table and the field format of each target field in the target data table;
and if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
6. The method of claim 1, wherein after determining the table type of the source data table and the table type of the target data table, the method further comprises:
if the table type of the source data table and the table type of the target data table are both incremental tables, performing field verification on each source field in the source data table and each target field in the target data table;
if the field check is passed, carrying out field format check on the field format of each source field in the source data table and the field format of each target field in the target data table;
and if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
7. The method of claim 1, wherein before performing the data check based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table, the method further comprises:
acquiring at least one intermediate level data table between the source data table and the target data table, wherein the at least one intermediate level data table comprises a first level data table and a second level data table, the first level data table is obtained by performing data extraction processing on the source data table, the second level data table is obtained by performing data cleaning processing on the first level data table, and the target data table is obtained by performing logic processing on the second level data table;
partitioning the first-level fields in the first-level data table to obtain at least one first partition table;
partitioning a second level field in the second level data table to obtain at least one second partition table;
if the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table fails, determining that the source data table is inconsistent with the target data table, including:
if the data check between the source partition field and the first partition field is not passed, determining that the source data table is inconsistent with the target data table; or,
if the data check between the source partition field and the first partition field passes, performing data check on a first partition field in the at least one first partition table and a second partition field in the at least one second partition table;
if the data check between the first partition field and the second partition field is not passed, determining that the source data table is inconsistent with the target data table; or,
and if the data check between the first partition field and the second partition field passes, performing data check on the second partition field in the at least one second partition table and the target partition field in the at least one target partition table, and if the data check between the second partition field and the target partition field does not pass, determining that the source data table and the target data table are inconsistent, wherein the data check comprises a field check and a field format check.
8. A data verification apparatus, comprising:
the data acquisition module is used for acquiring a source data table and a target data table and determining the table type of the source data table and the table type of the target data table;
the quantity determining module is used for respectively determining whether the data quantity in the source data table and the data quantity in the target data table are larger than a data quantity threshold value if the table type of the source data table and the table type of the target data table are both full-quantity tables;
the partition processing module is used for performing partition processing on the source data table to obtain at least one source partition table and performing partition processing on the target data table to obtain at least one target partition table if the data volume in the source data table and the data volume in the target data table are both greater than the data volume threshold value;
the data checking module is used for checking data based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
and the result determining module is used for determining that the source data table is inconsistent with the target data table if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table fails, wherein the corresponding target partition table is the partition table, among the at least one target partition table, that corresponds to that source partition table.
9. A computer device, comprising: a processor, a memory, and a network interface;
the processor is coupled to the memory and the network interface, wherein the network interface is configured to provide data communication functionality, the memory is configured to store program code, and the processor is configured to invoke the program code to cause the computer device to perform the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-7.
CN202111446609.8A 2021-11-29 2021-11-29 Data verification method, device and equipment and readable storage medium Pending CN114116724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111446609.8A CN114116724A (en) 2021-11-29 2021-11-29 Data verification method, device and equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111446609.8A CN114116724A (en) 2021-11-29 2021-11-29 Data verification method, device and equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114116724A (en) 2022-03-01

Family

ID=80368790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111446609.8A Pending CN114116724A (en) 2021-11-29 2021-11-29 Data verification method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114116724A (en)

Similar Documents

Publication Publication Date Title
CN109558525B (en) Test data set generation method, device, equipment and storage medium
CN110474900B (en) Game protocol testing method and device
CN111563218B (en) Page repairing method and device
CN114035827A (en) Application program updating method, device, equipment and storage medium
CN109815112B (en) Data debugging method and device based on functional test and terminal equipment
CN114281793A (en) Data verification method, device and system
CN114416877A (en) Data processing method, device and equipment and readable storage medium
KR102227912B1 (en) Optimized data condenser and method
CN114116724A (en) Data verification method, device and equipment and readable storage medium
CN114896161A (en) File construction method and device based on artificial intelligence, computer equipment and medium
CN109977430A (en) A kind of text interpretation method, device and equipment
CN113053531B (en) Medical data processing method, medical data processing device, computer readable storage medium and equipment
CN115292178A (en) Test data searching method, device, storage medium and terminal
CN114942905A (en) Migration data verification method, device, equipment and storage medium
CN114564336A (en) Data consistency checking method, device, equipment and storage medium
CN113704120A (en) Data transmission method, device, equipment and storage medium
CN109491699B (en) Resource checking method, device, equipment and storage medium of application program
CN111857883A (en) Page data checking method and device, electronic equipment and storage medium
CN112163127A (en) Relationship graph construction method and device, electronic equipment and storage medium
US9886472B2 (en) Verification of record based systems
CN113094415A (en) Data extraction method and device, computer readable medium and electronic equipment
CN116010349B (en) Metadata-based data checking method and device, electronic equipment and storage medium
CN116661758B (en) Method, device, electronic equipment and medium for optimizing log framework configuration
CN114443498A (en) Data processing method and device, electronic equipment and storage medium
CN110349025B (en) Method and device for preventing loss of contract assets based on non-cost transaction output

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination