CN116226142A - Data verification method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116226142A
Authority
CN
China
Prior art keywords
target
field
data
verification
intermediate table
Legal status
Pending
Application number
CN202211655794.6A
Other languages
Chinese (zh)
Inventor
王毅
Current Assignee
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN202211655794.6A
Publication of CN116226142A


Classifications

    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/2386 Bulk updating operations
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a data verification method, a data verification device, an electronic device, and a storage medium. The method includes: acquiring a target file and the table structure of the destination table corresponding to the target file; determining the table structure of an intermediate table based on the table structure of the destination table; adding the data of the target file to the intermediate table; and performing data verification based on the field data contained in the intermediate table. Because the data of the target file is added to the intermediate table and verified there, no file-type-specific third-party dependency needs to be introduced, the file does not need to be parsed into memory, and the risk of memory overflow during file parsing is effectively avoided.

Description

Data verification method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular to a data verification method, a data verification device, an electronic device, and a storage medium.
Background
In business scenarios that involve collecting large data files, the data must be verified before the verified data is aggregated into the corresponding destination table of a database. Traditionally, the content of a large file is read either by parsing the file and loading all of its lines into memory at once, or by parsing the file and reading its lines into memory one at a time; verification logic is then written according to the specific business rules to check the file data. Because this process places no upper limit on file size, a sufficiently large file can require more memory than is available, causing the program's memory to overflow. In addition, because files come in different types, a third-party dependency for parsing each file type must be introduced to support parsing files of different types.
Therefore, how to verify large file data while reducing system memory usage and third-party dependencies is a problem worth addressing.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the present application is to provide a data verification method that adds the data of a target file to an intermediate table and performs data verification based on the field data contained in the intermediate table, thereby verifying the data of the target file while reducing system memory usage and third-party dependencies.
A second object of the present application is to provide a data verification device.
A third object of the present application is to propose an electronic device.
A fourth object of the present application is to propose a computer readable storage medium.
A fifth object of the present application is to propose a computer program product.
To achieve the above object, an embodiment of a first aspect of the present application provides a data verification method, including:
acquiring a target file and the table structure of a destination table corresponding to the target file;
determining a table structure of an intermediate table based on the table structure of the destination table, and adding the data of the target file to the intermediate table;
and performing data verification based on the field data contained in the intermediate table.
Optionally, as a first possible implementation of the first aspect, determining the table structure of the intermediate table based on the table structure of the destination table includes:
for each field in the table structure of the destination table that has a field attribute, removing that field attribute, to obtain a table structure of the destination table that contains no field attributes;
and determining the table structure of the destination table without field attributes as the table structure of the intermediate table.
Optionally, as a second possible implementation of the first aspect, adding the data of the target file to the intermediate table includes:
taking each row of data in the target file as one record of the intermediate table, and adding the values in that row to the corresponding fields of that record.
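As an illustration of the row-to-record step, the sketch below streams a CSV file into an intermediate table in fixed-size batches, so memory use depends on the batch size rather than on the file size. It is only a minimal sketch under stated assumptions: in the application the loading is done by an ETL tool, whereas here SQLite (through Python's standard `sqlite3` module) stands in for the database, the input is assumed to be a CSV file with a header row, and the helper `load_rows` and the table `tmp_person` are hypothetical names.

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_rows(conn: sqlite3.Connection, table: str, columns: list[str],
              source: TextIO, batch_size: int = 1000) -> int:
    """Insert each data row of a CSV stream into `table` as one record.

    Rows are read lazily and written in batches of `batch_size`, so the whole
    file is never held in memory at once. Returns the number of rows loaded.
    """
    reader = csv.reader(source)
    next(reader, None)  # skip the header row (assumed to be present)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    loaded, batch = 0, []
    for row in reader:
        batch.append(row)  # one file row -> one intermediate-table record
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            loaded += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        conn.executemany(sql, batch)
        loaded += len(batch)
    conn.commit()
    return loaded


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical intermediate table: column names only, no field attributes.
    conn.execute("CREATE TABLE tmp_person (id, name, age)")
    data = io.StringIO("id,name,age\n1,Alice,30\n2,Bob,\n2,Carol,41\n")
    print(load_rows(conn, "tmp_person", ["id", "name", "age"], data, batch_size=2))  # 3
```

The `csv` module yields an empty string, not NULL, for a missing cell (Bob's age above), so a real loader would also need a rule for mapping empty cells to NULL before non-null checks are meaningful.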
Optionally, as a third possible implementation of the first aspect, performing data verification based on the field data contained in the intermediate table includes:
for a target field of the intermediate table, verifying the field data of the target field based on the verification target of the target field.
Optionally, as a fourth possible implementation of the first aspect, verifying, for a target field of the intermediate table, the field data of the target field based on the verification target of the target field includes:
querying the total number of records in the intermediate table and, for the target field of the intermediate table, querying the number of records that meet the verification target of the target field;
and determining, based on the total number of records and the number of records that meet the verification target, whether the target field passes verification.
Optionally, as a fifth possible implementation of the first aspect, querying the total number of records in the intermediate table and, for a target field of the intermediate table, querying the number of records that meet the verification target of the target field includes:
querying the total number of records in the intermediate table using the data query language (DQL);
and, for the target field of the intermediate table, writing a corresponding DQL statement based on the verification target of the target field, and querying the number of records that meet the verification target.
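The two DQL queries can be sketched as follows. This is a minimal sketch assuming SQLite through Python's `sqlite3` module; the helpers `count_total` and `count_satisfying`, the table `tmp_person`, and the choice to pass the verification target as a SQL condition string are illustrative assumptions. The condition is interpolated directly into the query, so it must be generated internally from the table structure, never taken from user input.

```python
import sqlite3


def count_total(conn: sqlite3.Connection, table: str) -> int:
    # DQL for the total number of records in the intermediate table.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def count_satisfying(conn: sqlite3.Connection, table: str, condition: str) -> int:
    # DQL for the number of records meeting a target field's verification target.
    # `condition` is a SQL boolean expression built internally (not user input).
    return conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {condition}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp_person (id, name)")
    conn.executemany("INSERT INTO tmp_person VALUES (?, ?)",
                     [(1, "Alice"), (2, None), (2, "Carol")])
    total = count_total(conn, "tmp_person")                               # 3
    not_null = count_satisfying(conn, "tmp_person", "name IS NOT NULL")   # 2
    # Non-repetition: records whose id value occurs exactly once.
    unique_id = count_satisfying(
        conn, "tmp_person",
        "id IN (SELECT id FROM tmp_person GROUP BY id HAVING COUNT(*) = 1)")  # 1
    print(total, not_null, unique_id)
```

Only the target column is touched by each query, which matches the goal of not loading unneeded columns into application memory.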
Optionally, as a sixth possible implementation of the first aspect, determining, based on the total number of records and the number of records that meet the verification target, whether the target field passes verification includes:
comparing the number of records that meet the verification target with the total number of records;
determining that the target field passes verification if the number of records that meet the verification target equals the total number of records;
and determining that the target field fails verification if the number of records that meet the verification target differs from the total number of records.
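The comparison rule reduces to an equality check between the two counts. The sketch below is a hypothetical Python representation (the `FieldResult` class and its field names are assumptions, not part of the application); it also reports how many records failed, which is simply the difference between the two counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldResult:
    """Verification outcome for one target field of the intermediate table."""
    field: str
    total: int        # total number of records in the intermediate table
    satisfying: int   # number of records that meet the verification target

    @property
    def passed(self) -> bool:
        # Counts are consistent -> every record meets the target -> pass.
        return self.satisfying == self.total

    @property
    def failing(self) -> int:
        # Number of records that do not meet the verification target.
        return self.total - self.satisfying


if __name__ == "__main__":
    for result in (FieldResult("id", 3, 3), FieldResult("name", 3, 2)):
        print(result.field, result.passed, result.failing)
    # id True 0
    # name False 1
```

Under this rule an empty intermediate table passes every check, since both counts are zero; whether that is acceptable depends on the business scenario.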
Optionally, as a seventh possible implementation of the first aspect, the method further includes:
adding the target field to the destination table if the target field passes verification.
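One possible reading of "adding the target field to the destination table" is to copy the verified column's data from the intermediate table with a single `INSERT ... SELECT`, so the data never passes through application memory. The sketch below assumes SQLite through Python's `sqlite3`; the helper `copy_verified_fields` and both table definitions are hypothetical, and it assumes any destination column that is not copied is nullable or has a default.

```python
import sqlite3


def copy_verified_fields(conn: sqlite3.Connection, intermediate: str,
                         destination: str, passed_fields: list[str]) -> None:
    """Copy the columns that passed verification into the destination table."""
    if not passed_fields:
        return  # nothing passed verification, so nothing is copied
    cols = ", ".join(passed_fields)
    # The copy runs inside the database engine; rows are not loaded into Python.
    conn.execute(f"INSERT INTO {destination} ({cols}) SELECT {cols} FROM {intermediate}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp_person (id, name, note)")  # intermediate: no attributes
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, note TEXT)")
    conn.executemany("INSERT INTO tmp_person VALUES (?, ?, ?)",
                     [(1, "Alice", "x" * 500), (2, "Bob", "ok")])
    # Suppose `id` and `name` passed verification but `note` did not.
    copy_verified_fields(conn, "tmp_person", "person", ["id", "name"])
    print(conn.execute("SELECT id, name, note FROM person ORDER BY id").fetchall())
    # [(1, 'Alice', None), (2, 'Bob', None)]
```

The destination table's own field attributes (primary key, non-null) still apply during the copy and act as a final safeguard.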
Optionally, as an eighth possible implementation of the first aspect, the verification target of the target field is determined based on the field attribute of the target field in the table structure of the destination table, and the verification target includes non-repetition of field data, non-null field data, compliance of the field data range, and compliance of the field data length.
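The mapping from field attributes to verification targets can be sketched as a function that turns each attribute into a SQL condition for the record-count query. The attribute keys (`unique`, `not_null`, `range`, `max_length`), the helper name `verification_conditions`, and the SQLite dialect are all assumptions for illustration. In this sketch a NULL value fails the uniqueness, range, and length conditions, because SQL comparisons with NULL are never true; a nullable field would need an extra `OR field IS NULL`.

```python
import sqlite3


def verification_conditions(table: str, field: str, attrs: dict) -> dict[str, str]:
    """Map a field's attributes to SQL conditions, one per verification target."""
    conds = {}
    if attrs.get("unique"):
        # Non-repetition: the value occurs exactly once in the table.
        conds["unique"] = (f"{field} IN (SELECT {field} FROM {table} "
                           f"GROUP BY {field} HAVING COUNT(*) = 1)")
    if attrs.get("not_null"):
        conds["not_null"] = f"{field} IS NOT NULL"
    if "range" in attrs:
        low, high = attrs["range"]
        conds["range"] = f"{field} BETWEEN {low} AND {high}"
    if "max_length" in attrs:
        conds["max_length"] = f"LENGTH({field}) <= {attrs['max_length']}"
    return conds


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp_t (a)")
    conn.executemany("INSERT INTO tmp_t VALUES (?)", [(1,), (1,), (2,), (None,)])
    conds = verification_conditions("tmp_t", "a", {"unique": True, "not_null": True})
    for target, cond in conds.items():
        n = conn.execute(f"SELECT COUNT(*) FROM tmp_t WHERE {cond}").fetchone()[0]
        print(target, n)
    # unique 1
    # not_null 3
```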
To achieve the above object, an embodiment of a second aspect of the present application provides a data verification device, including:
the acquisition module is configured to acquire a target file and the table structure of a destination table corresponding to the target file;
the processing module is configured to determine the table structure of an intermediate table based on the table structure of the destination table, and to add the data of the target file to the intermediate table;
and the verification module is configured to perform data verification based on the field data contained in the intermediate table.
Optionally, as a first possible implementation of the second aspect, the processing module includes:
a removing unit, configured to remove, for each field in the table structure of the destination table that has a field attribute, that field attribute, to obtain a table structure of the destination table that contains no field attributes;
and a determining unit, configured to determine the table structure of the destination table without field attributes as the table structure of the intermediate table.
Optionally, as a second possible implementation of the second aspect, the determining unit is further configured to:
take each row of data in the target file as one record of the intermediate table, and add the values in that row to the corresponding fields of that record.
Optionally, as a third possible implementation of the second aspect, the verification module includes:
a data verification unit, configured to verify, for a target field of the intermediate table, the field data of the target field based on the verification target of the target field.
Optionally, as a fourth possible implementation of the second aspect, the data verification unit is further configured to:
query the total number of records in the intermediate table and, for the target field of the intermediate table, query the number of records that meet the verification target of the target field;
and determine, based on the total number of records and the number of records that meet the verification target, whether the target field passes verification.
Optionally, as a fifth possible implementation of the second aspect, the data verification unit is further configured to:
query the total number of records in the intermediate table using the data query language (DQL);
and, for the target field of the intermediate table, write a corresponding DQL statement based on the verification target of the target field, and query the number of records that meet the verification target.
Optionally, as a sixth possible implementation of the second aspect, the data verification unit is further configured to:
compare the number of records that meet the verification target with the total number of records;
determine that the target field passes verification if the number of records that meet the verification target equals the total number of records;
and determine that the target field fails verification if the number of records that meet the verification target differs from the total number of records.
Optionally, as a seventh possible implementation of the second aspect, the device further includes:
an adding module, configured to add the target field to the destination table if the target field passes verification.
Optionally, as an eighth possible implementation of the second aspect, the verification target of the target field is determined based on the field attribute of the target field in the table structure of the destination table, and the verification target includes non-repetition of field data, non-null field data, compliance of the field data range, and compliance of the field data length.
To achieve the above object, an embodiment of a third aspect of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data verification method of the first aspect described above.
To achieve the above object, an embodiment of a fourth aspect of the present application proposes a computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the data verification method of the first aspect described above.
In order to achieve the above object, an embodiment of a fifth aspect of the present application proposes a computer program product comprising a computer program which, when executed by a processor, implements the data verification method of the first aspect described above.
The technical solution provided by the embodiments of the present application has the following beneficial effects:
a target file and the table structure of the destination table corresponding to the target file are acquired, the table structure of an intermediate table is determined based on the table structure of the destination table, and the data of the target file is added to the intermediate table, so that data verification is performed based on the field data contained in the intermediate table. Because the data of the target file is added to the intermediate table and verified there, no file-type-specific third-party dependency needs to be introduced, the file does not need to be parsed into memory, and the risk of memory overflow during file parsing is effectively avoided.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a data verification method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another data verification method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of another data verification method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of another data verification method according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of another data verification method according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a data verification method in a specific scenario according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the fields in the intermediate table and their verification targets in a specific scenario according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data verification device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data verification device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
In business scenarios that involve collecting large data files, the data must be verified before the verified data is aggregated into the corresponding destination table of a database. Here, data verification means converting a specific business rule for a field value into a rule-checking SQL statement, and then using the number of records returned by the query to determine whether any data violates the rule and, if so, how much. For example, a business rule may require that a field value not be repeated, that a field value conform to a specification, or that a field value not be null.
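To make the idea of "how much data violates the rule" concrete, the sketch below checks the rule "a field value cannot be repeated" and returns each repeated value together with the number of records involved. It assumes SQLite through Python's `sqlite3`; the helper `duplicate_report` and the sample table `tmp_code` are hypothetical, and NULLs are excluded on the assumption that a separate non-null rule handles them.

```python
import sqlite3


def duplicate_report(conn: sqlite3.Connection, table: str, field: str):
    """Return (repeated values with their counts, total violating records)."""
    rows = conn.execute(
        f"SELECT {field}, COUNT(*) FROM {table} "
        f"WHERE {field} IS NOT NULL GROUP BY {field} HAVING COUNT(*) > 1 "
        f"ORDER BY {field}"
    ).fetchall()
    violating = sum(count for _, count in rows)  # records sharing a repeated value
    return rows, violating


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp_code (code)")
    conn.executemany("INSERT INTO tmp_code VALUES (?)",
                     [("a",), ("b",), ("a",), ("c",), ("a",), ("b",)])
    print(duplicate_report(conn, "tmp_code", "code"))  # ([('a', 3), ('b', 2)], 5)
```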
In the related art, data verification in large-file collection scenarios must support files of different types, such as Excel (spreadsheet file format), SHP (shapefile format), CSV (Comma-Separated Values file format), and SQL (Structured Query Language database script file format), with no upper limit on file size. Verification proceeds as follows: the file content is first read, either by parsing the file and loading all of its lines into memory, or by parsing the file and reading its lines into memory one at a time; verification logic is then written according to the specific business rules to check the file data. This process has the following disadvantages:
1. Because there is no upper limit on file size, a sufficiently large file can require more memory than is available, causing the program's memory to overflow.
2. When any row of a file is checked, the entire content of that row must be loaded into memory, including columns that do not need to be checked, which lowers verification efficiency.
3. Because data verification must be supported for different file types, a third-party dependency for parsing each file type must be introduced.
To address these problems, embodiments of the present application provide a data verification method that adds the data of the target file to an intermediate table and performs data verification based on the field data contained in the intermediate table, thereby verifying the data of the target file while reducing system memory usage and third-party dependencies.
The data verification method, the data verification device, the electronic equipment and the storage medium according to the embodiment of the application are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a data verification method according to an embodiment of the present application.
It should be noted that the data verification method of the embodiments of the present application may be performed by the data verification device provided by the embodiments of the present application. The data verification device may be an electronic device or may be configured in an electronic device. The electronic device may be any stationary or mobile computing device capable of data processing, for example a mobile computing device such as a laptop, a smartphone, or a wearable device, a stationary computing device such as a desktop computer or a server, or another type of computing device; this embodiment imposes no limitation on this.
As shown in FIG. 1, the data verification method includes the following steps:
Step 101, acquiring a target file and the table structure of the destination table corresponding to the target file.
In this embodiment, the target file may be any file to be verified; its type, size, and so on are not limited. For example, the target file may be an Excel, SHP, CSV, or SQL file, and it may be a large file or an ordinary one.
In this embodiment, the data verification device may acquire the target file to be verified in any public, lawful, and compliant manner. For example, it may collect the target file in real time, obtain it from another device via network transmission or physical copying, or obtain it in another public, lawful, and compliant manner, which is not limited in this embodiment.
In this embodiment, after the target file to be verified is acquired, the table structure of the destination table corresponding to the target file may be obtained through an ETL (Extract-Transform-Load) tool. This embodiment does not limit which ETL tool is used; for example, the table structure of the destination table may be obtained through Kettle, Informatica, or DataStage.
It should be noted that ETL describes the process of extracting data from a source, transforming it, and loading it into a destination. It can be understood simply as selecting data from a source table, performing a series of operations such as calculations or joins, and then updating or inserting the results into the destination table. The extraction stage mainly addresses the data source: from which sources data is extracted, which data is extracted, and how it is extracted. The extracted data can be put into a unified format and stored in a buffer (rather than written directly to the destination). In the transformation stage, the extracted data undergoes various kinds of processing, such as filtering, cleansing, format conversion, merging, splitting, sorting, and computation; cleansing removes data that does not meet requirements, including erroneous, incomplete, and duplicate data. In the loading stage, the processed data is loaded into the destination; common loading modes include full loading and incremental loading.
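The three ETL stages described above can be sketched as a chain of Python generators, which keeps memory use bounded because rows flow through one at a time. This is a generic, minimal sketch and not the workflow of any particular ETL tool such as Kettle: the CSV input, the cleansing rules (drop rows with empty cells and exact duplicates), and the simplification of incremental loading to "append without clearing" are assumptions. The duplicate filter keeps a set of rows already seen, which grows with the number of distinct rows.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, TextIO


def extract(source: TextIO) -> Iterator[list[str]]:
    # Extract: read raw rows lazily from the source, skipping the header.
    reader = csv.reader(source)
    next(reader, None)
    yield from reader


def transform(rows: Iterable[list[str]], width: int) -> Iterator[list[str]]:
    # Transform/cleanse: drop incomplete rows and exact duplicates.
    seen = set()
    for row in rows:
        if len(row) != width or any(value == "" for value in row):
            continue  # incomplete data
        key = tuple(row)
        if key in seen:
            continue  # duplicate data
        seen.add(key)
        yield row


def load(conn: sqlite3.Connection, table: str, rows: Iterable[list[str]],
         width: int, mode: str = "full") -> None:
    # Load: "full" replaces the table contents; "incremental" only appends.
    if mode == "full":
        conn.execute(f"DELETE FROM {table}")
    placeholders = ", ".join("?" * width)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dest (id, name)")
    first = io.StringIO("id,name\n1,a\n2,\n1,a\n3,c\n")
    load(conn, "dest", transform(extract(first), 2), 2, mode="full")
    second = io.StringIO("id,name\n4,d\n")
    load(conn, "dest", transform(extract(second), 2), 2, mode="incremental")
    print(conn.execute("SELECT * FROM dest ORDER BY id").fetchall())
    # [('1', 'a'), ('3', 'c'), ('4', 'd')]
```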
Step 102, based on the table structure of the destination table, determining the table structure of the intermediate table, and adding the data of the target file to the intermediate table.
In this embodiment, the intermediate table can be understood as a table generated from the table structure of the destination table corresponding to the target file to be verified. The number of fields, the field names, and the field comments of the intermediate table match those of the destination table.
In the related art, the file must first be parsed, either by loading all of its lines into memory or by reading its lines into memory one at a time, before the file data can be verified. Because file size has no upper limit, parsing carries a risk of memory overflow. In this embodiment, to avoid that risk, after the table structure of the destination table corresponding to the target file is obtained through the ETL tool, the table structure of the intermediate table corresponding to the target file may be determined from it, and the data of the target file may be added to the intermediate table. Aggregating the data of the target file into the intermediate table in this way effectively avoids the risk of memory overflow during file parsing.
Because the type of the target file is not limited in this embodiment, for a target file of any type, the table structure of the corresponding destination table can be obtained through the ETL tool, the table structure of the intermediate table can be determined from it, and the data of the target file can be added to the intermediate table. This solves the problem in the related art that supporting data verification for different file types requires introducing, for each file type, a third-party dependency for parsing that type.
It should be noted that the table structure of the intermediate table is not identical to that of the destination table, because the destination table's structure may contain field attributes such as a primary key attribute, a non-null attribute, or an index constraint. If the intermediate table carried the same field attributes, rows that violate them (for example, a row with a duplicate primary key value or a null value in a non-null field) would be rejected on insertion, so not all of the data in the target file could be added to the intermediate table, and consequently not all of it could be verified.
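The effect of field attributes on loading can be demonstrated directly. In the sketch below (SQLite through Python's `sqlite3`; the tables `dest` and `tmp_dest` and the sample rows are hypothetical), the same three rows are inserted into a table with a primary key and a non-null attribute and into an attribute-free copy. The constrained table rejects the duplicate key and the null name, so those rows could never be checked, while the attribute-free table accepts all of them.

```python
import sqlite3

ROWS = [(1, "Alice"), (1, "Alice again"), (2, None)]


def try_load(conn: sqlite3.Connection, table: str) -> int:
    """Insert ROWS one by one and return how many the table accepted."""
    accepted = 0
    for row in ROWS:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # rejected by a field attribute, so it never reaches verification
    return accepted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Destination-style table: primary key and non-null attributes.
    conn.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # Intermediate-style table: same fields, attributes removed.
    conn.execute("CREATE TABLE tmp_dest (id INTEGER, name TEXT)")
    print(try_load(conn, "dest"), try_load(conn, "tmp_dest"))  # 1 3
```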
In one possible implementation of this embodiment, the ETL tool may create the intermediate table corresponding to the target file based on the table structure of the destination table, and extract the data of the target file into the intermediate table. The ETL tool can thus aggregate the data of the target file into the intermediate table without introducing a file-type-specific third-party dependency and without parsing the file into memory, effectively avoiding the risk of memory overflow during file parsing.
Step 103, performing data verification based on the field data contained in the intermediate table.
In this embodiment, after the data of the target file is added to the intermediate table, data verification can be performed directly on the field data contained in the intermediate table. Verifying the target file's data thus becomes verifying the intermediate table, so no file-type-specific third-party dependency is needed and the file does not need to be parsed into memory; the target file's data is verified while system memory usage and third-party dependencies are reduced.
In one possible implementation of this embodiment, the field data in the intermediate table may have corresponding verification targets, such as non-repetition of field data, non-null field data, compliance of the field data range, and compliance of the field data length, and the field data contained in the intermediate table can be verified against these verification targets.
With the data verification method provided by this embodiment, a target file and the table structure of its corresponding destination table are acquired, the table structure of an intermediate table is determined based on the table structure of the destination table, and the data of the target file is added to the intermediate table, so that data verification is performed based on the field data contained in the intermediate table. Because the data of the target file is added to the intermediate table and verified there, no file-type-specific third-party dependency is needed, the file does not need to be parsed into memory, and the risk of memory overflow during file parsing is effectively avoided.
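As can be seen from the above analysis, in the embodiments of the present application the table structure of the intermediate table may be determined based on the table structure of the destination table, and the data of the target file may be added to the intermediate table. To explain this process more clearly, this embodiment provides another data verification method, and Fig. 2 is a schematic flow chart of another data verification method provided in an embodiment of the present application.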
As shown in fig. 2, the data verification method includes the steps of:
Step 201, obtaining a target file and the table structure of the destination table corresponding to the target file.
It should be noted that, the execution process of this step may be the same as the execution process of step 101 in the previous embodiment, and the description thereof is omitted herein.
Step 202, for each field that has a field attribute in the table structure of the destination table, removing the field attribute of that field, so as to obtain a table structure of the destination table without field attributes.
In this embodiment, fields in the table structure of the destination table may have field attributes. For example, assume that the table structure of the destination table contains five fields, field A, field B, field C, field D and field E, where field A has the field attribute that field data is not repeated, field B has the field attribute that field data is not null, field C has a field attribute, field D has the field attribute that the field data range is compliant, and field E has the field attribute that the field data length is compliant.
It can be understood that, because fields in the table structure of the destination table may have field attributes, if that table structure were used directly as the table structure of the intermediate table, the corresponding fields of the intermediate table would carry the same field attributes. Records violating those attributes would then be rejected, so not all data in the target file could be added to the intermediate table, and that data could not be verified. To avoid this, in this embodiment, for each field that has a field attribute in the table structure of the destination table, the field attribute is removed, yielding a table structure of the destination table without field attributes.
Step 203, determining the table structure of the destination table without field attributes as the table structure of the intermediate table.
In this embodiment, the table structure of the destination table without field attributes may be determined as the table structure of the intermediate table. Because the intermediate table then has no field attributes, all data in the target file can be added to it, which effectively avoids the situation in which part of the data in the target file cannot be added to the intermediate table.
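Steps 202 and 203 can be illustrated with a small Python sketch using SQLite. The ColumnDef record, the helper names and the sample attributes (UNIQUE, NOT NULL, a CHECK on an allowed value set) are hypothetical stand-ins for the field attributes of a destination table. The sketch only shows that the intermediate table keeps the destination's field names and order but drops the attributes, so rows the destination table would reject can still be loaded and then counted.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ColumnDef:
    """One field of the destination table (hypothetical representation)."""
    name: str
    sql_type: str
    attributes: list[str] = field(default_factory=list)  # e.g. ["NOT NULL"]


def create_table_sql(table: str, columns: list[ColumnDef],
                     keep_attributes: bool = True) -> str:
    """Render a CREATE TABLE statement, optionally with field attributes."""
    parts = []
    for col in columns:
        piece = f"{col.name} {col.sql_type}"
        if keep_attributes and col.attributes:
            piece += " " + " ".join(col.attributes)
        parts.append(piece)
    return f"CREATE TABLE {table} ({', '.join(parts)})"


def intermediate_table_sql(table: str, destination: list[ColumnDef]) -> str:
    # Same field count, names and order as the destination table,
    # with every field attribute removed (steps 202 and 203).
    return create_table_sql(table, destination, keep_attributes=False)


DESTINATION = [
    ColumnDef("A", "TEXT", ["UNIQUE"]),
    ColumnDef("B", "TEXT", ["NOT NULL"]),
    ColumnDef("C", "TEXT", ["CHECK (C IN ('x', 'y'))"]),
    ColumnDef("D", "TEXT"),
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql("DEST_T", DESTINATION))
    conn.execute(intermediate_table_sql("TEMP_T", DESTINATION))
    rows = [("1", "b", "x", "d"), ("1", None, "z", "d")]  # second row is invalid
    conn.executemany("INSERT INTO TEMP_T VALUES (?, ?, ?, ?)", rows)  # accepted
    try:
        conn.executemany("INSERT INTO DEST_T VALUES (?, ?, ?, ?)", rows)
    except sqlite3.IntegrityError as exc:
        print("destination rejected the batch:", exc)
```

The invalid row lands in TEMP_T, where the later count queries can detect and report it, instead of aborting the load.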
Step 204, taking each row of data in the target file as one record of the intermediate table, and adding each column of that row into the corresponding field of that record.
In this embodiment, after the table structure of the intermediate table is obtained, the data in the target file may be extracted into the intermediate table by the ETL tool. Specifically, each row of data in the target file can be used as one record of the intermediate table, and each column of that row is added into the corresponding field of that record, so that all data in the target file is loaded into the intermediate table.
As an example, assume that the target file T is a CSV file with a size of 20 GB and that the target file T includes a plurality of columns such as column A, column B, column C and column D. If all data of the target file T were loaded directly into memory before data verification, memory overflow would easily occur. To avoid this, an intermediate table TEMP_T can be created by using the automatic table creation capability of the ETL tool. The table structure of TEMP_T includes table fields such as field A, field B, field C and field D, where field A corresponds to column A of the target file T, field B to column B, field C to column C, field D to column D, and so on; that is, the table fields of TEMP_T correspond one-to-one to the header of the target file T. The ETL tool is then used to extract the data of the target file T into TEMP_T: each row of data in the target file T becomes one record of TEMP_T, and each column of that row is stored in the corresponding field of that record. That is, the first record of TEMP_T corresponds to the first data row after the header of the target file T, the data in field A of that record corresponds to column A of that row, the data in field B corresponds to column B of that row, and so on, so that the data of the target file T is stored in TEMP_T in full.
Step 205, performing data verification based on field data included in the intermediate table.
It should be noted that, the execution process of this step may be the same as the execution process of step 103 in the previous embodiment, and the detailed description is omitted herein.
In the data verification method provided by this embodiment, for each field that has a field attribute in the table structure of the destination table, the field attribute is removed, and the resulting table structure without field attributes is determined as the table structure of the intermediate table; each row of data in the target file is then taken as one record of the intermediate table, and each column of that row is added to the corresponding field of that record. In this way all data in the target file can be added to the intermediate table and verified based on the field data included in the intermediate table, so no file-type-specific third-party dependency needs to be introduced, the file does not need to be parsed into memory, and the risk of memory overflow during file parsing is effectively avoided.
As can be seen from the above analysis, in the embodiments of the present application data verification may be performed based on the field data included in the intermediate table. To explain more clearly how this verification is performed, this embodiment provides another data verification method, and Fig. 3 is a schematic flow chart of another data verification method provided in an embodiment of the present application.
As shown in fig. 3, the data verification method includes the steps of:
Step 301, obtaining a target file and the table structure of the destination table corresponding to the target file.
Step 302, determining a table structure of the intermediate table based on the table structure of the destination table, and adding the data of the target file to the intermediate table.
It should be noted that the execution process of steps 301 to 302 may be the same as that of steps 101 to 102 in the above embodiment; the principle is the same and is not repeated here.
Step 303, for the target field of the intermediate table, performing data verification on the field data of the target field based on the verification target of the target field.
In this embodiment, the verification target of the target field is determined based on the field attribute of the target field in the table structure of the destination table. Optionally, the verification target may include field data not repeated, field data not null, field data range compliant, field data length compliant, and the like.
In this embodiment, for the target field of the intermediate table, data verification may be performed on the field data of the target field based on the verification target of the target field. Because only the target fields to be checked in the intermediate table need to be considered during verification, there is no need to read and check all data in the intermediate table, which avoids the risk of memory overflow and improves verification efficiency.
In one possible implementation of this embodiment, the total number of records in the intermediate table may be queried first; then, for the target field of the intermediate table, the number of records satisfying the verification target of the target field may be queried, and whether the target field passes verification is determined based on the total number of records and the number of records satisfying the verification target.
In the data verification method provided by this embodiment, the target file and the table structure of the corresponding destination table are acquired, the table structure of the intermediate table is determined based on the table structure of the destination table, and the data of the target file is added to the intermediate table; then, for the target field of the intermediate table, data verification is performed on the field data of the target field based on its verification target. Because data verification can be performed on the target field of the intermediate table alone, there is no need to read and check all data in the intermediate table, which avoids the risk of memory overflow and improves verification efficiency.
To explain more clearly how, in the present application, data verification is performed on the field data of a target field of the intermediate table based on the verification target of that field, this embodiment provides another data verification method, and Fig. 4 is a schematic flow chart of another data verification method provided in an embodiment of the present application.
As shown in fig. 4, the data verification method includes the steps of:
Step 401, obtaining a target file and the table structure of the destination table corresponding to the target file.
Step 402, determining the table structure of the intermediate table based on the table structure of the destination table, and adding the data of the target file to the intermediate table.
It should be noted that the execution process of steps 401-402 may be the same as the execution process of steps 101-102 in the above embodiment, and the principle is not repeated here.
Step 403, querying the total number of records of the intermediate table by using the data query language (DQL).
In this embodiment, DQL (Data Query Language) can be used to query the total number of records in the intermediate table. For example, when the table name of the intermediate table is TEMP_T, the statement select count(*) from TEMP_T; can be executed; this statement counts all records under TEMP_T and thereby yields the total number of records of the intermediate table TEMP_T.
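A minimal Python sketch of step 403 with SQLite follows. Only one integer comes back from the database however large TEMP_T is, which is why this count does not risk memory overflow. The helper name total_records is hypothetical, and the table name is interpolated into the SQL text because identifiers cannot be passed as bound parameters, so it is assumed to come from trusted configuration.

```python
import sqlite3


def total_records(conn: sqlite3.Connection, table: str) -> int:
    """Return N, the total number of records (select count(*) from <table>)."""
    # The table name must be trusted: SQL identifiers cannot be bound as "?".
    (n,) = conn.execute(f"SELECT count(*) FROM {table}").fetchone()
    return n


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TEMP_T (A TEXT)")
    conn.executemany("INSERT INTO TEMP_T VALUES (?)", [("1",), ("2",), (None,)])
    print(total_records(conn, "TEMP_T"))  # 3: count(*) also counts NULL rows
```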
Step 404, for the target field of the intermediate table, writing a corresponding DQL statement based on the verification target of the target field, and querying the number of records satisfying the verification target.
In this embodiment, the verification target of the target field is determined based on the field attribute of the target field in the table structure of the destination table. Optionally, the verification target may include field data not repeated, field data not null, field data range compliant, field data length compliant, and the like.
In this embodiment, for the target field of the intermediate table, a corresponding DQL statement may be written based on the verification target of the target field to query the number of records satisfying the verification target. For example, assuming that the table name of the intermediate table is TEMP_T and the verification target of field A is that field data is not repeated, the statement select count(distinct(field A)) from TEMP_T; can be used to query the number of records corresponding to field A in TEMP_T that satisfy the verification target. Similarly, when the verification target of field B of TEMP_T is that field data is not null, the statement select count(*) from TEMP_T where field B is not null; queries the number of records corresponding to field B that satisfy the verification target; when the verification target of field C of TEMP_T is that the field data range is compliant, the statement select count(*) from TEMP_T where field C in ('correct data range'); queries the number of records corresponding to field C that satisfy the verification target; and when the verification target of field D of TEMP_T is that the field data length is compliant, for example a length requirement of less than N, the statement select count(*) from TEMP_T where length(field D) < N; queries the number of records corresponding to field D that satisfy the verification target.
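The four statements can be generated from a field's verification target, as in the following Python sketch against SQLite. The Target enumeration, the helper names and the sample data are hypothetical; the allowed value set for the range check and the length limit are passed as bound parameters rather than written into the SQL text. Note that count(DISTINCT ...) ignores NULL values, so with this query a NULL in field A also lowers M and shows up as a failure of the non-duplication check.

```python
import sqlite3
from enum import Enum
from typing import Any


class Target(Enum):
    NOT_REPEATED = "field data not repeated"
    NOT_NULL = "field data not null"
    RANGE = "field data range compliant"
    LENGTH = "field data length compliant"


def satisfied_count_sql(table: str, column: str, target: Target,
                        arg: Any = None) -> tuple[str, tuple]:
    """Build the DQL that counts records satisfying one verification target."""
    if target is Target.NOT_REPEATED:
        # Deduplicated count; NULLs are ignored by count(DISTINCT ...).
        return f"SELECT count(DISTINCT {column}) FROM {table}", ()
    if target is Target.NOT_NULL:
        return f"SELECT count(*) FROM {table} WHERE {column} IS NOT NULL", ()
    if target is Target.RANGE:
        allowed = tuple(arg)  # the compliant value set
        marks = ", ".join("?" for _ in allowed)
        return f"SELECT count(*) FROM {table} WHERE {column} IN ({marks})", allowed
    if target is Target.LENGTH:
        # arg is the exclusive upper bound on the data length.
        return f"SELECT count(*) FROM {table} WHERE length({column}) < ?", (arg,)
    raise ValueError(f"unsupported target: {target}")


def satisfied_count(conn: sqlite3.Connection, table: str, column: str,
                    target: Target, arg: Any = None) -> int:
    """Return M for one field; only a single integer is fetched."""
    sql, params = satisfied_count_sql(table, column, target, arg)
    return conn.execute(sql, params).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TEMP_T (A TEXT, B TEXT, C TEXT, D TEXT)")
    conn.executemany("INSERT INTO TEMP_T VALUES (?, ?, ?, ?)", [
        ("1", "x", "ok", "abc"),
        ("1", None, "ok", "abcdef"),
        ("2", "y", "bad", "abcdefgh"),
    ])
    print(satisfied_count(conn, "TEMP_T", "A", Target.NOT_REPEATED))   # 2
    print(satisfied_count(conn, "TEMP_T", "B", Target.NOT_NULL))       # 2
    print(satisfied_count(conn, "TEMP_T", "C", Target.RANGE, ["ok"]))  # 2
    print(satisfied_count(conn, "TEMP_T", "D", Target.LENGTH, 5))      # 1
```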
Because only the number of records corresponding to the target field to be checked that satisfy the verification target needs to be read from the intermediate table, rather than all data of each record, memory overflow is avoided.
Step 405, comparing the number of records satisfying the verification target with the total number of records.
In this embodiment, after the total number of records in the intermediate table and the number of records in the target field that satisfy the verification target are obtained, the number of records in the target field that satisfy the verification target may be compared with the total number of records in the intermediate table to determine whether the target field passes the verification.
Step 406, determining that the target field passes verification in the case that the number of records satisfying the verification target is consistent with the total number of records.
It can be understood that when the number of records of the target field satisfying the verification target equals the total number of records of the intermediate table, all data of the target field satisfies the verification target, so the target field can be determined to pass verification in that case.
Step 407, determining that the target field fails verification in the case that the number of records satisfying the verification target is inconsistent with the total number of records.
It can be understood that when the number of records of the target field satisfying the verification target differs from the total number of records of the intermediate table, the target field contains data that does not satisfy the verification target, so the target field can be determined to fail verification in that case. The difference between the total number of records of the intermediate table and the number of records of the target field satisfying the verification target is then the number of records of the target field that do not satisfy the verification target.
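Steps 405 to 407 reduce to comparing two integers per field. The following Python sketch uses a hypothetical helper (FieldResult and evaluate are not named in the source) that records M and N for each field and derives the pass/fail decision and the number of non-compliant records, N - M, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldResult:
    """Verification outcome for one target field (hypothetical helper)."""
    field: str
    satisfied: int  # M: records satisfying the verification target
    total: int      # N: total records of the intermediate table

    @property
    def passed(self) -> bool:
        # Steps 406 / 407: the field passes only when M equals N.
        return self.satisfied == self.total

    @property
    def failing(self) -> int:
        # Records of the field that do not satisfy the verification target.
        return self.total - self.satisfied


def evaluate(counts: dict[str, int], total: int) -> dict[str, FieldResult]:
    """Compare each field's M with N and return one result per field."""
    return {name: FieldResult(name, m, total) for name, m in counts.items()}


if __name__ == "__main__":
    results = evaluate({"A": 2, "B": 3, "C": 3, "D": 1}, total=3)
    for r in results.values():
        status = "passed" if r.passed else f"failed, {r.failing} non-compliant record(s)"
        print(r.field, status)
```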
In the data verification method provided by this embodiment, the total number of records of the intermediate table is queried using DQL; for the target field of the intermediate table, a corresponding DQL statement is written based on the verification target of the target field to query the number of records satisfying the verification target; the two numbers are compared; and the target field is determined to pass verification if the number of records satisfying the verification target is consistent with the total number of records, or to fail verification if it is not. Because verification only needs to read the total number of records of the intermediate table and, for each target field, the number of records satisfying the verification target, rather than reading and checking all data of each record, the risk of memory overflow is avoided and verification efficiency is improved.
It should be noted that, in the present application, after the target field of the intermediate table passes the verification, the verified target field may also be processed, and for clarity of explanation of this process, another data verification method is provided in this embodiment, and fig. 5 is a schematic flow chart of another data verification method provided in the embodiment of the present application.
As shown in fig. 5, the data verification method includes the steps of:
Step 501, obtaining a target file and the table structure of the destination table corresponding to the target file.
Step 502, determining a table structure of an intermediate table based on the table structure of the destination table, and adding data of the target file to the intermediate table.
Step 503, performing data verification based on the field data included in the intermediate table.
It should be noted that the execution process of steps 501-503 may be the same as that of steps 101-103 in the above embodiment; the principle is the same and is not repeated here.
Step 504, adding the field data of the target field to the destination table in the case that the target field passes verification.
In this embodiment, when the target field passes verification, the field data of the target field may be added to the destination table. In this way only data that passes verification is collected into the destination table, and data that fails verification is not collected.
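One way to picture step 504 is the Python sketch below, again with SQLite. It is an interpretation under stated assumptions, not the source's procedure: the helper name load_passed_fields is hypothetical, only the columns whose checks passed are copied with a single INSERT ... SELECT, and any destination column left out is assumed to accept NULL or to have a default. The copy runs inside the database, so the rows are not fetched into the application.

```python
import sqlite3


def load_passed_fields(conn: sqlite3.Connection, source: str,
                       destination: str, passed_fields: list[str]) -> int:
    """Copy the field data of verified fields from source to destination."""
    if not passed_fields:
        return 0  # nothing passed verification, nothing is collected
    cols = ", ".join(passed_fields)
    # Assumption: destination columns not listed here accept NULL or a default.
    cur = conn.execute(
        f"INSERT INTO {destination} ({cols}) SELECT {cols} FROM {source}")
    conn.commit()
    return cur.rowcount  # number of rows inserted into the destination


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TEMP_T (A TEXT, B TEXT, C TEXT)")
    conn.execute("CREATE TABLE DEST_T (A TEXT UNIQUE, B TEXT NOT NULL, "
                 "C TEXT CHECK (C IN ('ok')))")
    conn.executemany("INSERT INTO TEMP_T VALUES (?, ?, ?)",
                     [("1", "x", "bad"), ("2", "y", "ok")])
    # Suppose fields A and B passed verification and field C failed.
    print(load_passed_fields(conn, "TEMP_T", "DEST_T", ["A", "B"]))
```

Leaving the failed field C out keeps its non-compliant value "bad" from reaching DEST_T; in SQLite a CHECK constraint is not violated by NULL, so the omitted column is accepted.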
In the data verification method provided by this embodiment, the target file and the table structure of the corresponding destination table are acquired, the table structure of the intermediate table is determined based on the table structure of the destination table, the data of the target file is added to the intermediate table, data verification is performed based on the field data included in the intermediate table, and the field data of the target field is added to the destination table when the target field passes verification. Thus, once the target field passes verification, its field data can be loaded into the destination table, completing the data collection.
For clarity of illustration of the above embodiments, examples are now presented.
Fig. 6 is a flow chart of a data verification method under a scenario provided in an embodiment of the present application.
As shown in Fig. 6, first, the target file T and the table structure of the destination table corresponding to the target file T are acquired, and the intermediate table TEMP_T corresponding to the target file T is created from the table structure of that destination table. Optionally, the table structure of the destination table corresponding to the target file T may be obtained by the ETL tool, and for each field that has a field attribute in that table structure, the field attribute is removed, so as to create the intermediate table TEMP_T corresponding to the target file T. The number of fields, the field names and the field comments of TEMP_T are consistent with those of the destination table.
After the intermediate table TEMP_T has been created automatically, all data of the target file T may be extracted into TEMP_T by the ETL tool. Optionally, each row of data in the target file T may be taken as one record of TEMP_T, and each column of that row as the value of the corresponding field of that record, where the field comments of TEMP_T are consistent with the headers of the corresponding columns of the target file T.
After the data of the target file T has been extracted into TEMP_T in full, data verification may be performed on the data in TEMP_T. Optionally, the total number N of records in TEMP_T may be queried first; then, based on the different verification targets, corresponding DQL statements are written and executed, and the execution results give, for each field, the number M of records in the intermediate table satisfying that field's verification target. M is then compared with the total number N for each field, and whether each field of the intermediate table passes verification is determined by whether M equals N: if M equals N, the field passes data verification; if M differs from N, the field fails data verification, and the difference between N and M is the number of records of the field that do not satisfy the verification target. The verification target is determined based on the field attribute of the corresponding field in the table structure of the destination table; optionally, it may include field data not repeated, field data not null, field data range compliant, field data length compliant, and the like.
After a field passes data verification, its field data can be loaded into the destination table by the ETL tool, thereby completing the data collection.
As an example, assume that the target file T is a CSV file with a size of 20 GB that includes a plurality of columns such as column A, column B, column C and column D. If all data of the target file T were loaded directly into memory before data verification, memory overflow would easily occur. To avoid this, an intermediate table TEMP_T can be created by using the automatic table creation capability of the ETL tool. The table structure of TEMP_T includes table fields such as field A, field B, field C and field D, where field A corresponds to column A of the target file T, field B to column B, field C to column C, field D to column D, and so on; that is, the table fields of TEMP_T correspond one-to-one to the header of the target file T. The data of the target file T is then extracted into TEMP_T by the ETL tool: each row of data in the target file T becomes one record of TEMP_T, and each column of that row is stored in the corresponding field of that record. That is, the first record of TEMP_T corresponds to the first data row after the header of the target file T, the data in field A of that record corresponds to column A of that row, the data in field B corresponds to column B of that row, and so on, so that the data of the target file T is stored in TEMP_T in full.
Then, the total number of records N of TEMP_T can be queried by executing the DQL statement select count(*) from TEMP_T;, and corresponding DQL statements can be written and executed based on the verification targets of the fields in the intermediate table to obtain the execution result of each field, that is, the number of records M corresponding to each field that satisfy its verification target. Assume that the verification targets of the fields in TEMP_T are as shown in Fig. 7: the verification target of field A is that field data is not repeated, that of field B is that field data is not null, that of field C is that the field data range is compliant, that of field D is that the field data length is compliant (the field data length is required to be less than N), and the remaining fields have no verification target. Then select count(distinct(field A)) from TEMP_T; is written and executed to obtain the execution result of field A, that is, the number of records corresponding to field A that satisfy its verification target; select count(*) from TEMP_T where field B is not null; is written and executed to obtain the execution result of field B; select count(*) from TEMP_T where field C in ('correct data range'); is written and executed to obtain the execution result of field C; and select count(*) from TEMP_T where length(field D) < N; is written and executed to obtain the execution result of field D.
To explain further: since the verification target of field A is that field data is not repeated, the data in field A of any record in TEMP_T must not repeat the data in field A of any other record. Executing select count(distinct(field A)) from TEMP_T; returns the number of records M remaining after the field A data of all records in TEMP_T is deduplicated; M is the number of distinct values in field A, that is, the number of distinct values in column A of the target file T, so if the difference between M and the total number of records N is not 0, column A of the target file T can be considered to contain duplicate data. Similarly, since the verification target of field B is that field data is not null, the data in field B of any record in TEMP_T must not be null. Executing select count(*) from TEMP_T where field B is not null; returns the number M of records in TEMP_T whose field B data is not null, that is, the number of non-null values in column B of the target file T, so if the difference between M and N is not 0, column B of the target file T can be considered to contain null data.
Since the verification target of field C is that the field data range is compliant, the data in field C of any record in TEMP_T must fall within the compliant range. Executing select count(*) from TEMP_T where field C in ('correct data range'); returns the number M of records in TEMP_T whose field C data is within the compliant range, that is, the number of values in column C of the target file T whose range is compliant, so if the difference between M and N is not 0, column C of the target file T can be considered to contain data outside the compliant range. Since the verification target of field D is that the field data length is compliant (the field data length is required to be less than N), the data length of field D of any record in TEMP_T must be less than N. Executing select count(*) from TEMP_T where length(field D) < N; returns the number M of records in TEMP_T whose field D data length is less than N, that is, the number of values in column D of the target file T whose length is less than N, so if the difference between M and the total number of records N is not 0, column D of the target file T can be considered to contain data whose length is not less than N.
Whether the data of each field in the intermediate table passes data verification can then be determined by comparing M with N. Specifically, if M equals N, the field passes data verification and its field data can be loaded into the destination table; if M differs from N, the field fails data verification and its field data is not loaded into the destination table, and the difference between N and M is the number of records of the field that do not satisfy the verification target.
In summary, the ETL tool loads the data of the target file into the intermediate table, and the data of the intermediate table is verified, so no file-type-specific third-party dependency needs to be introduced, the file does not need to be parsed into memory, and the risk of memory overflow during file parsing is effectively avoided. Meanwhile, verification only needs to read the total number of records of the intermediate table and, for each field, the number of records satisfying its verification target, obtained by writing and executing corresponding DQL statements for the different verification targets; whether each field passes verification is determined by whether that number equals the total number of records. All data of each record therefore does not need to be read and checked, which improves verification efficiency. Finally, the data that passes verification is loaded from the intermediate table into the destination table, completing the data collection.
In order to achieve the above embodiment, the present application further provides a data verification device.
Fig. 8 is a schematic structural diagram of a data verification device according to an embodiment of the present application.
As shown in fig. 8, the data verification apparatus includes: an acquisition module 81, a processing module 82 and a verification module 83.
an acquisition module 81, configured to acquire a target file and the table structure of the destination table corresponding to the target file;
a processing module 82, configured to determine the table structure of the intermediate table based on the table structure of the destination table, and to add the data of the target file to the intermediate table;
and a verification module 83, configured to perform data verification based on the field data included in the intermediate table.
Further, in one possible implementation of the embodiment of the present application, the processing module 82 includes:
a removal unit, configured to remove, for each field in the table structure of the destination table that has at least one field attribute, the field attribute(s) of that field, so as to obtain the table structure of the destination table without field attributes;
and a determining unit, configured to determine the table structure of the destination table without field attributes as the table structure of the intermediate table.
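Deriving the intermediate table from the destination table can be sketched with SQLite's `PRAGMA table_info`, which reports each column's name, declared type, NOT NULL flag, default, and primary-key position. The helper `create_intermediate_table` and the table names are hypothetical, and declaring every intermediate column as plain `TEXT` is an assumption of this sketch: it keeps only the column names, so that any raw value from the file can be inserted and then checked.

```python
import sqlite3

def create_intermediate_table(conn, dest_table, tmp_table):
    """Create tmp_table with dest_table's column names but without field
    attributes (NOT NULL, UNIQUE, PRIMARY KEY, CHECK, length, ...)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({dest_table})")]
    # Assumption: every column becomes unconstrained TEXT.
    col_defs = ", ".join(f"{name} TEXT" for name in columns)
    conn.execute(f"CREATE TABLE {tmp_table} ({col_defs})")
    return columns

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, "
             "email VARCHAR(64) NOT NULL UNIQUE, "
             "age INTEGER CHECK (age BETWEEN 0 AND 150))")
print(create_intermediate_table(conn, "users", "tmp_users"))  # ['id', 'email', 'age']
```

Because the intermediate table carries no constraints, a row that would be rejected by `users` (for example a NULL email) is still accepted, and the violation is detected later by the count-based checks instead of aborting the load.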
Further, in a possible implementation manner of the embodiment of the present application, the determining unit is further configured to:
take each row of data in the target file as one record of the intermediate table, and add that row's data to the corresponding fields of that record.
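The row-to-record mapping can be sketched as follows, assuming (hypothetically) a comma-separated target file whose column order matches the intermediate table and treating empty cells as NULL. In the described method this step is performed by the ETL tool; the Python helper `load_rows` only illustrates the mapping of one file row to one record.

```python
import csv
import io
import sqlite3

def load_rows(conn, tmp_table, columns, file_obj):
    """Insert each row of a delimited file as one record of tmp_table."""
    placeholders = ", ".join("?" for _ in columns)
    sql = (f"INSERT INTO {tmp_table} ({', '.join(columns)}) "
           f"VALUES ({placeholders})")
    # The generator hands rows to executemany one at a time instead of
    # building a list of the whole file; empty cells become NULL.
    rows = ([cell if cell != "" else None for cell in row]
            for row in csv.reader(file_obj))
    conn.executemany(sql, rows)
    return conn.execute(f"SELECT COUNT(*) FROM {tmp_table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_users (id TEXT, name TEXT)")
data = io.StringIO("1,alice\n2,\n3,carol\n")
print(load_rows(conn, "tmp_users", ["id", "name"], data))  # 3
```

A row with a different number of cells than `columns` makes `executemany` raise `sqlite3.ProgrammingError`; how malformed rows are handled is outside the scope of this sketch.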
Further, in one possible implementation manner of the embodiment of the present application, the verification module 83 includes:
a data verification unit, configured to perform, for a target field of the intermediate table, data verification on the field data of the target field based on the verification target of that field.
Further, in a possible implementation manner of the embodiment of the present application, the data verification unit is further configured to:
querying the total number of records in the intermediate table and, for a target field of the intermediate table, querying the number of records that meet the verification target of that field;
determining, based on the total number of records and the number of records meeting the verification target, whether the target field passes verification.
Further, in a possible implementation manner of the embodiment of the present application, the data verification unit is further configured to:
querying the total number of records in the intermediate table using a data query language (DQL);
for the target field of the intermediate table, writing a corresponding DQL statement based on the verification target of the target field, and querying the number of records meeting the verification target.
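Writing one DQL statement per verification target can be sketched as a small SQL builder for the four targets listed in this application. The target names (`not_null`, `unique`, `length`, `range`), the helper `build_count_dql`, and the choice to count NULL values as meeting every target except non-null (so that each target is checked independently) are assumptions of this sketch, written for SQLite.

```python
import sqlite3

def build_count_dql(table, field, target, **params):
    """Build a SELECT COUNT(*) counting records of `table` whose `field`
    meets one verification target. NULLs meet every target except
    "not_null", so each target is checked independently."""
    if target == "not_null":
        return f"SELECT COUNT(*) FROM {table} WHERE {field} IS NOT NULL"
    if target == "unique":
        # A record meets the target if its value occurs exactly once.
        return (f"SELECT COUNT(*) FROM {table} WHERE {field} IS NULL "
                f"OR {field} IN (SELECT {field} FROM {table} "
                f"WHERE {field} IS NOT NULL "
                f"GROUP BY {field} HAVING COUNT(*) = 1)")
    if target == "length":
        limit = int(params["max_length"])
        return (f"SELECT COUNT(*) FROM {table} WHERE {field} IS NULL "
                f"OR LENGTH({field}) <= {limit}")
    if target == "range":
        low, high = float(params["low"]), float(params["high"])
        # SQLite casts non-numeric text to 0.0, so this assumes numeric text.
        return (f"SELECT COUNT(*) FROM {table} WHERE {field} IS NULL "
                f"OR CAST({field} AS REAL) BETWEEN {low} AND {high}")
    raise ValueError(f"unknown verification target: {target}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_people (code TEXT, age TEXT)")
conn.executemany("INSERT INTO tmp_people VALUES (?, ?)",
                 [("A1", "30"), ("A1", "200"), ("B22", None)])

def count(sql):
    return conn.execute(sql).fetchone()[0]

print(count(build_count_dql("tmp_people", "code", "unique")))  # 1
```

With three records in total, only `B22` has a non-duplicated `code`, so the uniqueness count of 1 differs from the total and the field fails; the other targets are evaluated the same way.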
Further, in a possible implementation manner of the embodiment of the present application, the data verification unit is further configured to:
comparing the number of records meeting the verification target with the total number of records;
determining that the target field passes verification in a case where the number of records meeting the verification target equals the total number of records;
and determining that the target field fails verification in a case where the number of records meeting the verification target differs from the total number of records.
Further, in one possible implementation of the embodiment of the present application, the verification target of the target field is determined based on the field attribute of the target field in the table structure of the destination table, and the verification target includes field data non-duplication, field data non-null, field data range compliance, and field data length compliance.
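Deriving verification targets from field attributes can be sketched as a simple mapping. The dictionary format for attributes (`not_null`, `unique`, `primary_key`, `max_length`, `range`), the target names, and the rule that a primary key implies both non-null and non-duplication are hypothetical choices for this illustration, not details given in the application.

```python
def targets_from_attributes(attrs):
    """Derive (target, params) pairs for one destination-table field from
    its field attributes, given as a hypothetical dict."""
    targets = []
    # Assumption: a primary key implies both non-null and non-duplication.
    if attrs.get("not_null") or attrs.get("primary_key"):
        targets.append(("not_null", {}))
    if attrs.get("unique") or attrs.get("primary_key"):
        targets.append(("unique", {}))
    if "max_length" in attrs:
        targets.append(("length", {"max_length": attrs["max_length"]}))
    if "range" in attrs:
        low, high = attrs["range"]
        targets.append(("range", {"low": low, "high": high}))
    return targets

email_attrs = {"not_null": True, "unique": True, "max_length": 64}
print(targets_from_attributes(email_attrs))
# [('not_null', {}), ('unique', {}), ('length', {'max_length': 64})]
```

A field with no attributes yields no targets, which is consistent with the intermediate table imposing no constraints of its own.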
It should be noted that the foregoing explanation of the embodiment of the data verification method is also applicable to the data verification device of this embodiment, and will not be repeated herein.
Based on the foregoing embodiments, the embodiments of the present application further provide another possible implementation of the data verification device. Fig. 9 is a schematic structural diagram of another data verification device provided in the embodiments of the present application; on the basis of the foregoing embodiments, this data verification device further includes an adding module 84.
The adding module 84 is configured to add the target field to the destination table if the target field passes the check.
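A minimal sketch of the adding step, assuming that "adding the target field to the destination table" means copying that field's column of data from the intermediate table with one set-based `INSERT ... SELECT`. The helper `add_passed_fields`, the table names, and the `results` mapping of field name to pass/fail are hypothetical.

```python
import sqlite3

def add_passed_fields(conn, tmp_table, dest_table, results):
    """Copy the data of every field that passed verification from the
    intermediate table into the destination table in one INSERT ... SELECT.
    `results` maps field name -> True (passed) / False (failed)."""
    passed = [field for field, ok in results.items() if ok]
    if passed:
        cols = ", ".join(passed)
        conn.execute(f"INSERT INTO {dest_table} ({cols}) "
                     f"SELECT {cols} FROM {tmp_table}")
    return passed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER, name TEXT, note TEXT)")
conn.execute("CREATE TABLE tmp_dest (id TEXT, name TEXT, note TEXT)")
conn.executemany("INSERT INTO tmp_dest VALUES (?, ?, ?)",
                 [("1", "a", "x"), ("2", "b", "y")])
print(add_passed_fields(conn, "tmp_dest", "dest",
                        {"id": True, "name": True, "note": False}))
# ['id', 'name']
```

Fields that failed are omitted from the insert, so they receive NULL or the column default; if such a column is declared NOT NULL without a default, the database rejects the insert, and handling that case is left open in this sketch.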
The data verification device provided by this embodiment acquires a target file and the table structure of the destination table corresponding to the target file, determines the table structure of the intermediate table based on the table structure of the destination table, adds the data of the target file to the intermediate table, and performs data verification based on the field data included in the intermediate table. Because the data of the target file is added to the intermediate table and verified there, no separate third-party dependency needs to be introduced for each file type, and the file does not need to be parsed into memory, which effectively avoids the risk of memory overflow during file parsing.
In order to achieve the above embodiments, the present application further proposes an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data verification method according to any one of the embodiments of the present application.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. It should be noted that the electronic device shown in fig. 10 is only an example, and should not impose any limitation on the functions and application scope of the embodiments of the present application.
As shown in Fig. 10, the electronic device may include a housing 11, a processor 12, a memory 13, a circuit board 14, and a power supply circuit 15. The circuit board 14 is arranged inside the space enclosed by the housing 11, and the processor 12 and the memory 13 are arranged on the circuit board 14; the power supply circuit 15 supplies power to the circuits or components of the electronic device; the memory 13 stores executable program code; and the processor 12 reads the executable program code stored in the memory 13 and runs the corresponding program, so as to execute the data verification method according to any of the above embodiments of the present application.
For the specific implementation of the above steps by the processor 12, and for the further steps the processor 12 performs by executing the executable program code, refer to the embodiments shown in Figs. 1-7 of the present application; they are not repeated here.
In order to implement the above embodiments, the present application further proposes a computer-readable storage medium storing computer instructions for causing a computer to execute the data verification method according to any one of the above embodiments of the present application.
To achieve the above embodiments, the present application further proposes a computer program product comprising a computer program which, when executed by a processor, implements a data verification method according to any of the above embodiments of the present application.
In the description of this specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those different embodiments or examples, provided they do not contradict each other.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device), a portable computer disk cartridge (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium, then editing, interpreting, or otherwise processing it if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following technologies known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), etc.
Those of ordinary skill in the art will appreciate that all or some of the steps carried out in the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.

Claims (13)

1. A method of data verification, comprising:
acquiring a target file and a table structure of a destination table corresponding to the target file;
determining a table structure of an intermediate table based on the table structure of the destination table, and adding the data of the target file to the intermediate table;
and performing data verification based on the field data included in the intermediate table.
2. The method of claim 1, wherein the determining a table structure of an intermediate table based on the table structure of the destination table comprises:
for at least one field having a field attribute in the table structure of the destination table, removing the field attribute of each such field to obtain the table structure of the destination table without field attributes;
and determining the table structure of the destination table without the field attribute as the table structure of an intermediate table.
3. The method of claim 1, wherein adding the data of the target file to the intermediate table comprises:
taking each row of data in the target file as one record of the intermediate table, and adding that row's data to the corresponding fields of the corresponding record of the intermediate table.
4. The method of claim 1, wherein the performing data verification based on field data included in the intermediate table comprises:
for a target field of the intermediate table, performing data verification on the field data of the target field based on a verification target of the target field.
5. The method of claim 4, wherein the performing data verification on the field data of the target field for the target field of the intermediate table based on the verification target of the target field comprises:
querying the total number of records in the intermediate table and, for a target field of the intermediate table, querying the number of records meeting a verification target of the target field;
and determining, based on the total number of records and the number of records meeting the verification target, whether the target field passes verification.
6. The method of claim 5, wherein querying the total number of records in the intermediate table and, for a target field of the intermediate table, querying the number of records meeting the verification target of the target field comprises:
querying the total number of records in the intermediate table using a data query language (DQL);
and, for the target field of the intermediate table, writing a corresponding DQL statement based on the verification target of the target field, and querying the number of records meeting the verification target.
7. The method of claim 5, wherein the determining whether the target field passes a check based on the total number of records and the number of records meeting the check target comprises:
comparing the number of records meeting the verification target with the total number of records;
determining that the target field passes verification under the condition that the number of records meeting the verification target is consistent with the total number of records;
and determining that the target field fails verification under the condition that the number of records meeting the verification target is inconsistent with the total number of records.
8. The method according to any one of claims 4-7, further comprising:
adding the target field to the destination table in a case where the target field passes verification.
9. The method of any of claims 1-7, wherein the verification target of the target field is determined based on a field attribute of the target field in the table structure of the destination table, the verification target comprising field data non-duplication, field data non-null, field data range compliance, and field data length compliance.
10. A data verification apparatus, comprising:
an acquisition module, configured to acquire a target file and a table structure of a destination table corresponding to the target file;
a processing module, configured to determine a table structure of an intermediate table based on the table structure of the destination table, and to add the data of the target file to the intermediate table;
and a verification module, configured to perform data verification based on field data included in the intermediate table.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
12. A computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-9.
CN202211655794.6A 2022-12-22 2022-12-22 Data verification method, device, electronic equipment and storage medium Pending CN116226142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211655794.6A CN116226142A (en) 2022-12-22 2022-12-22 Data verification method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116226142A (en) 2023-06-06

Family

ID=86581484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211655794.6A Pending CN116226142A (en) 2022-12-22 2022-12-22 Data verification method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116226142A (en)

Similar Documents

Publication Publication Date Title
US10706054B2 (en) Method and device for searching indexes for sensor tag data
US8418053B2 (en) Division program, combination program and information processing method
US9632916B2 (en) Method and apparatus to semantically connect independent build and test processes
US8122008B2 (en) Joining tables in multiple heterogeneous distributed databases
CN107122368B (en) Data verification method and device and electronic equipment
US9110967B2 (en) Data lineage in data warehousing environments
CN110781231B (en) Database-based batch import method, device, equipment and storage medium
US7386566B2 (en) External metadata processing
CN107992492B (en) Data block storage method, data block reading method, data block storage device, data block reading device and block chain
CN103218365A (en) SS Table file data processing method and system
CN101388018A (en) Computer aided design document management method
CN111045994B (en) File classification retrieval method and system based on KV database
CN105808451A (en) Data caching method and related apparatus
CN109710626B (en) Data warehousing management method and device, electronic equipment and storage medium
CN116226142A (en) Data verification method, device, electronic equipment and storage medium
KR101440475B1 (en) Method for creating index for mixed query process, method for processing mixed query, and recording media for recording index data structure
CN117236304A (en) Method for realizing Excel general import based on template configuration
CN115455059A (en) Method, device and related medium for analyzing user behavior based on underlying data
CN114816247A (en) Logic data acquisition method and device
CN114564501A (en) Database data storage and query methods, devices, equipment and medium
JP6217440B2 (en) Symbolic execution program, symbolic execution method, and symbolic execution device
KR101737575B1 (en) Method and device for verifying data based on sql sentences generated automatically
US20200012739A1 (en) Method, apparatus, and computer-readable medium for data transformation pipeline optimization
CN104050052A (en) Error correction code seeding
JP5546909B2 (en) Data processing system, method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination