CN112417841B - Data verification method - Google Patents
- Publication number
- CN112417841B (application CN202011311055.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- expected
- data acquisition
- calculating
- acquisition object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/177—Editing, e.g. inserting or deleting of tables; using ruled lines
- G06F40/18—Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of data acquisition and provides a data verification method comprising the following steps: S1, constructing the result tables of the same data acquisition object in a consistent manner; S2, setting an expected data volume, calculating the data volume of the current period, and comparing it with the expected data volume; S3, checking the data volumes of the three most recent acquisitions of the current data acquisition object, and comparing the current data volume with the historical data volumes to calculate the gap; S4, selecting important fields in the data table of the current data acquisition object, setting an expected null rate, and calculating the actual null rate; S5, comparing the null rates of the important fields in the data tables of two acquisitions. Compared with the prior art, the method helps ensure that the quality of the acquisition results of each data acquisition object stays basically consistent from one acquisition to the next and basically consistent with the expected result, and provides a sound data quality guarantee for subsequent data mining.
Description
Technical Field
The invention relates to the field of data acquisition, and particularly provides a data verification method.
Background
With the advent of the big data era, the influence of data has steadily grown, and more and more enterprises need valuable data to support their business. Building a data warehouse is a long-term process, and the same data acquisition object usually has to be acquired continuously over a long period. Ensuring that the quality of each acquisition is consistent and matches expectations is therefore an important goal of data acquisition.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a highly practical data verification method.
The technical solution adopted to solve the technical problem is as follows:
A data verification method, comprising the following steps:
S1, constructing the result tables of the same data acquisition object in a consistent manner;
S2, setting an expected data volume, calculating the data volume of the current period, and comparing it with the expected data volume;
S3, checking the data volumes of the three most recent acquisitions of the current data acquisition object, and comparing the current data volume with the historical data volumes to calculate the gap;
S4, selecting important fields in the data table of the current data acquisition object, setting an expected null rate, and calculating the actual null rate;
S5, comparing the null rates of the important fields in the data tables of two acquisitions.
Further, in step S1, the result tables of the same data acquisition object are built with a fixed table structure and a fixed table name prefix, with the acquisition time as the table name suffix, so that the result tables differ only in the time suffix and all other basic table information is the same.
Preferably, the result tables acquired at different times for the same data acquisition object must be identical in structure.
Further, in step S2, the latest data volume of the data acquisition object is counted, its difference from the expected volume is taken, and a difference rate is calculated; it is then checked whether the difference rate is within the expected range, and if not, supplementary acquisition is performed for the data acquisition object.
Further, in step S3, the data volumes of the three most recent historical acquisition result tables of the data acquisition object are calculated and compared with the data volume of the latest acquisition result table;
alternatively, the four data volumes in total are plotted as a line graph to inspect volatility;
if the gap is too large or the volatility is outside expectations, the data acquisition object is analyzed and processed again.
Further, in step S4, an important field is selected in the acquisition result table of the current data acquisition object, and the proportion of records whose selected field is empty relative to the total number of records, namely the null rate, is calculated and compared with the expected null rate to check whether it is within the expected range;
if it is not within the range, it is investigated whether a structuring failure occurred during data acquisition.
Preferably, the data volume and the null rate of the key fields must be recorded for the result table of every acquisition, and these records must be queryable.
Further, in step S5, building on step S4, the null rate recorded for the same field in the earlier result table of the same data acquisition object is looked up and its difference from the null rate of the current period is calculated; if the difference is not within the expected range, the data acquisition personnel need to handle it promptly.
Compared with the prior art, the data verification method of the invention has the following notable beneficial effects:
The invention helps ensure that the quality of the acquisition results of each data acquisition object is basically consistent across acquisitions and with the expected results, and provides a sound data quality guarantee for subsequent data mining.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data verification method.
Description of the embodiments
In order to provide a better understanding of the aspects of the present invention, the present invention will be described in further detail with reference to specific embodiments. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A preferred embodiment is given below:
As shown in FIG. 1, the data verification method of this embodiment includes the following steps:
S1, constructing the result tables of the same data acquisition object in a consistent manner:
The result tables of the same data acquisition object are built with a fixed table structure, supplemented by a fixed table name prefix with the acquisition time as the table name suffix, so that every result table has the same basic information apart from the time suffix. This provides the precondition for the subsequent verification.
The result tables acquired for the same data acquisition object in different periods must be identical in structure.
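The naming convention of step S1 can be illustrated with a minimal Python sketch. The function names, the `YYYYMMDD` suffix format, the underscore separator, and the representation of a table structure as a list of (column name, column type) pairs are all assumptions made for illustration; the patent only requires a fixed prefix, a time suffix, and identical structure.

```python
from datetime import date


def result_table_name(prefix: str, collected_on: date) -> str:
    """Build a result table name from a fixed prefix and a time suffix.

    The YYYYMMDD suffix and the underscore separator are assumptions.
    """
    return f"{prefix}_{collected_on:%Y%m%d}"


def same_structure(columns_a: list[tuple[str, str]],
                   columns_b: list[tuple[str, str]]) -> bool:
    """Return True if two tables have the same columns, types, and order."""
    return columns_a == columns_b
```

With these helpers, two acquisitions of the same object on different days yield names that differ only in the suffix, and `same_structure` can be applied to the column lists of both tables before any further check is run.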
S2, setting an expected data volume, calculating the data volume of the current period, and comparing it with the expected volume:
The latest data volume of the data acquisition object is counted and its difference from the expected volume is calculated; it is then checked whether the difference is within the expected range, and if not, the data acquisition object is handled, for example by supplementary acquisition.
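A possible form of the step S2 check is sketched below in Python. The patent does not define the difference rate; this sketch assumes it is the absolute difference between the current and expected volumes divided by the expected volume. The function names and the `max_rate` threshold parameter are hypothetical.

```python
def difference_rate(current_count: int, expected_count: int) -> float:
    """Relative gap between current and expected volume.

    Assumed definition: |current - expected| / expected.
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return abs(current_count - expected_count) / expected_count


def needs_supplementary_collection(current_count: int,
                                   expected_count: int,
                                   max_rate: float) -> bool:
    """Return True if the difference rate exceeds the tolerated rate."""
    return difference_rate(current_count, expected_count) > max_rate
```

For example, with an expected volume of 1000 records and a tolerated rate of 5%, an acquisition that returned 900 records would be flagged for supplementary acquisition, while one that returned 980 would pass.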
S3, checking the data volumes of the three most recent acquisitions of the current data acquisition object, and comparing the current data volume with the historical data volumes to calculate the gap:
The differences between the data volumes of the three most recent historical acquisition result tables of the data acquisition object and the data volume of the latest acquisition result table are calculated and compared; alternatively, the four data volumes in total are plotted as a line graph to inspect volatility. If the gap is too large or the volatility is outside expectations, the data acquisition object needs to be analyzed and processed again.
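The historical comparison of step S3 could be sketched as follows in Python. The patent does not say how "too large" is decided; this sketch assumes a relative threshold, `max_relative_gap`, applied against each of the last three volumes. Both function names and the threshold are hypothetical, and the line-graph alternative is left to any charting tool.

```python
def gaps_to_history(history: list[int], current: int) -> list[int]:
    """Signed gap between the current volume and each of the last three."""
    return [current - past for past in history[-3:]]


def within_expected_volatility(history: list[int],
                               current: int,
                               max_relative_gap: float) -> bool:
    """True if the current volume is close to every recent volume.

    Assumed criterion: |current - past| <= max_relative_gap * past.
    """
    recent = history[-3:]
    return all(
        abs(gap) <= max_relative_gap * past
        for gap, past in zip(gaps_to_history(history, current), recent)
    )
```

The four values (three historical volumes plus the current one) are the same series that the embodiment suggests plotting as a line graph.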
S4, selecting important fields in the data table of the current data acquisition object, setting an expected null rate, and calculating the actual null rate:
An important field is selected in the acquisition result table of the current data acquisition object, and the proportion of records whose selected field is empty relative to the total number of records, namely the null rate, is calculated and compared with the expected null rate to check whether it is within the expected range. If not, it is necessary to investigate whether a structuring failure occurred during data acquisition.
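The null rate of step S4 can be computed as in the following Python sketch. It assumes that rows are available as dictionaries and that "empty" means a missing key, `None`, or an empty string; the patent does not specify either point. The function names and the `expected_max_rate` parameter are hypothetical.

```python
from typing import Any, Mapping, Sequence


def null_rate(rows: Sequence[Mapping[str, Any]], field: str) -> float:
    """Share of records whose value for `field` is empty.

    Assumption: missing keys, None, and "" all count as empty.
    """
    if not rows:
        raise ValueError("cannot compute a null rate for an empty table")
    empty = sum(1 for row in rows if row.get(field) in (None, ""))
    return empty / len(rows)


def null_rate_within_expectation(rate: float, expected_max_rate: float) -> bool:
    """Return True if the observed null rate does not exceed the expected one."""
    return rate <= expected_max_rate
```

A rate above the expected value is the signal that, according to the embodiment, should trigger a check for structuring failures in the acquisition process.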
S5, comparing the null rates of the important fields in the data tables of two acquisitions:
Building on S4, the null rate recorded for the same field in the earlier result table of the same data acquisition object is looked up and its difference from the null rate of the current period is calculated; if the difference is not within the expected range, the data acquisition personnel need to handle it promptly.
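The comparison of step S5 can be sketched in Python as follows. The sketch assumes that the null rates of both acquisitions have already been recorded, as the disclosure requires, and that the expected range is expressed as a maximum absolute difference, `max_drift`. The names are hypothetical.

```python
def null_rate_drift(previous_rate: float, current_rate: float) -> float:
    """Absolute change in a field's null rate between two acquisitions."""
    return abs(current_rate - previous_rate)


def needs_attention(previous_rate: float,
                    current_rate: float,
                    max_drift: float) -> bool:
    """Return True if the null rate changed by more than the tolerated drift."""
    return null_rate_drift(previous_rate, current_rate) > max_drift
```

For instance, a field whose null rate moves from 2% to 10% would be flagged under a tolerated drift of 5 percentage points, whereas a move from 2% to 3% would not.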
The above-mentioned specific embodiments are merely specific examples of the present invention, and the scope of the present invention is not limited to the specific embodiments, and any suitable changes or substitutions made by those skilled in the art, which conform to the technical solutions described in the claims of the present invention, should fall within the scope of the present invention.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (1)
1. A data verification method, comprising the steps of:
S1, constructing the data tables of the same data acquisition object in a consistent manner;
the data tables of the same data acquisition object are built with a fixed table structure and a fixed table name prefix, with the acquisition time as the table name suffix, so that the data table of each acquisition has the same basic information apart from the time suffix;
the data tables acquired for the same data acquisition object in different periods must have the same structure;
S2, setting an expected data volume, calculating the data volume of the current period, and comparing it with the expected data volume;
counting the data volume of the data table acquired for the data acquisition object in the current period, taking the difference between the current data volume and the expected volume, calculating a difference rate, checking whether the difference rate is within the expected range, and if not, performing supplementary acquisition for the data acquisition object;
S3, checking the data volumes of the past three acquisitions of the current data acquisition object, and comparing the current data volume with the historical data volumes to calculate the gap;
calculating the data volumes of the data tables of the past three acquisitions of the data acquisition object and comparing them with the data volume of the current data table;
alternatively, the four data volumes in total are plotted as a line graph to inspect volatility;
if the gap is too large or the volatility is outside expectations, analyzing and processing the data acquisition object again;
S4, selecting important fields in the data table of the current data acquisition object, setting an expected null rate, and calculating the actual null rate;
selecting an important field in the acquired data table of the current data acquisition object, calculating the proportion of records whose selected field is empty relative to the total number of records, namely the null rate, comparing it with the expected null rate, and checking whether it is within the expected range;
if it is not within the range, investigating whether a structuring failure occurred during data acquisition;
S5, comparing the null rates of the important fields in the data tables of two acquisitions;
building on step S4, looking up the null rate recorded for the same field in the earlier data table of the same data acquisition object, calculating its difference from the current null rate, and if the difference is not within the expected range, having the data acquisition personnel handle it promptly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011311055.6A CN112417841B (en) | 2020-11-20 | 2020-11-20 | Data verification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011311055.6A CN112417841B (en) | 2020-11-20 | 2020-11-20 | Data verification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112417841A (en) | 2021-02-26 |
CN112417841B (en) | 2023-09-05 |
Family
ID=74778158
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011311055.6A Active CN112417841B (en) | 2020-11-20 | 2020-11-20 | Data verification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112417841B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107908725A (en) * | 2017-11-14 | 2018-04-13 | 中国银行股份有限公司 | A kind of batch data method of calibration, device and system |
CN110019566A (en) * | 2019-03-13 | 2019-07-16 | 平安信托有限责任公司 | Data checking, device, computer equipment and storage medium based on data warehouse |
CN111597088A (en) * | 2020-05-15 | 2020-08-28 | 广州探途网络技术有限公司 | Data warehouse data monitoring method, warehouse system and electronic equipment |
CN111858646A (en) * | 2020-07-21 | 2020-10-30 | 国网浙江省电力有限公司营销服务中心 | Method and system for checking quality data format of electric energy meter |
Also Published As
Publication number | Publication date |
---|---|
CN112417841A (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20120278323A1 (en) | Joining Tables in a Mapreduce Procedure | |
US20090106308A1 (en) | Complexity estimation of data objects | |
EP3082051A1 (en) | Data mining method | |
CN111177134B (en) | Data quality analysis method, device, terminal and medium suitable for mass data | |
CN108415990B (en) | Data quality monitoring method and device, computer equipment and storage medium | |
KR20150010694A (en) | Data partitioning method and apparatus | |
CN114461644A (en) | Data acquisition method and device, electronic equipment and storage medium | |
CN114880405A (en) | Data lake-based data processing method and system | |
CN112417841B (en) | Data verification method | |
CN114049016A (en) | Index similarity judgment method, system, terminal device and computer storage medium | |
CN117611008A (en) | Data quality evaluation method and device | |
CN116611914A (en) | Salary prediction method and device based on grouping statistics | |
CN113806336B (en) | Data quality assessment method and system | |
CN116578612A (en) | Lithium battery finished product detection data asset construction method | |
CN105573984A (en) | Socio-economic indicator identification method and device | |
CN115344755A (en) | Data constraint condition recommendation method and system in data standard | |
CN112614005B (en) | Method and device for processing reworking state of enterprise | |
CN108062395A (en) | A kind of track traffic big data analysis method and system | |
CN108988340B (en) | Method and device for reducing line loss and server | |
CN110990401A (en) | Hotel searching method and system | |
CN111666286A (en) | Method and device for detecting sub-warehouse and sub-table, computer equipment and storage medium | |
CN112734261B (en) | Power distribution network operation index sequence association analysis method and system | |
CN118277372B (en) | Electric power customer data cleaning and managing method | |
CN111695083B (en) | Detection method and detection equipment | |
CN116860890A (en) | Method and device for constructing database through contact way |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||