CN112417841B - Data verification method - Google Patents

Data verification method

Info

Publication number
CN112417841B
CN112417841B CN202011311055.6A CN202011311055A CN112417841B CN 112417841 B CN112417841 B CN 112417841B CN 202011311055 A CN202011311055 A CN 202011311055A CN 112417841 B CN112417841 B CN 112417841B
Authority
CN
China
Prior art keywords
data
expected
data acquisition
calculating
acquisition object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011311055.6A
Other languages
Chinese (zh)
Other versions
CN112417841A (en)
Inventor
仇越
张庆晓
张东文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Original Assignee
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202011311055.6A
Publication of CN112417841A
Application granted
Publication of CN112417841B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of data acquisition and provides a data verification method comprising the following steps: S1, constructing the result tables of the same data acquisition object in a uniform manner; S2, setting an expected data volume, calculating the data volume of the current period, and comparing it with the expected data volume; S3, checking the data volumes of the last three acquisitions of the current data acquisition object, and comparing the current data volume with the historical data volumes to calculate the gap; S4, selecting important fields in the data table of the current data acquisition object, setting an expected null rate, and calculating the null rate; S5, comparing the null rates of the important fields in the data tables of two acquisitions. Compared with the prior art, the method helps keep the quality of each acquisition result for a data acquisition object essentially consistent and in line with the expected result, which provides a sound data quality guarantee for subsequent data mining.

Description

Data verification method
Technical Field
The invention relates to the field of data acquisition, and particularly provides a data verification method.
Background
With the advent of the big data era, the influence of data has steadily grown, and more and more enterprises need valuable data to support their business. Building a data warehouse is a long-term process, and the same data acquisition object will probably need to be collected continuously over a long period. Ensuring that the quality of every acquisition is consistent and matches expectations is therefore an important goal of data acquisition.
Disclosure of Invention
The invention addresses the shortcomings of the prior art and provides a highly practical data verification method.
The technical solution adopted to solve the technical problem is as follows:
A data verification method, comprising the steps of:
S1, constructing the result tables of the same data acquisition object in a uniform manner;
S2, setting an expected data volume, calculating the data volume of the current period, and comparing it with the expected data volume;
S3, checking the data volumes of the last three acquisitions of the current data acquisition object, and comparing the current data volume with the historical data volumes to calculate the gap;
S4, selecting important fields in the data table of the current data acquisition object, setting an expected null rate, and calculating the null rate;
S5, comparing the null rates of the important fields in the data tables of two acquisitions.
Further, in step S1, the result tables of the same data acquisition object are built with a fixed table structure and a fixed table name prefix, with the acquisition time as the table name suffix, so that the result tables differ only in their time suffix and all other basic table information is the same.
Preferably, the result tables acquired at different times for the same data acquisition object must be identical in structure.
Further, in step S2, the latest data volume of the data acquisition object is counted, its difference from the expected volume is taken, and a difference rate is calculated. The difference rate is then checked against the expected difference-rate range; if it falls outside that range, supplementary acquisition is performed for the data acquisition object.
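The text does not state which quantity the difference rate is taken relative to. One possible reading, assuming the rate is relative to the expected volume, is written out below. The symbols $N_{\text{cur}}$ (current data volume), $N_{\text{exp}}$ (expected data volume) and $\varepsilon$ (largest acceptable difference rate) are introduced here for illustration only and do not appear in the patent.

```latex
r = \frac{N_{\text{cur}} - N_{\text{exp}}}{N_{\text{exp}}},
\qquad
\text{accept if } |r| \le \varepsilon
```

For example, with $N_{\text{exp}} = 10000$, $N_{\text{cur}} = 9200$ and $\varepsilon = 0.05$, the rate is $r = -0.08$, so $|r| > \varepsilon$ and supplementary acquisition would be triggered under this reading.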
Further, in step S3, the data volumes of the last three acquisition result tables of the data acquisition object are calculated and compared with the data volume of the latest acquisition result table;
alternatively, the four data volumes are plotted together as a line graph to view the volatility;
if the gap is too large or the volatility is not within expectations, the data acquisition object is analyzed and processed again.
Further, in step S4, an important field is selected in the acquisition result table of the current data acquisition object, and the proportion of records whose selected field is empty to the total number of records, namely the null rate, is calculated and compared with the expected null rate to see whether it is within the expected range;
if it is not within the range, the data acquisition process is examined for a structuring failure.
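The null rate defined in step S4 can be written compactly as follows. This is only a notational sketch: $T$ (the acquisition result table), $f$ (the selected important field), $\rho_f$ (its null rate) and $\rho_f^{\text{exp}}$ (the expected null rate) are symbols introduced here, and treating the expected range as an upper bound is an assumption, since the patent only says the rate must be "within the range".

```latex
\rho_f = \frac{\left|\{\, t \in T : t.f \text{ is empty} \,\}\right|}{|T|},
\qquad
\text{accept if } \rho_f \le \rho_f^{\text{exp}}
```

For instance, if 50 of 1000 records have an empty value in $f$, then $\rho_f = 0.05$.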
Preferably, for each acquisition, records of the data volume and of the null rate of the key fields in the result table must exist and be queryable.
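One way to keep these per-acquisition records queryable is a small metrics table that is written to after every acquisition. The sketch below uses Python's built-in sqlite3 module. The table name `collection_metrics`, its columns, and the sample values are hypothetical and are not taken from the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical metrics log: one row per object, acquisition date and key field.
conn.execute(
    """CREATE TABLE collection_metrics (
           object_name TEXT, collected_on TEXT, field TEXT,
           row_count INTEGER, null_rate REAL,
           PRIMARY KEY (object_name, collected_on, field))"""
)

def record_metrics(conn, object_name, collected_on, field, row_count, null_rate):
    """Store (or overwrite) the metrics of one acquisition."""
    conn.execute(
        "INSERT OR REPLACE INTO collection_metrics VALUES (?, ?, ?, ?, ?)",
        (object_name, collected_on, field, row_count, null_rate),
    )

def recent_metrics(conn, object_name, field, limit=4):
    """Return the newest acquisitions first; ISO dates sort correctly as text."""
    return conn.execute(
        "SELECT collected_on, row_count, null_rate FROM collection_metrics "
        "WHERE object_name = ? AND field = ? "
        "ORDER BY collected_on DESC LIMIT ?",
        (object_name, field, limit),
    ).fetchall()

# Made-up sample values for illustration.
record_metrics(conn, "company_info", "2020-10-20", "name", 10000, 0.02)
record_metrics(conn, "company_info", "2020-11-20", "name", 9200, 0.05)
rows = recent_metrics(conn, "company_info", "name")
```

With such a log, the comparisons in steps S3 and S5 can read historical values directly instead of rescanning old result tables.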
Further, in step S5, building on step S4, the null rate of the same field in the earlier result table of the same data acquisition object is looked up and its difference from the null rate of the current period is calculated; if the difference is not within the expected range, the data acquisition personnel must handle it promptly.
Compared with the prior art, the data verification method has the following notable benefits:
the invention helps keep the quality of each acquisition result for a data acquisition object essentially consistent with the expected result, and provides a sound data quality guarantee for subsequent data mining.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a data verification method.
Description of the embodiments
In order to provide a better understanding of the aspects of the present invention, the present invention will be described in further detail with reference to specific embodiments. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A preferred embodiment is given below:
As shown in FIG. 1, the data verification method of this embodiment includes the following steps:
S1, constructing the result tables of the same data acquisition object in a uniform manner:
The result tables of the same data acquisition object are built with a fixed table structure and a fixed table name prefix, with time as the table name suffix, so that the result tables differ only in their time suffix and all other basic table information is the same. This provides the precondition for the subsequent checks.
The result tables acquired for the same data acquisition object in different periods must be identical in structure.
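A minimal sketch of this naming and structure convention follows, using Python's built-in sqlite3 module. The prefix `company_info`, the column list, and the `YYYYMMDD` suffix format are assumptions made for illustration; the patent only requires a fixed structure, a fixed prefix, and a time suffix.

```python
import sqlite3
from datetime import date

# Hypothetical fixed structure shared by every result table of one object.
COLUMNS = "id INTEGER PRIMARY KEY, name TEXT, address TEXT"

def result_table_name(prefix, collected_on):
    """Fixed prefix plus a time suffix, e.g. company_info_20201120."""
    return f"{prefix}_{collected_on:%Y%m%d}"

def create_result_table(conn, prefix, collected_on):
    """Create the result table for one acquisition and return its name."""
    name = result_table_name(prefix, collected_on)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({COLUMNS})")
    return name

def table_structure(conn, table):
    """Column names and declared types, in order.

    PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    """
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")
t1 = create_result_table(conn, "company_info", date(2020, 10, 20))
t2 = create_result_table(conn, "company_info", date(2020, 11, 20))
structures_match = table_structure(conn, t1) == table_structure(conn, t2)
```

Because table names are interpolated into SQL here, the prefix should come from trusted configuration rather than user input.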
S2, setting an expected data volume, calculating the data volume of the current period, and comparing it with the expected volume:
The latest data volume of the data acquisition object is counted and its difference from the expected volume is calculated. The difference is checked against the expected difference range; if it falls outside that range, the data acquisition object is handled, for example by supplementary acquisition.
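The volume check can be sketched as a small function. The function name, the choice to express the tolerance as a rate relative to the expected volume, and the sample numbers are all assumptions for illustration, not details given in the patent.

```python
def volume_check(current_count, expected_count, max_diff_rate):
    """Compare the current row count with the expected one.

    Returns (difference, difference_rate, ok), where the rate is taken
    relative to the expected count (an assumption of this sketch).
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    difference = current_count - expected_count
    diff_rate = difference / expected_count
    return difference, diff_rate, abs(diff_rate) <= max_diff_rate

# Hypothetical numbers: 9200 rows collected, 10000 expected, 5% tolerance.
difference, diff_rate, ok = volume_check(9200, 10000, 0.05)
needs_supplementary_acquisition = not ok
```

In this example the collection is 800 rows short of the expectation, which exceeds the 5% tolerance, so the object would be queued for supplementary acquisition.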
S3, checking the data volumes of the last three acquisitions of the current data acquisition object, and comparing the current data volume with the historical data volumes to calculate the gap:
The difference between the data volume of each of the last three acquisition result tables of the data acquisition object and the data volume of the latest acquisition result table is calculated and compared; alternatively, the four data volumes are plotted together as a line graph to view the volatility. If the gap is too large or the volatility is not within expectations, the data acquisition object needs to be analyzed and processed again.
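The historical comparison might look like the sketch below. The patent suggests inspecting a line graph for volatility; here the coefficient of variation is substituted as a numeric stand-in, which is a choice made for this example and not the patent's method. The thresholds and row counts are made up.

```python
from statistics import mean, pstdev

def history_gaps(current, previous_three):
    """Relative gap between the current volume and each earlier volume."""
    return [(current - past) / past for past in previous_three]

def volatility(volumes):
    """Coefficient of variation: population standard deviation over mean."""
    return pstdev(volumes) / mean(volumes)

previous_three = [10000, 10200, 9900]  # hypothetical historical row counts
current = 7000                         # hypothetical latest row count

gaps = history_gaps(current, previous_three)
cv = volatility(previous_three + [current])

# Hypothetical thresholds: flag a gap above 10% or a CV above 10%.
needs_review = max(abs(g) for g in gaps) > 0.10 or cv > 0.10
```

Here the latest volume is roughly 30% below each of the three earlier ones, so the object would be flagged for renewed analysis.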
S4, selecting important fields in the data table of the current data acquisition object, setting an expected null rate, and calculating the null rate:
An important field is selected in the acquisition result table of the current data acquisition object, and the proportion of records whose selected field is empty to the total number of records, namely the null rate, is calculated and compared with the expected null rate to check whether it is within the expected range. If not, the data acquisition process needs to be examined to determine whether a structuring failure occurred.
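A null-rate calculation over one result table can be sketched with sqlite3 as follows. Treating both SQL NULL and blank strings as "empty", using the expected null rate as an upper bound, and the table and field names are all assumptions of this example.

```python
import sqlite3

def null_rate(conn, table, field):
    """Share of rows whose field is NULL or a blank string.

    table and field are interpolated into SQL, so they must be trusted
    identifiers. An empty table yields 0.0 in this sketch.
    """
    total, empty = conn.execute(
        f"SELECT COUNT(*), "
        f"COALESCE(SUM(CASE WHEN {field} IS NULL OR TRIM({field}) = '' "
        f"THEN 1 ELSE 0 END), 0) "
        f"FROM {table}"
    ).fetchone()
    return empty / total if total else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_info_20201120 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO company_info_20201120 VALUES (?, ?)",
    [(1, "Acme"), (2, None), (3, ""), (4, "Globex")],  # made-up sample rows
)

rate = null_rate(conn, "company_info_20201120", "name")
within_expected = rate <= 0.10  # hypothetical expected null rate of 10%
```

Two of the four sample rows have no usable name, so the rate is far above the assumed 10% limit and the acquisition would be checked for a structuring failure.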
S5, comparing the null rates of important fields in the data tables of two acquisitions:
Building on S4, the null rate of the same field in the earlier result table of the same data acquisition object is looked up and its difference from the null rate of the current period is calculated. If the difference is not within the expected range, the data acquisition personnel must handle it promptly.
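The comparison between two acquisitions reduces to a per-field difference of null rates. In the sketch below, the field names, the rates, and the 5-percentage-point tolerance are hypothetical values chosen for illustration.

```python
def null_rate_drift(previous_rate, current_rate, max_drift):
    """Return (difference, ok) for one field across two acquisitions."""
    difference = current_rate - previous_rate
    return difference, abs(difference) <= max_drift

# Hypothetical null rates of key fields in the earlier and current tables.
previous = {"name": 0.02, "address": 0.10}
current = {"name": 0.03, "address": 0.35}

# Fields whose null rate moved by more than the assumed tolerance.
flagged = [
    field
    for field in previous
    if not null_rate_drift(previous[field], current[field], max_drift=0.05)[1]
]
```

Only `address` is flagged here, since its null rate rose by 25 percentage points while `name` moved by about one.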
The above-mentioned specific embodiments are merely specific examples of the present invention, and the scope of the present invention is not limited to the specific embodiments, and any suitable changes or substitutions made by those skilled in the art, which conform to the technical solutions described in the claims of the present invention, should fall within the scope of the present invention.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (1)

1. A data verification method, comprising the steps of:
S1, constructing the data tables of the same data acquisition object in a uniform manner;
the data tables of the same data acquisition object are built with a fixed table structure and a fixed table name prefix, with time as the table name suffix, so that every data table has the same basic information apart from its time suffix;
the data tables acquired for the same data acquisition object in different periods must have the same structure;
S2, setting an expected data volume, calculating the data volume of the current period, and comparing it with the expected data volume;
counting the data volume of the data table acquired for the data acquisition object in the current period, taking the difference between the current data volume and the expected volume, calculating a difference rate, checking whether the difference rate is within the expected difference-rate range, and if not, performing supplementary acquisition for the data acquisition object;
S3, checking the data volumes of the past three acquisitions of the current data acquisition object, and comparing the current data volume with the historical data volumes to calculate the gap;
calculating the data volumes of the data tables of the past three acquisitions of the data acquisition object and comparing them with the data volume of the current data table;
alternatively, plotting the four data volumes together as a line graph to view the volatility;
if the gap is too large or the volatility is not within expectations, analyzing and processing the data acquisition object again;
S4, selecting important fields in the data table of the current data acquisition object, setting an expected null rate, and calculating the null rate;
selecting an important field in the acquired data table of the current data acquisition object, calculating the proportion of records whose selected field is empty to the total number of records, namely the null rate, comparing it with the expected null rate, and checking whether it is within the expected range;
if it is not within the range, examining whether a structuring failure occurred in the data acquisition process;
S5, comparing the null rates of important fields in the data tables of two acquisitions;
building on step S4, looking up the null rate of the same field in the earlier data table of the same data acquisition object, calculating its difference from the current null rate, and if the difference is not within the expected range, having the data acquisition personnel handle it promptly.
CN202011311055.6A 2020-11-20 2020-11-20 Data verification method Active CN112417841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011311055.6A CN112417841B (en) 2020-11-20 2020-11-20 Data verification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011311055.6A CN112417841B (en) 2020-11-20 2020-11-20 Data verification method

Publications (2)

Publication Number Publication Date
CN112417841A CN112417841A (en) 2021-02-26
CN112417841B true CN112417841B (en) 2023-09-05

Family

ID=74778158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011311055.6A Active CN112417841B (en) 2020-11-20 2020-11-20 Data verification method

Country Status (1)

Country Link
CN (1) CN112417841B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908725A (en) * 2017-11-14 2018-04-13 中国银行股份有限公司 A kind of batch data method of calibration, device and system
CN110019566A (en) * 2019-03-13 2019-07-16 平安信托有限责任公司 Data checking, device, computer equipment and storage medium based on data warehouse
CN111597088A (en) * 2020-05-15 2020-08-28 广州探途网络技术有限公司 Data warehouse data monitoring method, warehouse system and electronic equipment
CN111858646A (en) * 2020-07-21 2020-10-30 国网浙江省电力有限公司营销服务中心 Method and system for checking quality data format of electric energy meter


Also Published As

Publication number Publication date
CN112417841A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
US20120278323A1 (en) Joining Tables in a Mapreduce Procedure
US20090106308A1 (en) Complexity estimation of data objects
EP3082051A1 (en) Data mining method
CN111177134B (en) Data quality analysis method, device, terminal and medium suitable for mass data
CN108415990B (en) Data quality monitoring method and device, computer equipment and storage medium
KR20150010694A (en) Data partitioning method and apparatus
CN114461644A (en) Data acquisition method and device, electronic equipment and storage medium
CN114880405A (en) Data lake-based data processing method and system
CN112417841B (en) Data verification method
CN114049016A (en) Index similarity judgment method, system, terminal device and computer storage medium
CN117611008A (en) Data quality evaluation method and device
CN116611914A (en) Salary prediction method and device based on grouping statistics
CN113806336B (en) Data quality assessment method and system
CN116578612A (en) Lithium battery finished product detection data asset construction method
CN105573984A (en) Socio-economic indicator identification method and device
CN115344755A (en) Data constraint condition recommendation method and system in data standard
CN112614005B (en) Method and device for processing reworking state of enterprise
CN108062395A (en) A kind of track traffic big data analysis method and system
CN108988340B (en) Method and device for reducing line loss and server
CN110990401A (en) Hotel searching method and system
CN111666286A (en) Method and device for detecting sub-warehouse and sub-table, computer equipment and storage medium
CN112734261B (en) Power distribution network operation index sequence association analysis method and system
CN118277372B (en) Electric power customer data cleaning and managing method
CN111695083B (en) Detection method and detection equipment
CN116860890A (en) Method and device for constructing database through contact way

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant