CN114168584A - Data quality inspection method and system - Google Patents

Data quality inspection method and system

Info

Publication number
CN114168584A
Authority
CN
China
Prior art keywords
data
quality inspection
inspection
rules
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111538017.9A
Other languages
Chinese (zh)
Inventor
包世界
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Jurassic Technology Development Co ltd
Original Assignee
Wuhan Jurassic Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Jurassic Technology Development Co ltd
Priority to CN202111538017.9A
Publication of CN114168584A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Mining

Abstract

The invention discloses a data quality inspection method and a data quality inspection system in the technical field of petroleum data quality inspection. It mainly addresses the wrong decisions, potential risks and similar problems that poor-quality data can cause. In the method, rules are added to a data set through management of a quality inspection library, which fixes the quality inspection standard for the data quality inspection. A quality inspection task is then established; executing it obtains the rules in the task, the target data set and the data of that data set, checks whether the data conform to the selected rules, and yields an inspection result. A comparison module compares the inspection data with the data in the quality inspection library to obtain the error data. This effectively ensures that there is no difference between the data that should be collected and the data actually collected, and that the collected data are consistent with the expected data type, so that the collected data are correct and the error data can be processed and repaired effectively and quickly.

Description

Data quality inspection method and system
Technical Field
The invention relates to the technical field of petroleum data quality inspection, in particular to a data quality inspection method and a data quality inspection system.
Background
As data volumes grow, big-data technologies are maturing across a series of links that includes data acquisition, data storage, data transmission, data processing, and data mining. Successive generations of technology have reduced implementation costs, and more and more enterprises are using data to innovate their business and to provide data services, so that the original model, in which the business drives the data, is gradually shifting to one in which the data drives business growth.
Petroleum data and petroleum business correspond strictly to each other: carrying out any business activity generates certain data, and any data is in turn a record and description of the business.
Petroleum business is carried out according to defined business specifications and management rules. Every business activity is designed in advance, and so, correspondingly, is the generation of its business data. From this viewpoint, the content of all petroleum data is subject to requirements, and those requirements are designed in advance. In a sense, the data is the shadow of the business: it accompanies the business as the business is carried out and applied.
Poor-quality data, however, exposes an organization to risk. It can lead to wrong decisions, dissatisfied customers, dissatisfied data users, fines for non-compliance, hidden costs (rework), a bad reputation, dissatisfied staff, and a lack of interoperability.
Disclosure of Invention
The present invention aims to provide a data quality inspection method and system that solve the problems set forth in the background art.
To achieve this purpose, the invention provides the following technical solution:
A data quality inspection method, comprising:
S1, quality inspection library management;
S2, quality inspection task management;
S21, establishing a task: a quality inspection task is established from an established data set, and when the data set is selected, its quality inspection rules come from two parts;
the first part consists of the rules defined when the data set itself was defined, of which there are six: a single-field non-null check on the result field value, a uniqueness check on the result field value, an attachment constraint check on the result field, a value-domain check on the result field value, a data-format check on the result field, and a source check on the result field;
the second part consists of the data set rules established in the quality inspection library, of which there are four: a result field record integrity check, a result field business logic check, a result source uniqueness check, and a combined non-null check on result field values;
S22, executing the task: identifying the rules in the task, identifying the data set to which those rules apply, obtaining the data of the data set, and checking whether the data conform to each specific rule; if the data are normal, the check outcome is recorded in the quality inspection result, and if the data are abnormal, it is recorded in the quality inspection record;
S23, quality inspection record management;
S24, data repair management: reviewing the data set to view the data, the type of each data error and the identified error data, and repairing the error data so that it satisfies the defined rules;
S3, data quality statistics: compiling statistics over all task executions, summarizing the result quality problems, and viewing the error data;
and S4, data repair log: viewing the state of the data after repair and comparing the data before and after modification.
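By way of non-limiting illustration, the task execution of step S22 might be sketched in Python as follows. The invention does not specify an implementation; the names `Rule`, `TaskOutcome` and `execute_task`, the dictionary shape of a record, and the sample field names are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict  # assumed shape: one row of a data set, keyed by field name


@dataclass
class Rule:
    name: str
    check: Callable[[Record], bool]  # returns True when the record conforms


@dataclass
class TaskOutcome:
    results: list = field(default_factory=list)  # normal data -> quality inspection result
    records: list = field(default_factory=list)  # abnormal data -> quality inspection record


def execute_task(dataset_name: str, rows: list[Record], rules: list[Rule]) -> TaskOutcome:
    """Apply every rule of the task to every row and route the outcome as in S22."""
    outcome = TaskOutcome()
    for index, row in enumerate(rows):
        for rule in rules:
            entry = {"dataset": dataset_name, "row": index, "rule": rule.name}
            if rule.check(row):
                outcome.results.append(entry)
            else:
                outcome.records.append(entry)
    return outcome


# Hypothetical data set with one empty identifier.
rows = [{"well_id": "W-001", "depth_m": 1520.0}, {"well_id": "", "depth_m": 980.5}]
rules = [Rule("well_id non-null", lambda r: bool(r.get("well_id")))]
outcome = execute_task("well_header", rows, rules)
print(len(outcome.results), len(outcome.records))
```

Keeping conforming and abnormal outcomes in separate collections mirrors the distinction the method draws between the quality inspection result and the quality inspection record.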
As a further scheme of the invention: management of the quality inspection library adds rules to the data set, and these comprise the following four rules: a result field record integrity check, a result field business logic check, a result source uniqueness check, and a combined non-null check on result field values.
As a still further scheme of the invention: rules are added to the data set in two modes. The first mode is the result field record integrity check, which checks whether the number of data records matches the specified number obtained from a calculation formula defined with reference to the data set fields. The second mode is the result field business logic check, which checks whether the value of a data field matches the value obtained from a calculation formula, that is, it checks the correctness of the business logic of structured result field values.
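The two formula-based checks can be illustrated with a minimal Python sketch. The formulas, the field names (`top_m`, `bottom_m`, `interval_m`, `length_m`) and the tolerance are hypothetical, and treating "meets the specified number" as equality is an assumption of the sketch.

```python
def check_record_integrity(rows, expected_count):
    """True when the number of records equals the count produced by the formula."""
    return len(rows) == expected_count


def check_business_logic(row, field_name, formula, tolerance=1e-6):
    """True when the stored field value matches the value computed by the formula."""
    return abs(row[field_name] - formula(row)) <= tolerance


# Hypothetical core-sample data set: one record is expected per sampled interval.
header = {"top_m": 1000.0, "bottom_m": 1003.0, "interval_m": 1.0}
samples = [
    {"top_m": 1000.0, "bottom_m": 1001.0, "length_m": 1.0},
    {"top_m": 1001.0, "bottom_m": 1002.0, "length_m": 1.0},
    {"top_m": 1002.0, "bottom_m": 1003.0, "length_m": 1.5},  # inconsistent length
]

# Integrity: the expected record count is derived from fields of the data set.
expected = round((header["bottom_m"] - header["top_m"]) / header["interval_m"])
integrity_ok = check_record_integrity(samples, expected)


# Business logic: the stored length must equal bottom minus top.
def length_formula(row):
    return row["bottom_m"] - row["top_m"]


logic_failures = [
    i for i, row in enumerate(samples)
    if not check_business_logic(row, "length_m", length_formula)
]
print(integrity_ok, logic_failures)
```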
As a still further scheme of the invention: the quality inspection record management comprises:
a quality inspection report function, which obtains the results of executed tasks, compiles a problem summary per quality inspection rule and a problem summary per data set, and shows the problem data of each data set in detail, so that the customer knows which data have problems and which quality inspection rules were applied to the data;
a task detail function, which shows specifically which data sets and tasks are involved, summarizes the data set problems, and shows the count of abnormal data in each data set;
and an execution log function, which displays the data sets on which the task was executed and the amount of problem data in each data set.
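The summaries produced by the quality inspection report function might look like the following Python sketch. The function name `build_report` and the shape of a quality inspection record (a dictionary with `dataset`, `row` and `rule` keys) are assumptions, not part of the disclosure.

```python
from collections import Counter


def build_report(quality_records):
    """Summarize abnormal records per rule and per data set, and keep the detail."""
    by_rule = Counter(r["rule"] for r in quality_records)
    by_dataset = Counter(r["dataset"] for r in quality_records)
    details = {}
    for r in quality_records:
        details.setdefault(r["dataset"], []).append((r["row"], r["rule"]))
    return {"by_rule": dict(by_rule), "by_dataset": dict(by_dataset), "details": details}


# Hypothetical quality inspection records left behind by executed tasks.
quality_records = [
    {"dataset": "well_header", "row": 1, "rule": "well_id non-null"},
    {"dataset": "well_header", "row": 4, "rule": "well_id uniqueness"},
    {"dataset": "core_sample", "row": 2, "rule": "well_id non-null"},
]
report = build_report(quality_records)
print(report["by_rule"])
print(report["by_dataset"])
```

The per-data-set counts in `by_dataset` are also the kind of figure the task detail and execution log functions would display.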
A data quality inspection system using the data quality inspection method comprises a data set rule adding module, a data acquisition module, a quality inspection module and a comparison module;
the data set rule adding module is used to set the inspection rules of the data set;
the data acquisition module captures data by random sampling;
the quality inspection module inspects the data randomly sampled by the data acquisition module and obtains inspection data;
and the comparison module compares the inspection data obtained by the quality inspection module with the data in the quality inspection library to obtain error data, and repairs the error data to generate a data repair log.
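One possible reading of the comparison module is sketched below in Python. It assumes that the quality inspection library holds reference records keyed by an identifier, which the disclosure does not state; `compare_with_library` and the sample fields are hypothetical.

```python
def compare_with_library(inspected, library, key):
    """Return one error entry per field that differs from the library's reference value."""
    reference = {row[key]: row for row in library}
    errors = []
    for row in inspected:
        expected = reference.get(row[key])
        if expected is None:
            # No reference record exists for this identifier.
            errors.append({"key": row[key], "field": None,
                           "reason": "missing from quality inspection library"})
            continue
        for name, value in expected.items():
            if row.get(name) != value:
                errors.append({"key": row[key], "field": name,
                               "expected": value, "actual": row.get(name)})
    return errors


# Hypothetical reference data and inspected data.
library = [{"well_id": "W-001", "basin": "Ordos"}]
inspected = [
    {"well_id": "W-001", "basin": "Tarim"},  # differs from the reference value
    {"well_id": "W-009", "basin": "Ordos"},  # no reference record
]
errors = compare_with_library(inspected, library, "well_id")
print(errors)
```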
Compared with the prior art, the invention has the following beneficial effects:
rules are added to the data set through management of the quality inspection library, which fixes the quality inspection standard for the data quality inspection; by establishing a quality inspection task, the rules in the task and the target data set are obtained, the data of the data set are obtained, and the data are checked against the selected rules to yield an inspection result; the comparison module then compares the inspection data with the data in the quality inspection library to obtain the error data. This greatly reduces manual effort and process intervention, improves efficiency, and reduces errors.
Drawings
Fig. 1 is a flow chart of a data quality inspection method.
Fig. 2 is a schematic diagram of a data quality inspection system.
Detailed Description
In the description of the present invention, it is to be understood that the terms "longitudinal," "lateral," "upper," "lower," "top," "bottom," "inner," "outer," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the present invention and simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated in a particular manner, and thus are not to be construed as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly. For example, a connection can be fixed, detachable, or integral; it can be mechanical or electrical; and it can be direct, indirect through an intervening medium, internal, or any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The technical solution of the present patent will be described in further detail with reference to the following embodiments.
As shown in Figs. 1-2, a data quality inspection method comprises:
S1, quality inspection library management;
S2, quality inspection task management;
S21, establishing a task: a quality inspection task is established from an established data set, and when the data set is selected, its quality inspection rules come from two parts;
the first part consists of the rules defined when the data set itself was defined, of which there are six: a single-field non-null check on the result field value, a uniqueness check on the result field value, an attachment constraint check on the result field, a value-domain check on the result field value, a data-format check on the result field, and a source check on the result field;
the second part consists of the data set rules established in the quality inspection library, of which there are four: a result field record integrity check, a result field business logic check, a result source uniqueness check, and a combined non-null check on result field values;
S22, executing the task: identifying the rules in the task, identifying the data set to which those rules apply, obtaining the data of the data set, and checking whether the data conform to each specific rule; if the data are normal, the check outcome is recorded in the quality inspection result, and if the data are abnormal, it is recorded in the quality inspection record;
S23, quality inspection record management;
S24, data repair management: reviewing the data set to view the data, the type of each data error and the identified error data, and repairing the error data so that it satisfies the defined rules;
S3, data quality statistics: compiling statistics over all task executions, summarizing the result quality problems, and viewing the error data;
and S4, data repair log: viewing the state of the data after repair and comparing the data before and after modification.
Specifically, the quality inspection rules are determined through the quality inspection library, which sets the quality inspection standard. Once the data settings of the quality inspection library are complete, a quality inspection task is established, and the quality inspection rules for the data set are chosen from the ten rules.
A data set is then selected, its data are inspected against the collected quality inspection rules, and the inspection results are judged and post-processed.
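Three of the field-level rules named in the first part (non-null, uniqueness and data format) can be sketched in Python as follows. The function names, the sample fields and the ISO date pattern are hypothetical; the disclosure names the rules but does not define how they are evaluated.

```python
import re
from collections import Counter


def non_null_failures(rows, field_name):
    """Indexes of rows whose field is missing, None or an empty string."""
    return [i for i, r in enumerate(rows) if r.get(field_name) in (None, "")]


def uniqueness_failures(rows, field_name):
    """Indexes of rows whose field value occurs more than once in the data set."""
    counts = Counter(r.get(field_name) for r in rows)
    return [i for i, r in enumerate(rows) if counts[r.get(field_name)] > 1]


def format_failures(rows, field_name, pattern):
    """Indexes of rows whose field value does not fully match the pattern."""
    regex = re.compile(pattern)
    return [i for i, r in enumerate(rows)
            if not regex.fullmatch(str(r.get(field_name, "")))]


# Hypothetical well-header rows.
rows = [
    {"well_id": "W-001", "spud_date": "2021-03-04"},
    {"well_id": "W-001", "spud_date": "04/03/2021"},  # duplicate id, wrong date format
    {"well_id": None, "spud_date": "2021-05-06"},     # missing id
]
print(non_null_failures(rows, "well_id"))
print(uniqueness_failures(rows, "well_id"))
print(format_failures(rows, "spud_date", r"\d{4}-\d{2}-\d{2}"))
```

Each function returns row indexes rather than a single pass or fail flag, so that the abnormal rows can be written to the quality inspection record individually.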
As a further scheme of the invention: management of the quality inspection library adds rules to the data set, and these comprise the following four rules: a result field record integrity check, a result field business logic check, a result source uniqueness check, and a combined non-null check on result field values.
Specifically, the rules added to the data set cover the integrity of the field records, the field business logic, whether the source is unique, and the combined non-null check on field values.
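The source uniqueness check and the combined non-null check might be sketched as below. Both interpretations are assumptions: a result is taken to fail the source check when it is reported by more than one source, and a field combination is taken to fail only when every field in it is empty. The function and field names are hypothetical.

```python
def source_unique_failures(rows, key, source_field="source"):
    """Keys of results that are reported by more than one distinct source."""
    sources = {}
    for r in rows:
        sources.setdefault(r[key], set()).add(r[source_field])
    return sorted(k for k, found in sources.items() if len(found) > 1)


def combined_non_null_failures(rows, field_names):
    """Indexes of rows in which every field of the combination is empty."""
    return [i for i, r in enumerate(rows)
            if all(r.get(name) in (None, "") for name in field_names)]


# Hypothetical analysis-report rows.
rows = [
    {"report_id": "R1", "source": "lab_a", "porosity": 0.12, "permeability": None},
    {"report_id": "R1", "source": "lab_b", "porosity": None, "permeability": None},
    {"report_id": "R2", "source": "lab_a", "porosity": None, "permeability": 35.0},
]
print(source_unique_failures(rows, "report_id"))
print(combined_non_null_failures(rows, ["porosity", "permeability"]))
```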
As a still further scheme of the invention: rules are added to the data set in two modes. The first mode is the result field record integrity check, which checks whether the number of data records matches the specified number obtained from a calculation formula defined with reference to the data set fields. The second mode is the result field business logic check, which checks whether the value of a data field matches the value obtained from a calculation formula, that is, it checks the correctness of the business logic of structured result field values.
As a still further scheme of the invention: the quality inspection record management comprises:
a quality inspection report function, which obtains the results of executed tasks, compiles a problem summary per quality inspection rule and a problem summary per data set, and shows the problem data of each data set in detail, so that the customer knows which data have problems and which quality inspection rules were applied to the data;
a task detail function, which shows specifically which data sets and tasks are involved, summarizes the data set problems, and shows the count of abnormal data in each data set;
and an execution log function, which displays the data sets on which the task was executed and the amount of problem data in each data set.
A data quality inspection system using the data quality inspection method comprises a data set rule adding module, a data acquisition module, a quality inspection module and a comparison module;
the data set rule adding module is used to set the inspection rules of the data set;
the data acquisition module captures data by random sampling;
the quality inspection module inspects the data randomly sampled by the data acquisition module and obtains inspection data;
and the comparison module compares the inspection data obtained by the quality inspection module with the data in the quality inspection library to obtain error data, and repairs the error data to generate a data repair log.
Specifically, the data set rule adding module establishes rules for the data set, and these rules serve as the subsequent quality inspection standard. Data acquisition from the data set is randomized, which reduces the acquisition frequency and the number of records acquired and speeds up the inspection.
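Random acquisition of this kind can be sketched with Python's standard `random` module. The function name `sample_rows`, the sample size and the optional seed are assumptions; the disclosure does not state how the sample is drawn or how large it is.

```python
import random


def sample_rows(rows, sample_size, seed=None):
    """Draw a random sample without replacement; return all rows if the set is small."""
    if sample_size >= len(rows):
        return list(rows)
    rng = random.Random(seed)  # a fixed seed makes an inspection run repeatable
    return rng.sample(rows, sample_size)


# Hypothetical data set of 1000 records, of which 100 are inspected.
rows = [{"id": i} for i in range(1000)]
sampled = sample_rows(rows, 100, seed=42)
print(len(sampled))
```

Inspecting 100 of 1000 records cuts the number of rule evaluations by a factor of ten, at the cost that abnormal records outside the sample go undetected in that run.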
The working principle of the embodiment of the invention is as follows:
As shown in Figs. 1-2, the invention adds rules to a data set through management of the quality inspection library, which fixes the quality inspection standard for the data quality inspection. By establishing a quality inspection task, it obtains the rules in the task, the target data set and the data of that data set, checks whether the data conform to the selected rules, and obtains an inspection result. The comparison module then compares the inspection data with the data in the quality inspection library to obtain the error data. This effectively ensures that there is no difference between the data that should be collected and the data actually collected, and that the collected data are consistent with the expected data type, so that the collected data are correct and the error data can be processed and repaired effectively and quickly. Manual effort and process intervention are greatly reduced, efficiency is improved, and errors are reduced.
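The repair of error data and the accompanying data repair log, which records the data before and after modification, might be sketched as follows. The function name `repair_record`, the log entry layout and the UTC timestamp are assumptions made for illustration.

```python
import copy
from datetime import datetime, timezone


def repair_record(row, field_name, new_value, log):
    """Overwrite one field and append a before/after entry to the repair log."""
    before = copy.deepcopy(row)  # snapshot taken before the modification
    row[field_name] = new_value
    log.append({
        "repaired_at": datetime.now(timezone.utc).isoformat(),
        "field": field_name,
        "before": before,
        "after": copy.deepcopy(row),
    })
    return row


# Hypothetical error record with an empty identifier.
repair_log = []
row = {"well_id": "", "depth_m": 980.5}
repair_record(row, "well_id", "W-002", repair_log)
print(repair_log[0]["before"], repair_log[0]["after"])
```

After a repair, the record would normally be re-checked against the defined rules, since a repair only counts as complete when the data satisfies them.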
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims (5)

1. A data quality inspection method, comprising:
S1, quality inspection library management;
S2, quality inspection task management;
S21, establishing a task: a quality inspection task is established from an established data set, and the quality inspection rules come from two parts;
the first part consists of the rules defined when the data set itself was defined, of which there are six: a single-field non-null check on the result field value, a uniqueness check on the result field value, an attachment constraint check on the result field, a value-domain check on the result field value, a data-format check on the result field, and a source check on the result field;
the second part consists of the data set rules established in the quality inspection library, of which there are four: a result field record integrity check, a result field business logic check, a result source uniqueness check, and a combined non-null check on result field values;
S22, executing the task: identifying the rules in the task, identifying the data set to which those rules apply, obtaining the data of the data set, and checking whether the data conform to each specific rule; if the data are normal, the check outcome is recorded in the quality inspection result, and if the data are abnormal, it is recorded in the quality inspection record;
S23, quality inspection record management;
S24, data repair management: reviewing the data set to view the data, the type of each data error and the identified error data, and repairing the error data so that it satisfies the defined rules;
S3, data quality statistics: compiling statistics over all task executions, summarizing the result quality problems, and viewing the error data;
and S4, data repair log: viewing the state of the data after repair and comparing the data before and after modification.
2. The data quality inspection method according to claim 1, wherein management of the quality inspection library adds rules to the data set, comprising the following four rules: a result field record integrity check, a result field business logic check, a result source uniqueness check, and a combined non-null check on result field values.
3. The data quality inspection method according to claim 1, wherein rules are added to the data set in two modes: the first mode is the result field record integrity check, which checks whether the number of data records matches the specified number obtained from a calculation formula defined with reference to the data set fields; the second mode is the result field business logic check, which checks whether the value of a data field matches the value obtained from a calculation formula, that is, it checks the correctness of the business logic of structured result field values.
4. The data quality inspection method according to claim 1, wherein the quality inspection record management comprises:
a quality inspection report function, which obtains the results of executed tasks, compiles a problem summary per quality inspection rule and a problem summary per data set, and shows the problem data of each data set in detail, so that the customer knows which data have problems and which quality inspection rules were applied to the data;
a task detail function, which shows specifically which data sets and tasks are involved, summarizes the data set problems, and shows the count of abnormal data in each data set;
and an execution log function, which displays the data sets on which the task was executed and the amount of problem data in each data set.
5. A data quality inspection system using the data quality inspection method of any one of claims 1 to 4, comprising a data set rule adding module, a data acquisition module, a quality inspection module, and a comparison module;
the data set rule adding module is used to set the inspection rules of the data set;
the data acquisition module captures data by random sampling;
the quality inspection module inspects the data randomly sampled by the data acquisition module and obtains inspection data;
and the comparison module compares the inspection data obtained by the quality inspection module with the data in the quality inspection library to obtain error data, and repairs the error data to generate a data repair log.
CN202111538017.9A 2021-12-15 2021-12-15 Data quality inspection method and system Pending CN114168584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111538017.9A CN114168584A (en) 2021-12-15 2021-12-15 Data quality inspection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111538017.9A CN114168584A (en) 2021-12-15 2021-12-15 Data quality inspection method and system

Publications (1)

Publication Number Publication Date
CN114168584A (en) 2022-03-11

Family

ID=80486889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111538017.9A Data quality inspection method and system 2021-12-15 2021-12-15 (Pending; published as CN114168584A (en))

Country Status (1)

Country Link
CN (1) CN114168584A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination