CN108334636A - Data Quality Assessment Methodology - Google Patents

Publication number
CN108334636A
Authority
CN
China
Prior art keywords
data
rule
dimensions
data quality
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810173861.8A
Other languages
Chinese (zh)
Inventor
唐雪飞
吴云东
汪林川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU COMSYS INFORMATION TECHNOLOGY Co Ltd
Original Assignee
CHENGDU COMSYS INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-03-02
Filing date
2018-03-02
Publication date
2018-07-27
Application filed by CHENGDU COMSYS INFORMATION TECHNOLOGY Co Ltd
Priority to CN201810173861.8A
Publication of CN108334636A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a data quality assessment method and relates to the field of data quality management. A proportion is set for each of five dimensions that reflect data quality, and assessment rules are configured for each dimension. Once the rules are configured, they are run one by one to obtain a score for each rule and, from these, a score for each dimension. The dimension scores are summed to give a quality score, so that the result of the data quality assessment is presented concisely as a single score, which improves the user experience.

Description

Data Quality Assessment Methodology
Technical field
The invention belongs to the field of data quality management, and in particular relates to a data quality assessment method.
Background
As science and technology advance, system applications become increasingly complex and the volume of data to be processed keeps growing, so data processing occupies an important position in the construction and use of such systems.
However, the importance of data quality is often overlooked during system construction, and insufficient measures are taken to check data quality effectively. As the system and its data come into deeper use, data quality problems are gradually exposed, for example in the validity, accuracy, and consistency of the data. In the worst case, users come to regard the system and its data as untrustworthy and eventually abandon the system, which defeats the purpose of building it.
Summary of the invention
To solve the above problems, the present invention provides a data quality assessment method capable of assessing the quality of data.
A data quality assessment method includes the following steps:
Step 1, import the database;
Step 2, screen the data in the database;
Step 3, configure the assessment rules;
Step 4, run the configured assessment rules to perform the assessment and obtain a quality score.
Further, step 3 includes the following:
Step 31, determine the proportions of the five dimensions of data quality;
Step 32, configure the rules for the five dimensions.
Further, in step 31, the five dimensions are the completeness, consistency, timeliness, validity, and integrity of the data.
Further, step 32 includes the following:
Step 321, select the rules to be assessed in the five dimensions;
Step 322, configure the weight and threshold of each rule.
Further, step 4 includes the following:
Step 41, run each rule one by one to obtain the score of each dimension;
Step 42, sum the scores of the five dimensions to obtain the quality score.
Beneficial effects of the present invention: a proportion is set for each of the five dimensions that reflect data quality, and assessment rules are configured for each dimension. Once the rules are configured, they are run one by one to obtain a score for each rule and, from these, a score for each dimension. The dimension scores are summed to give a quality score, so that the result of the data quality assessment is presented concisely as a single score, which improves the user experience.
Description of the drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the flow chart of step 3 in Fig. 1.
Fig. 3 is the flow chart of step 32 in Fig. 2.
Fig. 4 is the flow chart of step 4 in Fig. 1.
Detailed description of the embodiments
The embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to Fig. 1, the present invention provides a data quality assessment method, the detailed process of which is as follows:
Step 1, import the database.
In the present embodiment, the database that requires data quality assessment is connected, and the quality of the data in it is assessed.
Step 2, screen the data in the database.
In the present embodiment, the database is analysed against the five dimensions to determine which data, and in particular which tables, affect data quality. The data with the greatest impact on data quality are selected first for assessment.
Step 3, configure the assessment rules.
Referring to Fig. 2, step 3 is carried out as follows:
Step 31, determine the proportions of the five dimensions of data quality.
In the present embodiment, the five dimensions are validity, consistency, timeliness, completeness, and integrity. The proportion of each of these five dimensions in the assessment rules is determined.
Specifically, validity is mainly reflected in the content and quantity of the data, such as data length, number of digits, and field associations;
Consistency is mainly reflected in comparison against historical values of the data, threshold fluctuations, and the like;
Timeliness is mainly reflected in the updating, addition, and update frequency of the data;
Completeness is mainly reflected in fill-rate checks and uniqueness checks on the data;
Integrity is mainly reflected in the cross-table consistency of data values, the cross-table consistency of data types, and the association between parent and child tables.
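As an illustration of the two completeness checks named above, the fill rate and the uniqueness of a single column can each be expressed as a ratio. The Python sketch below is only one possible reading under stated assumptions: the function names `fill_rate` and `uniqueness_ratio` are hypothetical, and treating `None` and the empty string as unfilled values is an assumption that the patent does not specify.

```python
from typing import Optional, Sequence


def _is_filled(value: Optional[object]) -> bool:
    """Assumption: None and the empty string count as unfilled."""
    return value is not None and value != ""


def fill_rate(values: Sequence[Optional[object]]) -> float:
    """Fraction of values in a column that are filled (0.0 for an empty column)."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if _is_filled(v))
    return filled / len(values)


def uniqueness_ratio(values: Sequence[Optional[object]]) -> float:
    """Fraction of the filled values that are distinct (0.0 if none are filled)."""
    present = [v for v in values if _is_filled(v)]
    if not present:
        return 0.0
    return len(set(present)) / len(present)
```

A ratio computed this way could serve as the actual ratio that a rule compares with its threshold, although the patent does not state how the actual ratio is measured.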
Step 32, configure the rules for the five dimensions.
Referring to Fig. 3, step 32 is carried out as follows:
Step 321, select the rules to be assessed in the five dimensions.
In the present embodiment, the assessment rules of the five dimensions are configured: within each dimension, the specific rules used to assess data quality are selected, and the data are assessed by the rules so configured.
Step 322, configure the weight and threshold of each rule.
In the present embodiment, for the rules selected in each dimension, the weight and threshold of each rule are configured. Together these make up the assessment rules.
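The configuration produced by steps 31 to 322 can be pictured as a set of dimension proportions plus a list of rules, each carrying a weight and a threshold. The following Python sketch shows one way to hold and sanity-check such a configuration. The `Rule` class, the `validate_config` function, and the requirement that the proportions sum to 1 are all assumptions made for illustration; the patent does not prescribe a data structure or a normalisation.

```python
from dataclasses import dataclass
from typing import Dict, List

# The five dimensions named in the description.
DIMENSIONS = ("validity", "consistency", "timeliness", "completeness", "integrity")


@dataclass(frozen=True)
class Rule:
    """One assessment rule (hypothetical structure)."""
    name: str
    dimension: str
    weight: float     # W in the description
    threshold: float  # THR in the description


def validate_config(proportions: Dict[str, float], rules: List[Rule]) -> None:
    """Raise ValueError if the configuration is inconsistent.

    Assumption: dimension proportions are fractions that sum to 1.
    """
    if set(proportions) != set(DIMENSIONS):
        raise ValueError("a proportion is required for each of the five dimensions")
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("dimension proportions must sum to 1")
    for rule in rules:
        if rule.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {rule.dimension}")
        if rule.weight < 0:
            raise ValueError(f"negative weight for rule: {rule.name}")
```

Checking the configuration before any rule runs keeps a misconfigured dimension or weight from silently distorting the final score.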
Step 4, run the configured assessment rules to perform the assessment and obtain a quality score.
Referring to Fig. 4, step 4 is carried out as follows:
Step 41, run each rule one by one to obtain the score of each dimension.
In the present embodiment, the configured assessment rules are run one by one to assess data quality, and each run of a rule yields a corresponding score for that rule.
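When P = THR, R = 90% × W.
When P > THR or P < THR, R is given by [formula missing in source].
If the result obtained is [expression missing in source], the value is taken as 1.
If the result obtained is [expression missing in source], the value is taken as 0.
Here R is the rule score, P is the actual ratio, THR is the rule threshold, and W is the rule weight.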
Every rule of each dimension is run to obtain its score, and the score of each dimension is then obtained from the rule scores, the rule weights, the dimension proportion, and so on.
Step 42, sum the scores of the five dimensions to obtain the quality score.
In the present embodiment, the scores of the dimensions are added to give the total quality score. The quality score expresses the quality of the data: the higher the score, the better the assessment.
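The aggregation in steps 41 and 42 can be sketched in Python as follows. The summation in `quality_score` follows the description directly, since the overall quality score is the sum of the dimension scores. The formula inside `dimension_score`, by contrast, is an assumption: the description says only that a dimension's score is derived from the rule scores, the rule weights, and the dimension proportion, without giving the exact combination. Both function names are hypothetical.

```python
from typing import Dict, Sequence


def dimension_score(rule_scores: Sequence[float],
                    rule_weights: Sequence[float],
                    proportion: float) -> float:
    """Score of one dimension (assumed formula, not taken from the patent).

    Assumption: the rule scores R are summed, normalised by the total rule
    weight, and scaled by the dimension's proportion.
    """
    total_weight = sum(rule_weights)
    if total_weight == 0:
        return 0.0
    return proportion * sum(rule_scores) / total_weight


def quality_score(dimension_scores: Dict[str, float]) -> float:
    """Step 42: sum the dimension scores to obtain the overall quality score."""
    return sum(dimension_scores.values())
```

For example, two rules with weights 1.0 and 0.5 whose actual ratio equals the threshold score 0.9 and 0.45 respectively (R = 90% × W); under the assumed formula, a dimension with proportion 0.2 then contributes 0.2 × 1.35 / 1.5 to the total.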
Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of protection of the invention is not limited to these particular statements and embodiments. Based on the technical teachings disclosed herein, those of ordinary skill in the art can make various other specific variations and combinations that do not depart from the essence of the invention, and these variations and combinations remain within the scope of protection of the invention.

Claims (5)

1. A data quality assessment method, characterized in that it includes the following steps:
Step 1, import the database;
Step 2, screen the data in the database;
Step 3, configure the assessment rules;
Step 4, run the configured assessment rules to perform the assessment and obtain a quality score.
2. The data quality assessment method according to claim 1, characterized in that step 3 includes the following:
Step 31, determine the proportions of the five dimensions of data quality;
Step 32, configure the rules for the five dimensions.
3. The data quality assessment method according to claim 2, characterized in that, in step 31, the five dimensions are the completeness, consistency, timeliness, validity, and integrity of the data.
4. The data quality assessment method according to claim 2, characterized in that step 32 includes the following:
Step 321, select the rules to be assessed in the five dimensions;
Step 322, configure the weight and threshold of each rule.
5. The data quality assessment method according to claim 4, characterized in that step 4 includes the following:
Step 41, run each rule one by one to obtain the score of each dimension;
Step 42, sum the scores of the five dimensions to obtain the quality score.
CN201810173861.8A 2018-03-02 2018-03-02 Data Quality Assessment Methodology Pending CN108334636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810173861.8A CN108334636A (en) 2018-03-02 2018-03-02 Data Quality Assessment Methodology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810173861.8A CN108334636A (en) 2018-03-02 2018-03-02 Data Quality Assessment Methodology

Publications (1)

Publication Number Publication Date
CN108334636A true CN108334636A (en) 2018-07-27

Family

ID=62930194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810173861.8A Pending CN108334636A (en) 2018-03-02 2018-03-02 Data Quality Assessment Methodology

Country Status (1)

Country Link
CN (1) CN108334636A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105741196A (en) * 2016-03-01 2016-07-06 万达信息股份有限公司 Four-dimension-based data quality monitoring and evaluating method
CN106503206A (en) * 2016-10-26 2017-03-15 国家电网公司 A kind of general data quality appraisal procedure based on entropy assessment
CN107545043A (en) * 2017-08-09 2018-01-05 国政通科技股份有限公司 A kind of data application method and device based on data quality checking
CN107679751A (en) * 2017-09-30 2018-02-09 山东浪潮云服务信息科技有限公司 A kind of appraisal procedure and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657991A (en) * 2018-12-21 2019-04-19 江苏满运软件科技有限公司 Metadata quality appraisal procedure, device, electronic equipment, storage medium
CN112506904A (en) * 2020-12-02 2021-03-16 深圳市酷开网络科技股份有限公司 Data quality evaluation method and device, terminal equipment and storage medium
CN112506904B (en) * 2020-12-02 2024-05-07 深圳市酷开网络科技股份有限公司 Data quality evaluation method, device, terminal equipment and storage medium
CN114742417A (en) * 2022-04-15 2022-07-12 北京科杰科技有限公司 Data quality evaluation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180727
