CN108334636A - Data Quality Assessment Methodology - Google Patents
- Publication number
- CN108334636A (application CN201810173861.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- rule
- dimensions
- data quality
- quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a data quality assessment method and relates to the field of data quality management. The invention assigns a proportion to each of five dimensions that reflect data quality and configures assessment rules for each dimension. Once the rules are configured, they are run one by one to obtain rule scores, from which the score of each dimension is derived. The dimension scores are summed to give a quality score, so that the result of the data quality assessment is presented concisely as a score, improving the user experience.
Description
Technical field
The invention belongs to the field of data quality management and specifically relates to a data quality assessment method.
Background art
As science and technology advance, system applications grow more complex and the volume of data to be processed keeps increasing. Data processing occupies an important position in the construction and use of these systems.
However, the importance of data quality is often overlooked during system construction, and insufficient measures are taken to check data quality effectively. As the system and its data come into wider use, data quality problems gradually surface, for example in the validity, accuracy, and consistency of the data. In the worst case, users come to regard the system and its data as untrustworthy and eventually stop using the system, which defeats the purpose of building it.
Summary of the invention
To solve the above problem, the present invention provides a data quality assessment method capable of assessing the quality of data.
A data quality assessment method comprises the following steps:
Step 1, import the database;
Step 2, screen the data in the database;
Step 3, configure the assessment rules;
Step 4, run the configured assessment rules to perform the assessment and obtain a quality score.
Further, step 3 comprises the following flow:
Step 31, determine the proportions of the five dimensions of data quality;
Step 32, configure the rules of the five dimensions.
Further, in step 31, the five dimensions are the completeness, consistency, timeliness, validity, and integrity of the data.
Further, step 32 comprises the following flow:
Step 321, select the rules to be assessed in the five dimensions;
Step 322, configure the weight and threshold of each rule.
Further, step 4 comprises the following flow:
Step 41, run each rule one by one to obtain the score of each dimension;
Step 42, sum the scores of the five dimensions to obtain the quality score.
Beneficial effects of the invention: the invention assigns a proportion to each of five dimensions that reflect data quality and configures assessment rules for each dimension. Once the rules are configured, they are run one by one to obtain rule scores, from which the score of each dimension is derived. The dimension scores are summed to give a quality score, so that the result of the data quality assessment is presented concisely as a score, improving the user experience.
Description of the drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the flow chart of step 3 in Fig. 1.
Fig. 3 is the flow chart of step 32 in Fig. 2.
Fig. 4 is the flow chart of step 4 in Fig. 1.
Detailed description of the embodiments
Embodiments of the present invention are described further below with reference to the accompanying drawings.
Referring to Fig. 1, the present invention provides a data quality assessment method whose detailed flow is as follows:
Step 1, import the database.
In this embodiment, the database that requires data quality assessment is connected, and the quality of the data in that database is assessed.
Step 2, screen the data in the database.
In this embodiment, the database is analyzed against the five dimensions to determine which data, and in particular which tables, affect data quality. The data with the greatest influence on data quality are selected first for assessment.
Step 3, configure the assessment rules.
Referring to Fig. 2, step 3 is realized by the following flow:
Step 31, determine the proportions of the five dimensions of data quality.
In this embodiment, the five dimensions are validity, consistency, timeliness, completeness, and integrity. The proportion of each of these five dimensions in the assessment rules is determined.
Validity is mainly reflected in the content and quantity of the data, such as data length, number of digits, and field associations;
Consistency is mainly reflected in comparisons with historical data, threshold fluctuation, and the like;
Timeliness is mainly reflected in data updates, additions, update frequency, and the like;
Completeness is mainly reflected in fill-rate checks and uniqueness checks;
Integrity is mainly reflected in cross-table consistency of data values, cross-table consistency of data types, associations between parent and child tables, and the like.
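As an illustration of the two completeness checks named above, the following Python sketch computes a fill rate and a uniqueness rate for one column. The function names, the treatment of `None` and empty strings as missing values, and the sample column are all assumptions made for this example; the patent does not specify how these checks are implemented.

```python
from typing import Any, Optional, Sequence


def _is_filled(value: Optional[Any]) -> bool:
    """Assumption: None and the empty string count as missing."""
    return value is not None and value != ""


def fill_rate(values: Sequence[Optional[Any]]) -> float:
    """Fraction of entries in the column that are filled."""
    if not values:
        return 0.0
    return sum(1 for v in values if _is_filled(v)) / len(values)


def uniqueness_rate(values: Sequence[Optional[Any]]) -> float:
    """Fraction of filled entries that are distinct from one another."""
    present = [v for v in values if _is_filled(v)]
    if not present:
        return 0.0
    return len(set(present)) / len(present)


# Hypothetical column from a table under assessment.
phone_numbers = ["13800000001", "13800000002", None, "13800000001", ""]
print(fill_rate(phone_numbers))        # 3 of the 5 entries are filled
print(uniqueness_rate(phone_numbers))  # 2 distinct values among 3 filled entries
```

Ratios of this kind could be compared against a configured threshold when a rule is run, but that pairing is a design choice of this sketch rather than something the text states.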
Step 32, configure the rules of the five dimensions.
Referring to Fig. 3, step 32 is realized by the following flow:
Step 321, select the rules to be assessed in the five dimensions.
In this embodiment, the assessment rules of the five dimensions are configured. That is, within each dimension, the specific rules used to assess data quality are selected, and the quality of the data is assessed by the configured rules.
Step 322, configure the weight and threshold of each rule.
In this embodiment, for the rules selected in each dimension, the weight and threshold of each rule are configured, and together they form the assessment rules.
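The configuration described in steps 31, 321, and 322 can be pictured as a small data structure. The sketch below is one possible representation, not the patent's own: the class names, rule names, and every numeric value are hypothetical, and the check that the five proportions sum to 1 is an assumption, since the text does not say how proportions are expressed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    """One assessment rule with its weight (W) and threshold (THR)."""
    name: str
    weight: float
    threshold: float


@dataclass
class Dimension:
    """One quality dimension with its proportion and its selected rules."""
    name: str
    proportion: float
    rules: List[Rule] = field(default_factory=list)


def validate_proportions(dimensions: List[Dimension]) -> None:
    """Assumption: proportions are fractions that must sum to 1."""
    total = sum(d.proportion for d in dimensions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"dimension proportions sum to {total}, expected 1.0")


# Illustrative values only; none of these numbers come from the patent.
config = [
    Dimension("validity", 0.25, [Rule("field_length", 10.0, 0.98)]),
    Dimension("consistency", 0.20, [Rule("history_comparison", 10.0, 0.95)]),
    Dimension("timeliness", 0.15, [Rule("update_frequency", 10.0, 0.90)]),
    Dimension("completeness", 0.20, [
        Rule("fill_rate", 6.0, 0.99),
        Rule("uniqueness", 4.0, 1.00),
    ]),
    Dimension("integrity", 0.20, [Rule("parent_child_link", 10.0, 0.97)]),
]
validate_proportions(config)
```

Keeping the proportion on the dimension and the weight and threshold on the rule mirrors the order of the steps: proportions are fixed first, then rules are selected and tuned within each dimension.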
Step 4, run the configured assessment rules to perform the assessment and obtain a quality score.
Referring to Fig. 4, step 4 is realized by the following flow:
Step 41, run each rule one by one to obtain the score of each dimension.
In this embodiment, the configured assessment rules are run one by one to assess the data quality and obtain the score of each rule. Each time a rule is run against the data, a corresponding score is obtained.
When P = THR, R = 90% × W.
When P > THR or P < THR, R is computed from a formula that is missing from this text [formula not reproduced in the source].
If the value obtained satisfies [condition not reproduced in the source], it is taken as 1.
If the value obtained satisfies [condition not reproduced in the source], it is taken as 0.
Here R is the rule score, P is the actual ratio, THR is the rule threshold, and W is the rule weight.
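Only the P = THR case of the rule score survives in this text, so the sketch below implements that case alone and leaves the other case to a caller-supplied function. The name `rule_score`, the `off_threshold` parameter, and the use of `math.isclose` in place of exact equality are assumptions of this example; the patent's formula for P ≠ THR is not reconstructed here.

```python
import math
from typing import Callable, Optional

# Signature of a caller-supplied function for the P != THR case: (P, THR, W) -> R.
OffThreshold = Callable[[float, float, float], float]


def rule_score(
    p: float,
    thr: float,
    w: float,
    off_threshold: Optional[OffThreshold] = None,
) -> float:
    """Return the rule score R for actual ratio P, threshold THR, and weight W."""
    # Stated in the text: when P equals THR, R is 90% of the weight.
    # Assumption: a floating-point tolerance stands in for exact equality.
    if math.isclose(p, thr):
        return 0.9 * w
    # The formula for P > THR or P < THR is missing from the source text.
    if off_threshold is None:
        raise NotImplementedError("the formula for P != THR is not given in the text")
    return off_threshold(p, thr, w)


print(rule_score(0.95, 0.95, 10.0))
```

Passing the missing case in as a function keeps the known behavior testable without committing to a guess about the lost formula.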
Every rule of every dimension is run to obtain the score of each rule, and the score of each dimension is then derived from the rule scores, the weights, the dimension proportions, and so on.
Step 42, sum the scores of the five dimensions to obtain the quality score.
In this embodiment, the scores of the dimensions are added to obtain the total quality score. The quality score expresses the quality of the data: the higher the score, the better the assessment result.
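The text says that dimension scores are derived from rule scores, weights, and dimension proportions, and that the quality score is their sum, but it gives no exact formula for the dimension score. The sketch below assumes one plausible formula: each dimension earns its proportion of a 100-point scale, scaled by the share of rule weight actually scored. That formula, the 100-point scale, and all sample numbers are assumptions; only the final summation comes from the text.

```python
import math
from typing import Dict, List, Tuple

# Each rule result is a pair (rule score R, rule weight W).
RuleResult = Tuple[float, float]


def dimension_score(
    proportion: float,
    results: List[RuleResult],
    full_marks: float = 100.0,
) -> float:
    """Assumed formula: full_marks * proportion * (sum of R / sum of W)."""
    total_weight = sum(w for _, w in results)
    if total_weight == 0:
        return 0.0
    earned = sum(r for r, _ in results)
    return full_marks * proportion * earned / total_weight


def quality_score(dimension_scores: Dict[str, float]) -> float:
    """Step 42: the quality score is the sum of the dimension scores."""
    return sum(dimension_scores.values())


# Hypothetical rule results, with every dimension given a proportion of 0.2.
results_by_dimension: Dict[str, List[RuleResult]] = {
    "validity": [(9.0, 10.0)],
    "consistency": [(10.0, 10.0)],
    "timeliness": [(5.0, 10.0)],
    "completeness": [(8.0, 10.0), (10.0, 10.0)],
    "integrity": [(0.0, 10.0)],
}
scores = {
    name: dimension_score(0.2, results)
    for name, results in results_by_dimension.items()
}
print(scores)
print(quality_score(scores))
```

With proportions that sum to 1, this assumed formula caps the total at `full_marks`, which keeps the final score easy to read as a percentage-style figure.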
Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand the principle of the invention, and that the scope of protection of the invention is not limited to these specific statements and embodiments. Based on the technical teachings disclosed by the invention, those of ordinary skill in the art can make various other specific variations and combinations that do not depart from the essence of the invention, and such variations and combinations remain within the scope of protection of the invention.
Claims (5)
1. A data quality assessment method, characterized by comprising the following steps:
Step 1, import the database;
Step 2, screen the data in the database;
Step 3, configure the assessment rules;
Step 4, run the configured assessment rules to perform the assessment and obtain a quality score.
2. The data quality assessment method according to claim 1, characterized in that step 3 comprises the following flow:
Step 31, determine the proportions of the five dimensions of data quality;
Step 32, configure the rules of the five dimensions.
3. The data quality assessment method according to claim 2, characterized in that, in step 31, the five dimensions are the completeness, consistency, timeliness, validity, and integrity of the data.
4. The data quality assessment method according to claim 2, characterized in that step 32 comprises the following flow:
Step 321, select the rules to be assessed in the five dimensions;
Step 322, configure the weight and threshold of each rule.
5. The data quality assessment method according to claim 4, characterized in that step 4 comprises the following flow:
Step 41, run each rule one by one to obtain the score of each dimension;
Step 42, sum the scores of the five dimensions to obtain the quality score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810173861.8A CN108334636A (en) | 2018-03-02 | 2018-03-02 | Data Quality Assessment Methodology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810173861.8A CN108334636A (en) | 2018-03-02 | 2018-03-02 | Data Quality Assessment Methodology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108334636A true CN108334636A (en) | 2018-07-27 |
Family
ID=62930194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810173861.8A Pending CN108334636A (en) | 2018-03-02 | 2018-03-02 | Data Quality Assessment Methodology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334636A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109657991A (en) * | 2018-12-21 | 2019-04-19 | 江苏满运软件科技有限公司 | Metadata quality appraisal procedure, device, electronic equipment, storage medium |
CN112506904A (en) * | 2020-12-02 | 2021-03-16 | 深圳市酷开网络科技股份有限公司 | Data quality evaluation method and device, terminal equipment and storage medium |
CN114742417A (en) * | 2022-04-15 | 2022-07-12 | 北京科杰科技有限公司 | Data quality evaluation method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105741196A (en) * | 2016-03-01 | 2016-07-06 | 万达信息股份有限公司 | Four-dimension-based data quality monitoring and evaluating method |
CN106503206A (en) * | 2016-10-26 | 2017-03-15 | 国家电网公司 | A kind of general data quality appraisal procedure based on entropy assessment |
CN107545043A (en) * | 2017-08-09 | 2018-01-05 | 国政通科技股份有限公司 | A kind of data application method and device based on data quality checking |
CN107679751A (en) * | 2017-09-30 | 2018-02-09 | 山东浪潮云服务信息科技有限公司 | A kind of appraisal procedure and device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109657991A (en) * | 2018-12-21 | 2019-04-19 | 江苏满运软件科技有限公司 | Metadata quality appraisal procedure, device, electronic equipment, storage medium |
CN112506904A (en) * | 2020-12-02 | 2021-03-16 | 深圳市酷开网络科技股份有限公司 | Data quality evaluation method and device, terminal equipment and storage medium |
CN112506904B (en) * | 2020-12-02 | 2024-05-07 | 深圳市酷开网络科技股份有限公司 | Data quality evaluation method, device, terminal equipment and storage medium |
CN114742417A (en) * | 2022-04-15 | 2022-07-12 | 北京科杰科技有限公司 | Data quality evaluation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180727 |