CN104484448A - Assessment method for relational data quality - Google Patents
- Publication number
- CN104484448A
- Authority
- CN
- China
- Prior art keywords
- field
- assessed
- data
- database
- analysis rule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of computers, and in particular to a method for assessing the quality of relational data. The method comprises the following steps: step one, a user terminal sends a database quality assessment request to an assessment system terminal; step two, the assessment system terminal responds to the request; step three, a relational database data source is configured; step four, the data are assessed; step five, the assessed value is stored and the time of the assessment is recorded; step six, the assessed value is output. The method enables quality assessment of relational data, so that the data can be analyzed more accurately and put to further use.
Description
Technical field
The present invention relates to the field of computers, and in particular to a method for assessing the quality of relational data.
Background art
Informatization has been under way across China's industries for many years and has accumulated a wealth of data. These data are highly varied, and so are their storage structures. Their quality, however, is generally low, so analyses based on them are inaccurate and the data cannot be put to good use. Because the data sit inside databases, managers in information departments find it difficult to get a real grasp of their quality.
At present there are few methods for analyzing the quality of the data content stored in relational databases, and analysis of data content that is supposed to hold national standard codes is essentially a blank. For coded data of this kind, such as sex, the state has issued a dedicated standard and the values must fall within a defined set of numbers; in practice, however, most of the data do not conform to the national standard codes at all. The data are therefore non-standard and inconsistent and cannot be put to good use.
Summary of the invention
To overcome the shortcomings described in the background above, the technical problem to be solved by the present invention is to provide a method for assessing the quality of relational data.
To solve this technical problem, the method for assessing relational data quality according to the present invention comprises the following steps:
Step 1, a user terminal sends a database quality assessment request to an assessment system terminal;
Step 2, the assessment system terminal responds to the request;
Step 3, a relational database data source is configured, comprising the following steps:
3.1, entering information about the database under assessment, the information comprising the IP address of the database, the database user name, the password or the port, and storing the information in the assessment system;
3.2, establishing a connection to the database under assessment;
3.3, obtaining the structure of the tables and fields of the database under assessment, the assessment system being able to select any field of any table for configuration;
3.4, initializing the field analysis rules;
Step 4, the data are assessed, comprising the following steps:
4.1, selecting the data to be assessed in the database under assessment, the data to be assessed comprising several tables to be assessed, each table structure containing one or more fields to be assessed;
4.2, the assessment system assesses each field to be assessed according to the analysis rule configured for that field and obtains an assessed value, the assessed values being classifiable by the type of field analyzed, and the analysis rule comprising matching against national standard codes, which comprises the following steps:
4.2.1, reading the national standard code corresponding to the field to be assessed, and matching the field to be assessed against that national standard code;
4.2.2, wherein, when the field to be assessed matches the national standard code, the weight corresponding to the field to be assessed is added to the corresponding assessed value;
Step 5, the assessed value is stored and the time of the assessment is recorded;
Step 6, the assessed value is output.
Further, the analysis rule also comprises field length comparison, comprising the following steps: presetting the length of a standard field, and comparing the length of the field to be assessed with the length of the standard field; wherein, when the length of the field to be assessed matches the length of the standard field, the weight corresponding to the field to be assessed is added to the corresponding assessed value.
Further, the analysis rule also comprises missing-field detection, comprising the following steps: checking the fields to be assessed one by one, the scope of the check covering all recorded information such as digits, text and patterns; wherein, when the field to be assessed is not missing, the weight corresponding to the field to be assessed is added to the corresponding assessed value.
Further, the analysis rule also comprises same-class field matching, comprising the following steps: checking the fields to be assessed one by one, the scope of the check covering all related recorded information such as digits, text, patterns or combinations thereof; wherein, when the field to be assessed is a same-class field, a same-class field being a field that consists only of digits, only of text or only of graphics, or that is combined in the same form, the weight corresponding to that same-class field is added to the corresponding assessed value.
Further, the analysis rules can be used in combination to match the field to be assessed.
The data quality assessment method of the present invention assesses the data in a database according to preset analysis rules, so that the data can be analyzed more accurately and put to further use. Specifically, the analysis rules used comprise matching against national standard codes, field length comparison, missing-field detection and same-class field matching. Field data are checked against these rules to produce matching results. Comparison is a simple and effective means of assessment; the objects compared may include all related recorded information such as digits, text, patterns or combinations thereof. The matching results can indicate the completeness, relevance, synchronization and reasonableness of the data, and accumulating them by weight gives a further assessment of data quality.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of an embodiment of the method for assessing relational data quality according to the present invention;
Fig. 2 is a flow chart of configuring the relational database data source in the embodiment of the method for assessing relational data quality according to the present invention;
Fig. 3 is a flow chart of assessing national standard code data in the embodiment of the method for assessing relational data quality according to the present invention;
Fig. 4 is a flow chart of another assessment procedure in the embodiment of the method for assessing relational data quality according to the present invention.
Detailed description of the embodiments
Fig. 1 shows the flow chart of an embodiment of the method for assessing relational data quality according to the present invention, comprising the following steps: step 1, a user terminal sends a database quality assessment request to an assessment system terminal; step 2, the assessment system terminal responds to the request; step 3, a relational database data source is configured; step 4, the data are assessed; step 5, the assessed value is stored and the time of the assessment is recorded; step 6, the assessed value is output. This method enables quality assessment of relational data, so that the data can be analyzed more accurately and put to further use.
Fig. 2 shows the flow chart of configuring the relational database data source in the embodiment, comprising the following steps: step 3.1, entering information about the database under assessment, the information comprising the IP address of the database, the database user name, the password or the port, and storing the information in the assessment system; step 3.2, establishing a connection to the database under assessment; step 3.3, obtaining the structure of the tables and fields of the database under assessment, the assessment system being able to select any field of any table for configuration; step 3.4, initializing the field analysis rules.
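The data-source configuration of steps 3.1 to 3.4 can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `DataSource`, `FieldConfig` and `list_fields` are hypothetical and do not come from the patent, and an in-memory SQLite database stands in for the remote relational database, so the stored connection details are not actually used to connect.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Connection details entered in step 3.1 (hypothetical structure)."""
    host: str
    port: int
    user: str
    password: str


@dataclass
class FieldConfig:
    """One configurable field of one table and its analysis rules (step 3.4)."""
    table: str
    column: str
    rules: list = field(default_factory=list)  # initialised empty


def list_fields(conn):
    """Step 3.3: read every table and column so that any field can be configured."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    configs = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            configs.append(FieldConfig(table=table, column=col[1]))
    return configs


def demo_fields():
    # Step 3.1: details kept by the assessment system (unused by the SQLite stand-in).
    source = DataSource(host="192.0.2.10", port=1521, user="assessor", password="secret")
    # Step 3.2: the in-memory database plays the role of the database under assessment.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PEOPLE (NAME TEXT, SEX TEXT, ID_CARD TEXT)")
    return source, [(c.table, c.column) for c in list_fields(conn)]
```

Calling `demo_fields()` returns the stored data source together with the list of (table, field) pairs that could then be given analysis rules.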
Fig. 3 shows the flow chart of assessing national standard code data in the embodiment, comprising the following steps: step 401, selecting the data to be assessed in the database under assessment, the data to be assessed comprising several tables to be assessed, each table structure containing one or more fields to be assessed; step 402, the assessment system assesses each field to be assessed according to the analysis rule configured for that field and obtains an assessed value, the assessed values being classifiable by the type of field analyzed; here the analysis rule is matching against national standard codes, and the national standard code corresponding to the field to be assessed is read; step 403, the field to be assessed is matched against the corresponding national standard code; step 404, when the field to be assessed matches the national standard code, the weight corresponding to the field to be assessed is added to the corresponding assessed value.
For example, the national standard code for sex can be laid out as follows:
Code category name: sex;
Code value domain: 1,2,3,4;
Likewise, the national standard code for ethnicity can be laid out as follows:
Code category name: ethnicity;
Code value domain: 1,2,3,4,5 ... 56;
By analogy, the national standard codes that have been obtained are maintained and stored in the system for use in the subsequent steps. These data are stored in the table T_GGZY; taking sex as an example, the core fields of this table are as follows:
Once the various standards have been maintained, when a field of some table in some database that is supposed to hold national standard codes needs a content quality analysis, that is, a check of whether it actually holds standard codes, the specified standard code is matched against the field. The system uses the connection to that database established earlier to connect to it, and calculates the number of records in the field that conform to the national standard with the following standard SQL:
SELECT COUNT(*) PXTGL FROM table_under_check t, T_GGZY t1
WHERE TO_CHAR(t.field_under_check) IN (t1.ZYBH)
AND t1.DMFLMC = 'category name of the specified national standard code'
Specifically, to assess the sex field of some table, that is, to analyze how well the data stored in the sex field conform to the national standard code, it is only necessary to select the national standard code for sex. The four standard code values 0, 1, 2 and 9 are then matched one by one against the sex field, and the comparison shows which values of the field fall outside the range of the national standard code for sex. The system rejects the values that fall outside this range and adds the remaining values, by weight, to the assessed value for sex; the higher the assessed value, the higher the quality of the sex field.
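The counting query and the weighted accumulation described above might look like the following sketch. It is not the patent's implementation: SQLite stands in for the database (so `CAST(... AS TEXT)` replaces `TO_CHAR`), the layout of T_GGZY is assumed from the query alone (ZYBH holding the code value and DMFLMC the code category name), and the function name `code_match_score`, the sample rows and the weight are hypothetical. The sketch also assumes that each code value appears only once per category in T_GGZY.

```python
import sqlite3


def code_match_score(conn, table, column, category, weight=1.0):
    """Add `weight` once for every row whose value is a valid standard code."""
    # Table and column names come from the assessor's own configuration, so they
    # are interpolated; the category name is passed as a bound parameter.
    sql = (
        f'SELECT COUNT(*) FROM "{table}" t, T_GGZY t1 '
        f'WHERE CAST(t."{column}" AS TEXT) = t1.ZYBH AND t1.DMFLMC = ?'
    )
    matched = conn.execute(sql, (category,)).fetchone()[0]
    return matched * weight


def demo_code_match():
    conn = sqlite3.connect(":memory:")
    # Assumed layout of the standard-code table: category name and code value.
    conn.execute("CREATE TABLE T_GGZY (DMFLMC TEXT, ZYBH TEXT)")
    conn.executemany("INSERT INTO T_GGZY VALUES ('sex', ?)",
                     [("0",), ("1",), ("2",), ("9",)])
    conn.execute("CREATE TABLE PEOPLE (NAME TEXT, SEX TEXT)")
    conn.executemany("INSERT INTO PEOPLE VALUES (?, ?)",
                     [("a", "1"), ("b", "2"), ("c", "M"), ("d", None), ("e", "9")])
    # Values "M" and NULL fall outside the code range and contribute nothing.
    return code_match_score(conn, "PEOPLE", "SEX", "sex", weight=0.5)
```

In this example three of the five rows hold a valid code, so the assessed value is three times the weight.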
Fig. 4 shows the flow chart of another assessment procedure in the embodiment of the method for assessing relational data quality according to the present invention; the analysis rules in this procedure further comprise the following:
Field length comparison, comprising the following steps: presetting the length of a standard field, and comparing the length of the field to be assessed with the length of the standard field; wherein, when the length of the field to be assessed matches the length of the standard field, the weight corresponding to the field to be assessed is added to the corresponding assessed value.
Missing-field detection, comprising the following steps: checking the fields to be assessed one by one, the scope of the check covering all recorded information such as digits, text and patterns; wherein, when the field to be assessed is not missing, the weight corresponding to the field to be assessed is added to the corresponding assessed value.
Same-class field matching, comprising the following steps: checking the fields to be assessed one by one, the scope of the check covering all related recorded information such as digits, text, patterns or combinations thereof; wherein, when the field to be assessed is a same-class field, a same-class field being a field that consists only of digits, only of text or only of graphics, or that is combined in the same form, the weight corresponding to that same-class field is added to the corresponding assessed value.
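As a rough illustration, the three additional rules can be written as predicates over a single field value, with the weight added once per value that passes. All function names are hypothetical, and the character classes ("digits", "letters", "mixed") are an assumed simplification of the patent's "only digits, only text, only graphics, or combined in the same form".

```python
def is_present(value):
    """Missing-field detection: None or blank text counts as missing."""
    return value is not None and str(value).strip() != ""


def has_length(expected):
    """Field length comparison against a preset standard length."""
    def rule(value):
        return is_present(value) and len(str(value)) == expected
    return rule


def char_class(value):
    """Classify a value as 'digits', 'letters' or 'mixed' (assumed classes)."""
    text = str(value)
    if text.isdigit():
        return "digits"
    if text.isalpha():  # also True for Chinese characters
        return "letters"
    return "mixed"


def same_class(expected):
    """Same-class field matching: the value must belong to the expected class."""
    def rule(value):
        return is_present(value) and char_class(value) == expected
    return rule


def assessed_value(values, rule, weight=1.0):
    """Add `weight` for each value that satisfies `rule`."""
    return sum(weight for v in values if rule(v))
```

For a column of values `vals`, `assessed_value(vals, has_length(18))` would give the length-based assessed value, and the other two rules are applied in the same way.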
This assessment method is aimed at diverse field data and provides four analysis rules: matching against national standard codes, field length comparison, missing-field detection and same-class field matching. Each of these rules has practical value, as follows.
Field length: for field data in a database, in particular the identity card field of the PEOPLE table, the standard length of an identity card number is fixed at 18 characters. Rejecting every value whose length does not conform makes it possible to check the completeness of the data in the PEOPLE table, especially for fields whose length has a reference standard; the weight of each value that matches the length is added to the assessed value for the identity card field.
Missing field: "missing" means that the field holds no recorded information at all and is blank. The causes of a blank include the data themselves being absent, errors made when the data were entered, and unsynchronized data updates. Blank data have no reference value. The missing-field check screens them out, leaving the data that do have reference value, and the assessed value is calculated from the corresponding weights. This supports later data analysis, which may include analyzing the causes of the gaps.
Same-class field matching: this rule is mainly for field data with no specific standard length, such as the name field of the PEOPLE table. This field is a combination of Chinese characters, and neither its length nor the choice of characters is standardized. If digits or graphics appear in the field, its data have no reference value and are rejected; the data that remain have reference value, and the assessed value is calculated from the corresponding weights.
Further, the analysis rules can be used in combination. For example, field length comparison and same-class field matching can be combined so that, for the name field of the PEOPLE table, names of a given length can be matched for later analysis. Combining rules makes the assessment more flexible and more widely applicable. The assessment system can also be extended with further analysis rules over time.
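The combined use of rules on the name field can be sketched as follows. The helper names (`all_of`, `length_is`, `only_letters`, `combined_score`) and the chosen length of three characters are assumptions made for illustration only; the patent does not specify how rules are combined.

```python
def all_of(*rules):
    """Combine rules: a value passes only if every rule accepts it."""
    def combined(value):
        return all(rule(value) for rule in rules)
    return combined


def length_is(n):
    """Field length rule for text values of exactly n characters."""
    return lambda value: isinstance(value, str) and len(value) == n


def only_letters(value):
    # str.isalpha() is True for Chinese characters as well as Latin letters,
    # and False as soon as a digit or symbol appears.
    return isinstance(value, str) and value.isalpha()


def combined_score(values, weight=1.0):
    """Weighted score for names that are text-only and three characters long."""
    rule = all_of(only_letters, length_is(3))
    return sum(weight for v in values if rule(v))
```

Because each rule is an ordinary predicate, further rules can be added to `all_of` without changing the scoring step.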
The analysis procedure is as follows: the user terminal sends a database quality assessment request to the assessment system terminal; the assessment system terminal responds to the request; the relational database data source is configured. Because the system stores assessment results, data that have already been assessed only require their historical results to be retrieved. The field to be assessed is therefore checked first to see whether it has already been run through the analysis rules; if so, the historical assessment result is output directly. If not, the field is analyzed rule by rule; the analysis is similar to that shown in Fig. 3 and is not repeated here. The four analysis rules yield four assessed values, A, B, C and D respectively, and a suitable data model can be built on these values, according to actual needs, for further data analysis.
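The reuse of stored results and the production of one assessed value per rule can be sketched as below. The class `AssessmentStore` is a hypothetical in-memory stand-in for the system's stored history, and keying the history by (table, field) is an assumption; the patent does not say how stored results are indexed.

```python
from datetime import datetime, timezone


class AssessmentStore:
    """In-memory stand-in for the stored assessment history (hypothetical)."""

    def __init__(self):
        self._history = {}

    def assess(self, table, column, values, rules):
        """Return (scores, assessed_at), reusing a stored result when one exists."""
        key = (table, column)
        if key in self._history:
            # The field has already been analysed: output the historical result.
            return self._history[key]
        # `rules` maps a label such as "A" to a (predicate, weight) pair.
        scores = {
            label: sum(weight for v in values if predicate(v))
            for label, (predicate, weight) in rules.items()
        }
        # Step 5: store the assessed values together with the assessment time.
        record = (scores, datetime.now(timezone.utc))
        self._history[key] = record
        return record
```

A caller would pass four rules labelled A to D, one per analysis rule, and feed the resulting dictionary of assessed values into whatever data model suits the actual need.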
The foregoing shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art should understand that the invention is not limited to the embodiments above; the embodiments and the description merely illustrate its principles. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.
Claims (5)
1. A method for assessing the quality of relational data, comprising the following steps:
Step 1, a user terminal sends a database quality assessment request to an assessment system terminal;
Step 2, the assessment system terminal responds to the request;
Step 3, a relational database data source is configured, comprising the following steps:
3.1, entering information about the database under assessment, the information comprising the IP address of the database, the database user name, the password or the port, and storing the information in the assessment system;
3.2, establishing a connection to the database under assessment;
3.3, obtaining the structure of the tables and fields of the database under assessment, the assessment system being able to select any field of any table for configuration;
3.4, initializing the field analysis rules;
Step 4, the data are assessed, comprising the following steps:
4.1, selecting the data to be assessed in the database under assessment, the data to be assessed comprising several tables to be assessed, each table structure containing one or more fields to be assessed;
4.2, the assessment system assesses each field to be assessed according to the analysis rule configured for that field and obtains an assessed value, the assessed values being classifiable by the type of field analyzed, and the analysis rule comprising matching against national standard codes, which comprises the following steps:
4.2.1, reading the national standard code corresponding to the field to be assessed, and matching the field to be assessed against that national standard code;
4.2.2, wherein, when the field to be assessed matches the national standard code, the weight corresponding to the field to be assessed is added to the corresponding assessed value;
Step 5, the assessed value is stored and the time of the assessment is recorded;
Step 6, the assessed value is output.
2. The method for assessing relational data quality according to claim 1, wherein the analysis rule further comprises field length comparison, comprising the following steps: presetting the length of a standard field, and comparing the length of the field to be assessed with the length of the standard field; wherein, when the length of the field to be assessed matches the length of the standard field, the weight corresponding to the field to be assessed is added to the corresponding assessed value.
3. The method for assessing relational data quality according to claim 1 or 2, wherein the analysis rule further comprises missing-field detection, comprising the following steps: checking the fields to be assessed one by one, the scope of the check covering all recorded information such as digits, text and patterns; wherein, when the field to be assessed is not missing, the weight corresponding to the field to be assessed is added to the corresponding assessed value.
4. The method for assessing relational data quality according to claim 3, wherein the analysis rule further comprises same-class field matching, comprising the following steps: checking the fields to be assessed one by one, the scope of the check covering all related recorded information such as digits, text, patterns or combinations thereof; wherein, when the field to be assessed is a same-class field, a same-class field being a field that consists only of digits, only of text or only of graphics, or that is combined in the same form, the weight corresponding to that same-class field is added to the corresponding assessed value.
5. The method for assessing relational data quality according to claim 4, wherein the analysis rules can be used in combination to match the field to be assessed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410827598.1A CN104484448A (en) | 2014-12-26 | 2014-12-26 | Assessment method for relational data quality |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410827598.1A CN104484448A (en) | 2014-12-26 | 2014-12-26 | Assessment method for relational data quality |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104484448A (en) | 2015-04-01 |
Family
ID=52758989
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410827598.1A Pending CN104484448A (en) | 2014-12-26 | 2014-12-26 | Assessment method for relational data quality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104484448A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100042581A1 (en) * | 2005-11-22 | 2010-02-18 | At&T Intellectual Property Ii, L.P. | Join paths across multiple databases |
WO2007070205A1 (en) * | 2005-12-14 | 2007-06-21 | Microsoft Corporation | Data independent relevance evaluation utilizing cognitive concept relationship |
US20120066260A1 (en) * | 2006-02-01 | 2012-03-15 | Oracle International Corporation | System And Method For Building Decision Trees In A Database |
CN103631868A (en) * | 2013-11-04 | 2014-03-12 | 中国电子科技集团公司第十五研究所 | Data management system compatible with relational database |
CN103699693A (en) * | 2014-01-10 | 2014-04-02 | 中国南方电网有限责任公司 | Metadata-based data quality management method and system |
Non-Patent Citations (2)
Title |
---|
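武伟 (Wu Wei): "交通运输数据标准符合性检测研究及系统开发" [Research on conformance testing of transportation data standards and system development], 《中国优秀硕士学位论文全文数据库 工程科技II辑》 [China Master's Theses Full-text Database, Engineering Science and Technology II] * |
腾东兴等 (Teng Dongxing et al.): "一种面向关系型数据的可视质量分析方法" [A visual quality analysis method for relational data], 《中国期刊全文数据库 软件学报》 [China Journal Full-text Database, Journal of Software] * |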
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106873484A (en) * | 2017-02-27 | 2017-06-20 | 今创科技有限公司 | A kind of track traffic meteorology monitoring method, device and system |
CN108694172A (en) * | 2017-04-05 | 2018-10-23 | 北京京东尚科信息技术有限公司 | Information output method and device |
CN108694172B (en) * | 2017-04-05 | 2021-12-31 | 北京京东尚科信息技术有限公司 | Information output method and device |
CN109299062A (en) * | 2018-07-02 | 2019-02-01 | 北京市天元网络技术股份有限公司 | A kind of quality evaluating method and system towards document category digital resource metadata |
CN112084269A (en) * | 2018-12-25 | 2020-12-15 | 北京锐安科技有限公司 | Data quality calculation method and device, storage medium and server |
CN112084269B (en) * | 2018-12-25 | 2024-05-14 | 北京锐安科技有限公司 | Data quality calculation method, device, storage medium and server |
CN110309131A (en) * | 2019-04-12 | 2019-10-08 | 北京星网锐捷网络技术有限公司 | The method for evaluating quality and device of massive structured data |
CN110362563A (en) * | 2019-07-19 | 2019-10-22 | 北京明略软件系统有限公司 | The processing method and processing device of tables of data, storage medium, electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150401 |