CN104484448A - Assessment method for relational data quality - Google Patents


Info

Publication number
CN104484448A
Authority
CN
China
Prior art keywords
field
assessed
data
database
analysis rule
Prior art date
Legal status
Pending
Application number
CN201410827598.1A
Other languages
Chinese (zh)
Inventor
叶建锋
Current Assignee
ZHEJIANG XIETONG DATA SYSTEM CO Ltd
Original Assignee
ZHEJIANG XIETONG DATA SYSTEM CO Ltd
Priority date
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of computers, and in particular to an assessment method for relational data quality. The method comprises the following steps: step one, a user terminal sends a database quality assessment request to an assessment system terminal; step two, the assessment system terminal responds to the request; step three, a relational database data source is configured; step four, the data are assessed; step five, the assessed value is stored and the time of the assessment is recorded; step six, the assessed value is output. The method enables quality assessment of relational data, so that the data can be analyzed more accurately and put to further use.

Description

An assessment method for relational data quality
Technical field
The present invention relates to the field of computers, and in particular to an assessment method for relational data quality.
Background art
Informatization has been under way across China's industries for many years and has accumulated a large amount of data. These data are highly varied, and so are their storage structures. Their quality, however, is generally low, which makes analysis results based on them inaccurate and prevents the data from being put to good use. Because the data are stored inside databases, managers in information departments find it difficult to get a real grasp of their quality.
At present there are few methods for analyzing the quality of the data content stored in relational databases, and the analysis of content that is supposed to hold national standard codes is essentially a blank. For coded data of this kind, sex for example, the state has issued a dedicated standard and the values must fall within a fixed set of numbers, yet in practice most stored data do not conform to the national standard codes at all. As a result the data are non-standard and inconsistent, and cannot be put to good use.
Summary of the invention
To overcome the shortcomings described in the background above, the technical problem to be solved by the present invention is to provide an assessment method for relational data quality.
To solve the above technical problem, the assessment method for relational data quality of the present invention comprises the following steps:
Step 1, a user terminal sends a database quality assessment request to an assessment system terminal;
Step 2, the assessment system terminal responds to the request;
Step 3, a relational database data source is configured, comprising the following steps:
3.1, the information of the database to be assessed is entered, the information comprising the IP address of the database, the database user name, the password or the port, and the information is stored in the assessment system;
3.2, a connection to the database to be assessed is established;
3.3, the structure of the tables and fields of the database to be assessed is obtained, and the assessment system can select any field of any table for configuration;
3.4, the field analysis rules are initialized;
Step 4, the data are assessed, comprising the following steps:
4.1, the data to be assessed are selected in the database to be assessed, the data to be assessed comprising several tables to be assessed, each table structure containing one or more fields to be assessed;
4.2, the assessment system applies the analysis rules configured for a field to be assessed to that field and obtains an assessed value, the assessed values can be classified according to the type of field analyzed, and the analysis rules comprise matching against national standard codes, comprising the following steps:
4.2.1, the national standard code corresponding to the field to be assessed is read, and the field to be assessed is matched against the corresponding national standard code;
4.2.2, when the field to be assessed conforms to the national standard code, the weight value corresponding to the field to be assessed is added to the corresponding assessed value;
Step 5, the assessed value is stored and the time of the assessment is recorded;
Step 6, the assessed value is output.
Further, the analysis rules also comprise field length comparison, comprising the following steps: the length of a standard field is preset, and the length of the field to be assessed is compared with the length of the standard field; when the length of the field to be assessed conforms to the length of the standard field, the weight value corresponding to the field to be assessed is added to the corresponding assessed value.
Further, the analysis rules also comprise missing-field detection, comprising the following steps: the fields to be assessed are checked one by one, the detection covering all recorded information such as digits, text and patterns; when the field to be assessed is not missing, the weight value corresponding to the field to be assessed is added to the corresponding assessed value.
Further, the analysis rules also comprise same-class field matching, comprising the following steps: the fields to be assessed are checked one by one, the detection covering digits, text, patterns, their combinations and all other related recorded information; when the field to be assessed is a same-class field, a same-class field being one that contains only digits, only text, only graphics, or a combination in the same form, the weight value corresponding to the field to be assessed is added to the corresponding assessed value.
Further, the analysis rules can be used in combination to match the field to be assessed.
The data quality assessment method of the present invention assesses the data in a database according to preset analysis rules, so that the data can be analyzed more accurately and put to further use. Specifically, the analysis rules used comprise matching against national standard codes, field length comparison, missing-field detection and same-class field matching. The field data are checked against these analysis rules to produce matching results. Comparing data is a simple and effective means of assessment; the objects compared may include digits, text, patterns, their combinations and all other related recorded information. The matching results can indicate the integrity, relevance, synchronization and reasonableness of the data, and the data quality is then assessed by accumulating the weights.
Brief description of the drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort:
Fig. 1 is a flowchart of an embodiment of the assessment method for relational data quality of the present invention;
Fig. 2 is a flowchart of configuring the relational database data source in an embodiment of the assessment method for relational data quality of the present invention;
Fig. 3 is a flowchart of national standard code data assessment in an embodiment of the assessment method for relational data quality of the present invention;
Fig. 4 is a flowchart of another assessment procedure in an embodiment of the assessment method for relational data quality of the present invention.
Detailed description of the embodiments
Fig. 1 shows a flowchart of an embodiment of the assessment method for relational data quality of the present invention, comprising the following steps: step 1, a user terminal sends a database quality assessment request to an assessment system terminal; step 2, the assessment system terminal responds to the request; step 3, a relational database data source is configured; step 4, the data are assessed; step 5, the assessed value is stored and the time of the assessment is recorded; step 6, the assessed value is output. This method enables quality assessment of relational data, so that the data can be analyzed more accurately and put to further use.
Fig. 2 shows a flowchart of configuring the relational database data source in an embodiment of the assessment method for relational data quality of the present invention, comprising the following steps: step 3.1, the information of the database to be assessed is entered, the information comprising the IP address of the database, the database user name, the password or the port, and the information is stored in the assessment system; step 3.2, a connection to the database to be assessed is established; step 3.3, the structure of the tables and fields of the database to be assessed is obtained, and the assessment system can select any field of any table for configuration; step 3.4, the field analysis rules are initialized.
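Steps 3.1 to 3.4 could be sketched as follows. This is a minimal illustration and not the patented implementation: the `DataSource` and `FieldConfig` classes, the `list_fields` helper and the sample `PEOPLE` columns are assumptions introduced here, and an in-memory SQLite database stands in for whatever relational database is actually being assessed.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Connection details of the database to be assessed (step 3.1)."""
    host: str
    port: int
    user: str
    password: str
    # Hypothetical stand-in: SQLite ignores host, port, user and password.
    sqlite_path: str = ":memory:"


@dataclass
class FieldConfig:
    """Analysis rules configured for one field of one table (steps 3.3, 3.4)."""
    table: str
    column: str
    rules: list = field(default_factory=list)  # empty list = freshly initialized


def connect(source: DataSource) -> sqlite3.Connection:
    # Step 3.2: establish the connection. A real deployment would call the
    # driver of the target database with the stored connection details.
    return sqlite3.connect(source.sqlite_path)


def list_fields(conn: sqlite3.Connection) -> list:
    # Step 3.3: read every table and its columns from the catalog.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    configs = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, default, pk).
        for col in conn.execute(f"PRAGMA table_info({table})"):
            configs.append(FieldConfig(table=table, column=col[1]))
    return configs


source = DataSource(host="192.0.2.10", port=1521, user="assessor", password="secret")
conn = connect(source)
conn.execute("CREATE TABLE PEOPLE (NAME TEXT, SEX TEXT, ID_CARD TEXT)")
fields = list_fields(conn)
print([(f.table, f.column) for f in fields])
```

Keeping one `FieldConfig` per table and column mirrors the statement that any field of any table can be selected for configuration; rules would later be appended to `rules`.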
Fig. 3 shows a flowchart of national standard code data assessment in an embodiment of the assessment method for relational data quality of the present invention, comprising the following steps: step 401, the data to be assessed are selected in the database to be assessed, the data to be assessed comprising several tables to be assessed, each table structure containing one or more fields to be assessed; step 402, the assessment system applies the analysis rule configured for a field to be assessed to that field and obtains an assessed value, the assessed values can be classified according to the type of field analyzed, the analysis rule is matching against national standard codes, and the national standard code corresponding to the field to be assessed is read; step 403, the field to be assessed is matched against the corresponding national standard code; step 404, when the field to be assessed conforms to the national standard code, the weight value corresponding to the field to be assessed is added to the corresponding assessed value.
For example, the national standard code for sex can be laid out as follows:
Code classification name: sex;
Code value domain: 0, 1, 2, 9;
Likewise, the national standard code for ethnic group can be laid out as follows:
Code classification name: ethnic group;
Code value domain: 1, 2, 3, 4, 5 ... 56;
The national standard codes obtained for other classifications are maintained in the same way and placed in the system for use in subsequent steps. The table T_GGZY is used to store these data; taking sex as an example, the core fields of this table are as follows:
Once the various standards are maintained, when the data content of a field that is supposed to store national standard codes, in some table of some database, needs quality analysis, that is, a check of whether it actually stores standard codes, the specified standard code is matched against this field. The system connects to the database using the connection established earlier and uses the following standard SQL to count the values stored in the field that conform to the national standard (checked_table and checked_field stand for the table and the field being checked):
SELECT COUNT(*) PXTGL FROM checked_table t, T_GGZY t1
WHERE TO_CHAR(t.checked_field) IN (t1.ZYBH)
AND t1.DMFLMC = 'name of the specified national standard code classification'
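The counting statement above can be tried out on a small data set. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module instead of the original database, replaces `TO_CHAR` with `CAST(... AS TEXT)`, treats `ZYBH` as the code value column and `DMFLMC` as the classification name column (as the statement suggests), and invents a small `PEOPLE` table as the checked table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_GGZY (DMFLMC TEXT, ZYBH TEXT);   -- classification name, code value
    CREATE TABLE PEOPLE (NAME TEXT, SEX INTEGER);   -- hypothetical checked table
    INSERT INTO T_GGZY VALUES ('sex', '0'), ('sex', '1'), ('sex', '2'), ('sex', '9');
    INSERT INTO PEOPLE VALUES ('a', 1), ('b', 2), ('c', 7), ('d', NULL), ('e', 9);
""")

# Count the stored values that appear among the codes of the chosen
# classification; rows holding 7 or NULL find no partner in T_GGZY.
conforming = conn.execute(
    """
    SELECT COUNT(*)
    FROM PEOPLE t, T_GGZY t1
    WHERE CAST(t.SEX AS TEXT) = t1.ZYBH
      AND t1.DMFLMC = ?
    """,
    ("sex",),
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM PEOPLE").fetchone()[0]
print(conforming, total)
```

Comparing the conforming count with the total row count gives a quick sense of how much of the field follows the standard.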
Specifically, to assess the sex field of a table, that is, to analyze how well the data stored in the sex field conform to the national standard code, it is only necessary to select the national standard code for sex. The four standard code values 0, 1, 2 and 9 are then matched against the sex field one by one, and through this comparison the system may find that some values of the sex field fall outside the range of the national standard code for sex. The system rejects the values that fall outside this range and adds the remaining ones, according to their weight, to the assessed value of the sex assessment. The higher the assessed value, the higher the quality of the sex field.
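The matching and weighting just described might look like the following sketch. The text does not say how the weight is applied, so adding the field's weight once per conforming value is an assumption made here, as are the function name `assess_against_codes`, the weight of 0.5 and the sample column.

```python
SEX_CODES = {"0", "1", "2", "9"}  # the four code values named in the text


def assess_against_codes(values, codes, weight=1.0):
    """Return (assessed_value, rejected) for one field.

    Each value found in `codes` adds `weight` to the assessed value
    (assumed interpretation); every other value is set aside as rejected.
    """
    assessed_value = 0.0
    rejected = []
    for value in values:
        if value is not None and str(value).strip() in codes:
            assessed_value += weight
        else:
            rejected.append(value)
    return assessed_value, rejected


sex_column = ["1", "2", "M", "9", None, "female", "0", "2"]
score, rejected = assess_against_codes(sex_column, SEX_CODES, weight=0.5)
print(score, rejected)
```

Returning the rejected values alongside the score keeps the non-conforming data available for later inspection.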
Fig. 4 shows a flowchart of another assessment procedure in an embodiment of the assessment method for relational data quality of the present invention. The analysis rules in this procedure further include the following:
Field length comparison, comprising the following steps: the length of a standard field is preset, and the length of the field to be assessed is compared with the length of the standard field; when the length of the field to be assessed conforms to the length of the standard field, the weight value corresponding to the field to be assessed is added to the corresponding assessed value.
Missing-field detection, comprising the following steps: the fields to be assessed are checked one by one, the detection covering all recorded information such as digits, text and patterns; when the field to be assessed is not missing, the weight value corresponding to the field to be assessed is added to the corresponding assessed value.
Same-class field matching, comprising the following steps: the fields to be assessed are checked one by one, the detection covering digits, text, patterns, their combinations and all other related recorded information; when the field to be assessed is a same-class field, a same-class field being one that contains only digits, only text, only graphics, or a combination in the same form, the weight value corresponding to the field to be assessed is added to the corresponding assessed value.
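The three additional rules can each be expressed as a small predicate. The sketch below is an illustration under assumptions: the helper names, the three-way classification into digits, letters and mixed, the per-value weighting and the sample data are all introduced here and are not taken from the patent.

```python
def length_matches(value, expected_length):
    """Field length comparison: the value has exactly the preset length."""
    return value is not None and len(str(value)) == expected_length


def is_present(value):
    """Missing-field detection: the value holds some recorded information."""
    return value is not None and str(value).strip() != ""


def field_class(value):
    """Same-class matching: label a value as digits, letters or mixed."""
    text = str(value)
    if text.isdigit():
        return "digits"
    if text.isalpha():  # also true for Chinese characters
        return "letters"
    return "mixed"


def score(values, predicate, weight=1.0):
    """Add `weight` once for every value that satisfies `predicate`."""
    return sum(weight for v in values if predicate(v))


ids = ["110101199003071234", "12345", None, "11010119900307123X"]
names = ["张伟", "Li Na", "王5", "", "赵敏"]

length_score = score(ids, lambda v: length_matches(v, 18))
present_score = score(names, is_present)
class_score = score(names, lambda v: is_present(v) and field_class(v) == "letters")
print(length_score, present_score, class_score)
```

Because each rule is a plain function of one value, new rules can be added without touching the scoring helper.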
This assessment method targets diverse field data and provides four analysis rules: matching against national standard codes, field length comparison, missing-field detection and same-class field matching. Each of these rules has practical value, as follows.
Field length: for field data in a database, in particular the ID card field of the PEOPLE table, the ID card number has a fixed standard length of 18 characters, so all values whose length does not conform are rejected. This checks the integrity of the various data in the PEOPLE table, especially fields whose length has a reference standard, and the weight of each length-matching field is added to the assessed value of the ID card field.
Missing field: missing means that the field holds no recorded information at all and is blank. The causes of a blank include the data themselves being absent, errors made when the data were entered, and data updates being out of sync. Blank data have no reference value. Verification by missing-field detection screens out and rejects such data, and the remaining data, which do have reference value, are used to calculate the assessed value from their weight values. This facilitates subsequent data analysis, which may include analyzing the causes of the missing data.
Same-class field matching: this mainly targets field data with no specific standard length, for example the name field of the PEOPLE table. This field consists of Chinese characters, and neither its length nor the choice of characters follows a standard. Once a digit or a graphic appears in the field, the data of that field have no reference value; such data are rejected, and the remaining data, which do have reference value, are used to calculate the assessed value from their weight values.
Furthermore, the analysis rules can also be used in combination, for example field length comparison together with same-class field matching: for the name field of the PEOPLE table, names of a certain length can be matched, which facilitates subsequent data analysis. Combined use of the analysis rules makes data assessment more flexible and more widely applicable. The assessment system can also be continually extended with new analysis rules.
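Combining rules, as in the name example, can be sketched by composing predicates. Everything named below is hypothetical: the `all_of` combinator, the length bounds of 2 to 4 characters, and the choice to treat "Chinese characters" as the CJK Unified Ideographs range U+4E00 to U+9FFF.

```python
def all_of(*rules):
    """Combine rules: a value passes only if every rule accepts it."""
    return lambda value: all(rule(value) for rule in rules)


def has_length_between(low, high):
    """Length rule with assumed bounds; None never passes."""
    return lambda value: value is not None and low <= len(value) <= high


def is_chinese_text(value):
    # Assumption: only the basic CJK Unified Ideographs block counts;
    # rarer characters outside U+4E00..U+9FFF are not covered.
    return bool(value) and all("\u4e00" <= ch <= "\u9fff" for ch in value)


# Length rule first, so None and empty values are filtered before
# the character check runs.
name_rule = all_of(has_length_between(2, 4), is_chinese_text)

names = ["张伟", "欧阳娜娜", "王5", "李", "", None, "司马相如之"]
accepted = [n for n in names if name_rule(n)]
weight = 1.0
assessed_value = weight * len(accepted)
print(accepted, assessed_value)
```

A combinator like `all_of` lets any subset of rules be applied to a field without writing a new function for each combination.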
The analysis proceeds as follows: the user terminal sends a database quality assessment request to the assessment system terminal; the assessment system terminal responds to the request; and the relational database data source is configured. Because the system stores assessment results, data that have already been assessed only require their historical results to be retrieved. The field to be assessed is therefore checked first to determine whether it has already been run through the analysis rules; if so, the historical assessment result is output directly. If not, the field is analyzed under each analysis rule in turn; the analysis is similar to that shown in Fig. 3 and is not repeated here. The four analysis rules produce four assessed values A, B, C and D respectively, and a suitable data model can be built on the assessed values, according to actual needs, for further data analysis.
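The reuse of stored results could be sketched as below. The in-memory dictionary, the `assess_field` function, the rule names and the sample rules are assumptions for illustration; the text only states that assessed values are stored with the assessment time and that a previously assessed field returns its historical result.

```python
from datetime import datetime, timezone

# Hypothetical in-memory store keyed by (table, column).
_history = {}


def assess_field(table, column, values, rules, weight=1.0):
    """Return the stored result if the field was assessed before; otherwise
    apply every rule, store the result with a timestamp, and return it."""
    key = (table, column)
    if key in _history:
        return _history[key]
    results = {
        name: sum(weight for v in values if rule(v))
        for name, rule in rules.items()
    }
    record = {"assessed_at": datetime.now(timezone.utc), "values": results}
    _history[key] = record
    return record


rules = {
    "A_standard_code": lambda v: v in {"0", "1", "2", "9"},
    "B_length": lambda v: v is not None and len(v) == 1,
    "C_not_missing": lambda v: v not in (None, ""),
    "D_same_class": lambda v: v is not None and v.isdigit(),
}
column = ["1", "2", "x", "", None, "10"]
first = assess_field("PEOPLE", "SEX", column, rules)
second = assess_field("PEOPLE", "SEX", [], rules)  # served from history
print(first["values"], second is first)
```

A production system would presumably persist the history in a database and decide when a stored result is too old to reuse; neither policy is specified in the text.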
The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.

Claims (5)

1. An assessment method for relational data quality, comprising the following steps:
Step 1, a user terminal sends a database quality assessment request to an assessment system terminal;
Step 2, the assessment system terminal responds to the request;
Step 3, a relational database data source is configured, comprising the following steps:
3.1, the information of the database to be assessed is entered, said information comprising the IP address of the database, the database user name, the password or the port, and said information is stored in the assessment system;
3.2, a connection to the database to be assessed is established;
3.3, the structure of the tables and fields of said database to be assessed is obtained, and said assessment system can select any field of any table for configuration;
3.4, the field analysis rules are initialized;
Step 4, the data are assessed, comprising the following steps:
4.1, the data to be assessed are selected in the database to be assessed, said data to be assessed comprising several tables to be assessed, each table structure containing one or more fields to be assessed;
4.2, said assessment system applies the analysis rules configured for a field to be assessed to that field and obtains an assessed value, said assessed values can be classified according to the type of field analyzed, and said analysis rules comprise matching against national standard codes, comprising the following steps:
4.2.1, the national standard code corresponding to said field to be assessed is read, and said field to be assessed is matched against the corresponding national standard code;
4.2.2, when said field to be assessed conforms to the national standard code, the weight value corresponding to said field to be assessed is added to the corresponding assessed value;
Step 5, the assessed value is stored and the time of the assessment is recorded;
Step 6, the assessed value is output.
2. The assessment method for relational data quality according to claim 1, wherein said analysis rules further comprise field length comparison, comprising the following steps: the length of a standard field is preset, and the length of said field to be assessed is compared with the length of said standard field; when the length of said field to be assessed conforms to the length of said standard field, the weight value corresponding to said field to be assessed is added to the corresponding assessed value.
3. The assessment method for relational data quality according to claim 1 or 2, wherein said analysis rules further comprise missing-field detection, comprising the following steps: said fields to be assessed are checked one by one, said detection covering all recorded information such as digits, text and patterns; when said field to be assessed is not missing, the weight value corresponding to said field to be assessed is added to the corresponding assessed value.
4. The assessment method for relational data quality according to claim 3, wherein said analysis rules further comprise same-class field matching, comprising the following steps: said fields to be assessed are checked one by one, said detection covering digits, text, patterns, their combinations and all other related recorded information; when said field to be assessed is a same-class field, said same-class field being one that contains only digits, only text, only graphics, or a combination in the same form, the weight value corresponding to said field to be assessed is added to the corresponding assessed value.
5. The assessment method for relational data quality according to claim 4, wherein said analysis rules can be used in combination to match the field to be assessed.
CN201410827598.1A 2014-12-26 2014-12-26 Assessment method for relational data quality Pending CN104484448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410827598.1A CN104484448A (en) 2014-12-26 2014-12-26 Assessment method for relational data quality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410827598.1A CN104484448A (en) 2014-12-26 2014-12-26 Assessment method for relational data quality

Publications (1)

Publication Number Publication Date
CN104484448A true CN104484448A (en) 2015-04-01

Family

ID=52758989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410827598.1A Pending CN104484448A (en) 2014-12-26 2014-12-26 Assessment method for relational data quality

Country Status (1)

Country Link
CN (1) CN104484448A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100042581A1 (en) * 2005-11-22 2010-02-18 At&T Intellectual Property Ii, L.P. Join paths across multiple databases
WO2007070205A1 (en) * 2005-12-14 2007-06-21 Microsoft Corporation Data independent relevance evaluation utilizing cognitive concept relationship
US20120066260A1 (en) * 2006-02-01 2012-03-15 Oracle International Corporation System And Method For Building Decision Trees In A Database
CN103631868A (en) * 2013-11-04 2014-03-12 中国电子科技集团公司第十五研究所 Data management system compatible with relational database
CN103699693A (en) * 2014-01-10 2014-04-02 中国南方电网有限责任公司 Metadata-based data quality management method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
武伟: "交通运输数据标准符合性检测研究及系统开发", 《中国优秀硕士学位论文全文数据库 工程科技II辑》 *
腾东兴等: "一种面向关系型数据的可视质量分析方法", 《中国期刊全文数据库 软件学报》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106873484A (en) * 2017-02-27 2017-06-20 今创科技有限公司 A kind of track traffic meteorology monitoring method, device and system
CN108694172A (en) * 2017-04-05 2018-10-23 北京京东尚科信息技术有限公司 Information output method and device
CN108694172B (en) * 2017-04-05 2021-12-31 北京京东尚科信息技术有限公司 Information output method and device
CN109299062A (en) * 2018-07-02 2019-02-01 北京市天元网络技术股份有限公司 A kind of quality evaluating method and system towards document category digital resource metadata
CN112084269A (en) * 2018-12-25 2020-12-15 北京锐安科技有限公司 Data quality calculation method and device, storage medium and server
CN112084269B (en) * 2018-12-25 2024-05-14 北京锐安科技有限公司 Data quality calculation method, device, storage medium and server
CN110309131A (en) * 2019-04-12 2019-10-08 北京星网锐捷网络技术有限公司 The method for evaluating quality and device of massive structured data
CN110362563A (en) * 2019-07-19 2019-10-22 北京明略软件系统有限公司 The processing method and processing device of tables of data, storage medium, electronic device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150401
