CN109299062A - Quality evaluation method and system for metadata of document-type digital resources - Google Patents

Publication number
CN109299062A
Authority
CN
China
Prior art keywords
metadata
evaluation index
digital resource
data
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810707861.1A
Other languages
Chinese (zh)
Inventor
胡中贵
刘海日
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing MetarNet Technologies Co Ltd
Original Assignee
Beijing MetarNet Technologies Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a quality evaluation method and system for metadata of document-type digital resources. The method comprises: S1, constructing a quality evaluation index system for the metadata in a target document-type digital resource according to the resource's own attributes; S2, checking each piece of metadata item by item against each evaluation index in the quality evaluation index system; S3, calculating a total score for the metadata from the score corresponding to each check result and the weight of each evaluation index. The invention thereby implements quality evaluation of document-type digital resource metadata with high evaluation precision.

Description

Quality evaluation method and system for metadata of document-type digital resources
Technical Field
The invention belongs to the technical field of library science, and more particularly relates to a quality evaluation method and system for metadata of document-type digital resources.
Background Art
With the continuous progress of science and technology and the ongoing advance of global informatization, the number and volume of document-type data resources are growing at an unprecedented rate. Metadata is important data that describes these resources, so how to inspect and evaluate the quality of resource metadata comprehensively and systematically bears directly on the subsequent use of the data.
At present there is no relatively complete, comprehensive, flexible, and practicable method for evaluating the quality of document-type digital resource metadata. Most existing evaluation methods are expounded only at the theoretical level and are introduced only in terms of evaluation dimensions. They give no specific evaluation rules for file, record, and field attributes, and so offer little substantive guidance for putting metadata quality evaluation into practice.
Summary of the Invention
To overcome the problem that existing quality evaluation methods for document-type digital resource metadata are described only in theory and cannot be put into practice, or to at least partially solve that problem, the present invention provides a quality evaluation method and system for metadata of document-type digital resources.
According to a first aspect of the invention, there is provided a quality evaluation method for metadata of document-type digital resources, comprising:
S1, constructing a quality evaluation index system for the metadata in a target document-type digital resource according to the own attributes of the target document-type digital resource;
S2, checking each piece of metadata item by item according to each evaluation index in the quality evaluation index system;
S3, calculating a total score for the metadata according to the score corresponding to each check result and the weight of each evaluation index.
Specifically, the quality evaluation index system includes one or more of the following evaluation indexes: completeness, correctness, consistency, uniqueness, and timeliness;
Correspondingly, step S2 specifically includes:
According to the completeness index, checking one or more of: whether a data entity in the metadata is missing, whether a data file is missing, whether data records are missing, whether the data structure is missing, and whether field content in a record is missing;
According to the correctness index, checking one or more of: the legality of the metadata, its validity, whether garbled characters are present, and whether uniform value substitution is present;
According to the consistency index, checking the data logical consistency and/or the content format consistency of the metadata;
According to the uniqueness index, checking the data record uniqueness and/or the key attribute value uniqueness of the metadata;
According to the timeliness index, checking the data content novelty and/or the link address validity of the metadata.
Specifically, between steps S1 and S3 the method further includes:
Classifying the metadata corresponding to the own attributes according to the level to which those attributes belong;
Correspondingly, step S2 further includes:
Checking each class of metadata according to the evaluation indexes corresponding to that class;
Wherein each class of metadata is stored in association with its evaluation indexes in advance.
Specifically, the step of classifying the metadata corresponding to the own attributes according to the level to which those attributes belong includes:
According to the level to which the own attributes belong, dividing the corresponding metadata into one or more of file-level metadata, record-level metadata, and field-level metadata.
Specifically, the step of checking each class of metadata according to the evaluation indexes corresponding to that class includes:
According to the correctness, completeness, and timeliness indexes, checking one or more of the following for the file-level metadata: file directory legality, file naming legality, file quantity completeness, and file arrival timeliness;
According to the correctness, completeness, and timeliness indexes, checking one or more of the following for the record-level metadata: file classification legality, file location legality, file naming legality, file quantity completeness, and file generation timeliness;
According to the completeness, correctness, consistency, uniqueness, and timeliness indexes, checking one or more of the following for the field-level metadata: record completeness, field completeness, field type legality, field length legality, field format legality, field business legality, field timeliness, field accuracy, whether a field contains garbled characters, whether a field shows uniform value substitution, data logical consistency, content format consistency, record uniqueness, key attribute value uniqueness, data content novelty, and link address validity.
Specifically, step S3 further includes:
Calculating the score of each class of metadata according to the score corresponding to each check result and the weight corresponding to each evaluation index.
Specifically, step S3 includes:
For any check result, multiplying the score corresponding to that check result by the weight of the evaluation index corresponding to that check result;
Adding the products for all check results to obtain the total score of the metadata.
According to a second aspect of the invention, there is provided a quality evaluation system for metadata of document-type digital resources, comprising:
A construction module, for constructing a quality evaluation index system for the metadata in a target document-type digital resource according to the own attributes of the target document-type digital resource;
A checking module, for checking each piece of metadata according to each evaluation index in the quality evaluation index system;
A calculation module, for calculating the total score of the metadata according to the score corresponding to each check result and the weight of each evaluation index.
According to a third aspect of the invention, there is provided a quality evaluation device for metadata of document-type digital resources, comprising:
At least one processor, at least one memory, and a bus; wherein
The processor and the memory communicate with each other through the bus;
The memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method described above.
According to a fourth aspect of the invention, there is provided a non-transitory computer-readable storage medium for storing a computer program implementing the method described above.
The present invention provides a quality evaluation method and system for metadata of document-type digital resources. The method constructs a quality evaluation index system according to the own attributes of the target document-type digital resource, checks each piece of metadata item by item against each evaluation index in that system, and calculates the total score of the metadata from the check results, thereby implementing quality evaluation of document-type digital resource metadata with high evaluation precision.
Brief Description of the Drawings
Fig. 1 is a schematic overall flowchart of the quality evaluation method for metadata of document-type digital resources provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the overall structure of the quality evaluation system for metadata of document-type digital resources provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the overall structure of the quality evaluation device for metadata of document-type digital resources provided by an embodiment of the invention.
Detailed Description of the Embodiments
Specific embodiments of the invention are described in further detail below with reference to the drawings and examples. The following embodiments illustrate the invention and are not intended to limit its scope.
One embodiment of the invention provides a quality evaluation method for metadata of document-type digital resources. Fig. 1 is a schematic overall flowchart of the method provided by this embodiment. The method comprises: S1, constructing a quality evaluation index system for the metadata in a target document-type digital resource according to the own attributes of the target document-type digital resource;
Here, the target document-type digital resource is the document-type digital resource whose quality is to be evaluated. Its own attributes include its directory, naming, quantity, classification, location, and so on. Metadata, also called intermediary data or relay data, is data that describes data. It mainly describes data attributes and supports functions such as indicating storage location, recording historical data, resource lookup, and file recording. The quality evaluation index system contains multiple evaluation indexes used for quality evaluation.
S2, checking each piece of metadata item by item according to each evaluation index in the quality evaluation index system;
Here, checking means judging whether each piece of metadata in the target document-type digital resource meets each evaluation index. Checking is done mainly by computer, supplemented by manual inspection, in order to obtain inspection results and output an inspection report. A computer program checks most of the metadata against the evaluation indexes to produce preliminary results; a small amount of inspection is then performed manually and the check results are summarized. The inspection results can serve as a reference for improving the quality of the target document-type digital resource. For example, problems in the metadata can be traced back from the inspection results, so that the various metadata problems are found more quickly.
S3, calculating the total score of the metadata according to the score corresponding to each check result and the weight of each evaluation index.
Here, the score corresponding to each check result is stored in association with that result in advance, as is the weight of the evaluation index corresponding to each check result. The total score of the metadata is calculated from these scores and weights. From the total score, a user can quickly tell good metadata quality from poor.
This embodiment constructs a quality evaluation index system according to the own attributes of the target document-type digital resource, checks each piece of metadata item by item against each evaluation index in that system, and calculates the total score of the metadata from the check results, thereby implementing quality evaluation of document-type digital resource metadata with high evaluation precision.
On the basis of the above embodiment, the quality evaluation index system in this embodiment includes one or more of the following evaluation indexes: completeness, correctness, consistency, uniqueness, and timeliness. Correspondingly, step S2 specifically includes: according to the completeness index, checking one or more of whether a data entity in the metadata is missing, whether a data file is missing, whether data records are missing, whether the data structure is missing, and whether field content in a record is missing; according to the correctness index, checking one or more of the legality of the metadata, its validity, whether garbled characters are present, and whether uniform value substitution is present; according to the consistency index, checking the data logical consistency and/or content format consistency of the metadata; according to the uniqueness index, checking the data record uniqueness and/or key attribute value uniqueness of the metadata; according to the timeliness index, checking the data content novelty and/or link address validity of the metadata.
Specifically, the quality evaluation index system for document-type data resource metadata includes one or more of the completeness, correctness, consistency, uniqueness, and timeliness indexes. The completeness index is the most basic guarantee of metadata quality. It covers one or more of the following: whether a data entity is missing, for example the existing data contains no data of a given type; whether a data file is missing, for example a file cannot be obtained or is damaged; whether data records are missing, for example the number of records is lower than expected; whether the data structure is missing, for example a field attribute of the data records is absent; and whether field content in a record is missing, for example the field content is null.
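Two of the completeness checks described above, missing field content and a record count below expectation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the field names in `REQUIRED_FIELDS`, the sample records, and the expected count are all hypothetical.

```python
from typing import Any

# Hypothetical list of required fields; the source does not enumerate them.
REQUIRED_FIELDS = ("title", "author", "publish_year")


def missing_fields(record: dict[str, Any],
                   required: tuple[str, ...] = REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent, None, or blank in a record."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


def record_shortfall(records: list[dict[str, Any]], expected_count: int) -> int:
    """Return how many records fall short of the expected record count."""
    return max(expected_count - len(records), 0)


# Invented sample data for illustration only.
records = [
    {"title": "Paper A", "author": "X", "publish_year": "2017"},
    {"title": "Paper B", "author": "  ", "publish_year": None},
]
field_report = [missing_fields(r) for r in records]
shortfall = record_shortfall(records, expected_count=5)
```

Checks for missing entities, files, or structure would follow the same pattern but depend on how the resource is stored, which the text does not specify.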
The correctness index describes the degree to which data agrees with the real object it describes, that is, whether the field content in a record is abnormal or erroneous. It covers one or more of the legality of the metadata, its validity, whether garbled characters are present, and whether uniform value substitution is present. Legality describes whether field content meets data type, length, format, or other business requirements, for example whether the field type, the date format, or the field length is legal. For instance, a Chinese postal code must consist of six digits and cannot contain letters, and an e-mail address must contain the character "@". Validity describes whether data lies within a reasonable value range or satisfies user-defined conditions. For instance, an age is generally an integer between 1 and 120; a time is generally later than the year 1900 and stands in a logical order relative to related events; and the content of an enumerated field cannot take a value outside the field's enumeration. Whether garbled characters are present describes whether the content of a character-type field is garbled. Uniform value substitution describes whether field content has been filled in by the system in bulk with an identical value, usually in character-type fields. For instance, the "author" of every record is filled in as "*", or the "author's affiliation" of every record is filled in as "none". Other kinds of metadata error are also covered. When the correctness of metadata is checked, anomalous or erroneous values can be difficult to detect, so machine checking generally needs to be combined with manual verification.
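The correctness rules quoted above translate fairly directly into small predicates. The sketch below is illustrative only: the function names are hypothetical, the e-mail rule implements only the "@" condition stated in the text, and the garbled-character test is a simple heuristic assumption (looking for the Unicode replacement character U+FFFD), not a method taken from the source.

```python
import re

# Legality: a Chinese postal code is exactly six ASCII digits.
POSTCODE_RE = re.compile(r"[0-9]{6}")


def postcode_is_legal(value: str) -> bool:
    return POSTCODE_RE.fullmatch(value) is not None


def email_is_legal(value: str) -> bool:
    # Only the rule stated in the text: the address must contain "@".
    return "@" in value


def age_is_valid(value: int) -> bool:
    # Validity: an integer between 1 and 120, as in the text's example.
    return 1 <= value <= 120


def has_garbled_text(value: str) -> bool:
    # Heuristic assumption: U+FFFD usually marks bytes that failed to decode.
    return "\ufffd" in value


def is_uniform_fill(values: list[str], min_records: int = 2) -> bool:
    """True when every record carries the same value for one field."""
    return len(values) >= min_records and len(set(values)) == 1
```

For example, `is_uniform_fill(["*", "*", "*"])` flags the "author" column from the text, while a column with varied values is not flagged. A flagged column still needs manual review, since a field can legitimately hold one value across all records.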
The consistency index of the metadata covers data logical consistency and/or content format consistency. Data logical consistency means that attributes with the same business meaning take the same value across different systems or data sets. The simplest way to test it is to examine the associations between data values in the data sets and confirm whether the data attributes they share have identical values. The check result is expressed as the number of entities that fail to maintain data consistency. Content format consistency means that the same field has a uniform content format across different records. It applies mainly to character-type fields that have a fixed format. For example, for a date of birth, some records are filled in as 1980-12 and others as 1978/12. The result of a content format consistency check has only two outcomes, uniform or not uniform, scored as 100 points or 0 points respectively.
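The two-outcome scoring of content format consistency can be sketched as below, using the date-of-birth example from the text. The set of recognized formats and the decision to score unrecognized values as non-uniform are assumptions made for this illustration.

```python
import re

# Hypothetical set of recognized formats for a year-month field.
DATE_FORMATS = {
    "YYYY-MM": re.compile(r"[0-9]{4}-[0-9]{2}"),
    "YYYY/MM": re.compile(r"[0-9]{4}/[0-9]{2}"),
}


def detect_format(value: str) -> str:
    """Return the name of the first format the value matches, else 'unknown'."""
    for name, pattern in DATE_FORMATS.items():
        if pattern.fullmatch(value):
            return name
    return "unknown"


def format_consistency_score(values: list[str]) -> int:
    """Score 100 if all values share one recognized format, otherwise 0."""
    formats = {detect_format(v) for v in values}
    # Assumption: values in an unrecognized format count as non-uniform.
    return 100 if len(formats) == 1 and "unknown" not in formats else 0


mixed = format_consistency_score(["1980-12", "1978/12"])
uniform = format_consistency_score(["1980-12", "1978-12"])
```

Here `mixed` reflects the text's example of two records that disagree on the separator, and `uniform` reflects the case where every record uses the same format.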
The uniqueness index of the metadata describes whether data records and key data values are defined or used more than once. It covers data record uniqueness and/or key attribute value uniqueness. Data record uniqueness concerns whether records with completely identical content appear more than once. Key attribute value uniqueness concerns whether a key attribute value that must not repeat appears more than once, for example two employees having the same identity card number.
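Both uniqueness checks reduce to counting occurrences. The following sketch assumes records are flat dictionaries with hashable values; the field name `id_number` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Any


def duplicate_record_count(records: list[dict[str, Any]]) -> int:
    """Count records that exactly repeat the content of an earlier record."""
    # Sorting the items makes the comparison independent of key order.
    seen = Counter(tuple(sorted(r.items())) for r in records)
    return sum(n - 1 for n in seen.values())


def repeated_key_values(records: list[dict[str, Any]], key: str) -> list[Any]:
    """Return the values of a key attribute that occur in more than one record."""
    counts = Counter(r[key] for r in records if r.get(key) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Invented sample data: the first and third records are identical, and all
# three share one identity number.
employees = [
    {"name": "A", "id_number": "110101"},
    {"name": "B", "id_number": "110101"},
    {"name": "A", "id_number": "110101"},
]
dup_records = duplicate_record_count(employees)
dup_ids = repeated_key_values(employees, "id_number")
```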
The timeliness index of the metadata describes the novelty of the information in data records and the validity of link addresses. It covers data content novelty and/or link address validity. Data content novelty concerns whether the uploaded or updated metadata was generated, or had its content changed, within the current update cycle. For example, for journal articles updated at yearly granularity, the check is whether the data newly added at the end of 2017 contains records whose "publication year" field is 2016 or earlier. Link address validity concerns content that contains external or internal link addresses, and checks whether the relevant business information can be obtained through the link address. This embodiment checks the metadata against one or more of the completeness, correctness, consistency, uniqueness, and timeliness indexes in the quality evaluation index system.
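The data content novelty check in the journal example can be sketched as a filter on the publication year. The field name `publish_year` and the sample batch are hypothetical. Link address validity is left out of this sketch because it requires issuing network requests against the actual addresses.

```python
from typing import Any


def stale_records(records: list[dict[str, Any]], cycle_year: int,
                  year_field: str = "publish_year") -> list[dict[str, Any]]:
    """Return records whose year field is earlier than the update cycle year."""
    return [r for r in records if int(r[year_field]) < cycle_year]


# Invented batch of records added in the 2017 update cycle.
new_batch = [
    {"title": "Paper A", "publish_year": "2017"},
    {"title": "Paper B", "publish_year": "2016"},
    {"title": "Paper C", "publish_year": "2015"},
]
stale = stale_records(new_batch, cycle_year=2017)
```

With yearly granularity, any record dated 2016 or earlier in the 2017 batch is reported as not novel.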
On the basis of the above embodiment, between steps S1 and S3 this embodiment further includes: classifying the metadata corresponding to the own attributes according to the level to which those attributes belong. Correspondingly, step S2 further includes: checking each class of metadata according to the evaluation indexes corresponding to that class, wherein each class of metadata is stored in association with its evaluation indexes in advance.
Specifically, the level to which the own attributes of a document-type digital resource belong includes one or more of the file level, the record level, and the field level. The metadata corresponding to the own attributes is classified according to these levels. The levels can be added or removed according to actual needs, which makes the checking and verification of metadata flexible.
On the basis of the above embodiment, the step in this embodiment of classifying the metadata corresponding to the own attributes according to the level to which those attributes belong specifically includes: according to that level, dividing the corresponding metadata into one or more of file-level metadata, record-level metadata, and field-level metadata.
On the basis of the above embodiment, the step in this embodiment of checking each class of metadata according to the evaluation indexes corresponding to that class specifically includes: according to the legality, completeness, and timeliness indexes, checking one or more of the file directory legality, file naming legality, file quantity completeness, and file arrival timeliness of the file-level metadata; according to the correctness, completeness, and timeliness indexes, checking one or more of the file classification legality, file location legality, file naming legality, file quantity completeness, and file generation timeliness of the record-level metadata; according to the completeness, correctness, consistency, uniqueness, and timeliness indexes, checking one or more of the following for the field-level metadata: record completeness, field completeness, field type legality, field length legality, field format legality, field business legality, field timeliness, field accuracy, whether a field contains garbled characters, whether a field shows uniform value substitution, data logical consistency, content format consistency, record uniqueness, key attribute value uniqueness, data content novelty, and link address validity.
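The pre-stored association between each metadata class and its check items could be held in a simple lookup table. The sketch below is one possible arrangement, not the storage scheme of the patent: the dictionary name, the level keys, and the helper function are hypothetical, while the check item names are taken from the paragraph above.

```python
# Hypothetical registry associating each metadata level with its check items.
CHECKS_BY_LEVEL: dict[str, list[str]] = {
    "file": [
        "file directory legality", "file naming legality",
        "file quantity completeness", "file arrival timeliness",
    ],
    "record": [
        "file classification legality", "file location legality",
        "file naming legality", "file quantity completeness",
        "file generation timeliness",
    ],
    "field": [
        "record completeness", "field completeness", "field type legality",
        "field length legality", "field format legality",
        "field business legality", "field timeliness", "field accuracy",
        "garbled characters", "uniform value substitution",
        "data logical consistency", "content format consistency",
        "record uniqueness", "key attribute value uniqueness",
        "data content novelty", "link address validity",
    ],
}


def checks_for(level: str) -> list[str]:
    """Look up the check items registered for one metadata level."""
    try:
        return CHECKS_BY_LEVEL[level]
    except KeyError:
        raise ValueError(f"unknown metadata level: {level!r}") from None
```

Because the levels live in data rather than in code, adding or removing a level only means editing the table, which matches the flexibility the text describes.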
Specifically, the items checked for file-level metadata are shown in Table 1, those for record-level metadata in Table 2, and those for field-level metadata in Table 3. Checking by category is finer-grained, which makes the quality evaluation more accurate.
Table 1: Details of the items checked for file-level metadata
Table 2: Details of the items checked for record-level metadata
On the basis of the above embodiment, step S3 in this embodiment further includes: calculating the score of each class of metadata according to the score corresponding to each check result and the weight corresponding to each evaluation index.
Specifically, for any class of metadata, the score corresponding to each check result of that class is multiplied by the weight of the evaluation index corresponding to that check result, and the products are added to obtain the score of that class.
On the basis of the above embodiments, step S3 in this embodiment specifically includes: for any check result, multiplying the score corresponding to that check result by the weight of the evaluation index corresponding to that check result; and adding the products for all check results to obtain the total score of the metadata.
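The weighted sums described for step S3 can be sketched as follows. The weights and scores are invented for illustration; the source states only that scores and weights are stored in advance and does not give values. With score $s_i$ and index weight $w_i$ for check result $i$, the total is $\sum_i s_i w_i$, and the per-class score restricts the sum to one metadata level.

```python
from collections import defaultdict

# Hypothetical index weights; the source does not specify any values.
WEIGHTS = {
    "completeness": 0.3,
    "correctness": 0.3,
    "consistency": 0.2,
    "uniqueness": 0.1,
    "timeliness": 0.1,
}

# Each check result: (metadata level, evaluation index, score). Invented data.
RESULTS = [
    ("file", "completeness", 100),
    ("field", "correctness", 80),
    ("field", "consistency", 0),
    ("field", "uniqueness", 100),
    ("record", "timeliness", 50),
]


def total_score(results, weights) -> float:
    """Sum of score times index weight over all check results."""
    return sum(score * weights[index] for _, index, score in results)


def score_by_level(results, weights) -> dict[str, float]:
    """The same weighted sum, accumulated separately per metadata level."""
    totals: dict[str, float] = defaultdict(float)
    for level, index, score in results:
        totals[level] += score * weights[index]
    return dict(totals)


overall = total_score(RESULTS, WEIGHTS)
per_level = score_by_level(RESULTS, WEIGHTS)
```

The per-level scores add up to the overall total, so the breakdown shows which class of metadata pulls the score down.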
Table 3: Details of the items checked for field-level metadata
Another embodiment of the invention provides a quality evaluation system for metadata of document-type digital resources. Fig. 2 is a schematic diagram of the overall structure of the system provided by this embodiment. The system includes a construction module 1, a checking module 2, and a calculation module 3, wherein:
The construction module 1 is used to construct a quality evaluation index system for the metadata in a target document-type digital resource according to the own attributes of the target document-type digital resource;
Here, the target document-type digital resource is the document-type digital resource whose quality is to be evaluated. Its own attributes include its directory, naming, quantity, classification, location, and so on. Metadata, also called intermediary data or relay data, is data that describes data. It mainly describes data attributes and supports functions such as indicating storage location, recording historical data, resource lookup, and file recording. The quality evaluation index system contains multiple evaluation indexes used for quality evaluation. The construction module 1 constructs this index system according to the own attributes of the target document-type digital resource.
The checking module 2 is used to check each piece of metadata item by item according to each evaluation index in the quality evaluation index system;
Here, checking means judging whether each piece of metadata in the target document-type digital resource meets each evaluation index. Checking is done mainly by computer, supplemented by manual inspection, in order to obtain inspection results and output an inspection report. A computer program checks most of the metadata against the evaluation indexes to produce preliminary results; a small amount of inspection is then performed manually and the check results are summarized. The inspection results can serve as a reference for improving the quality of the target document-type digital resource. For example, problems in the metadata can be traced back from the inspection results, so that the various metadata problems are found more quickly.
The calculation module 3 is used to calculate the total score of the metadata according to the score corresponding to each check result and the weight of each evaluation index.
Here, the score corresponding to each check result is stored in association with that result in advance, as is the weight of the evaluation index corresponding to each check result. The calculation module 3 calculates the total score of the metadata from these scores and weights. From the total score, a user can quickly tell good metadata quality from poor.
This embodiment constructs a quality evaluation index system according to the own attributes of the target document-type digital resource, checks each piece of metadata item by item against each evaluation index in that system, and calculates the total score of the metadata from the check results, thereby implementing quality evaluation of document-type digital resource metadata with high evaluation precision.
On the basis of the above embodiments, in the present embodiment quality evaluation system include integrality, it is correctness, consistent One of property, uniqueness and timeliness or a variety of evaluation indexes;Correspondingly, it verifies module to be specifically used for: be commented according to integrality Valence index, whether the data entity verified in metadata lacks, whether data file lacks, whether data record lacks, data Structure whether lack and record in field contents one of whether lack or a variety of;According to correctness evaluation index, member is verified The legitimacies of data, validity, with the presence or absence of messy code and with the presence or absence of one of unified value substitution or a variety of;According to consistency Evaluation index verifies the mathematical logic consistency and/or content format consistency of metadata;According to uniqueness evaluation index, core Look into the data record uniqueness and/or determinant attribute value uniqueness of metadata;According to timeliness index, the data of metadata are verified Content freshness and/or chained address validity.
On the basis of the above embodiments, categorization module in the present embodiment, is used for: the rank according to belonging to self attributes, Classify to the corresponding metadata of self attributes;Correspondingly, verify module to be also used to: a kind of metadata is corresponding comments according to every Valence index verifies every a kind of metadata;Wherein, every a kind of metadata and the preparatory associated storage of evaluation index.
On the basis of the above embodiments, categorization module is specifically used in the present embodiment: according to belonging to the self attributes Rank, the corresponding metadata of the self attributes is divided into file-level meta-data, record grade metadata and field level metadata One of or it is a variety of.
On the basis of the above embodiments, in the present embodiment verify module also particularly useful for: according to legitimacy, integrality and One of timeliness evaluation index is a variety of, verifies the file directory legitimacy of the file-level meta-data, file designation is closed Method, quantity of documents integrality and file reach one of timeliness or a variety of;It is commented according to correctness, integrality and timeliness One of valence index is a variety of, verifies document classification legitimacy, the document location legitimacy, file of the record grade metadata Name one of legitimacy, quantity of documents integrality and file generated timeliness or a variety of;According to integrality, correctness, one One of cause property, uniqueness and timeliness evaluation index are a variety of, verify record integrality, the word of the field level metadata Section integrality, field type legitimacy, field length legitimacy, field format legitimacy, field rightness of business, field timeliness Whether property, field accuracy, field have whether messy code, field unified value substitution, mathematical logic consistency, content format one occur One of cause property, record uniqueness, determinant attribute value uniqueness, data content novelty and link address validity are more Kind.
On the basis of the above embodiments, the computing module in this embodiment is further configured to calculate the score of each class of metadata according to the score corresponding to each verification result and the weight corresponding to each evaluation index.
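One way to write the per-class score, assuming it is formed as a weighted sum over the verification results belonging to that class, is given below. The symbols $S_c$, $I_c$, $s_i$ and $w_i$ are notation introduced here for illustration, and the numbers in the worked line are invented.

```latex
% S_c : score of metadata class c (file-level, record-level or field-level)
% I_c : set of verification results belonging to class c
% s_i : score of verification result i
% w_i : weight of the evaluation index that verification result i belongs to
S_c = \sum_{i \in I_c} w_i \, s_i

% Worked example with invented numbers: two field-level results,
% s_1 = 80 with w_1 = 0.6 and s_2 = 100 with w_2 = 0.4
S_{\text{field}} = 0.6 \times 80 + 0.4 \times 100 = 88
```

If every verification result belongs to exactly one class, summing $S_c$ over all classes gives the same value as summing the weighted scores of all verification results directly.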
On the basis of the above embodiments, the computing module in this embodiment is specifically configured to: for any verification result, multiply the score corresponding to that verification result by the weight of the evaluation index corresponding to that verification result; and add the products corresponding to all verification results to obtain the total score of the metadata.
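The multiply-then-add rule described here amounts to a weighted sum, which can be sketched as follows. The function name `total_score`, the pair-based input format, the 0 to 100 score scale and the weights used in the usage note are assumptions; the document does not state a score range or require the weights to sum to 1.

```python
def total_score(results, weights):
    """Weighted sum of verification scores.

    results: iterable of (index_name, score) pairs, one per verification result
    weights: dict mapping each evaluation index name to its weight
    """
    return sum(score * weights[index] for index, score in results)
```

For instance, with hypothetical weights `{"completeness": 0.5, "correctness": 0.25, "timeliness": 0.25}` and results scoring 100, 50 and 0 on those indices, the total is 100 × 0.5 + 50 × 0.25 + 0 × 0.25 = 62.5.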
This embodiment provides a quality evaluation device for document-category digital resource metadata. Fig. 3 is a schematic diagram of the overall structure of the quality evaluation device for document-category digital resource metadata provided by an embodiment of the present invention. The device includes at least one processor 31, at least one memory 32 and a bus 33; wherein,
the processor 31 and the memory 32 communicate with each other via the bus 33;
the memory 32 stores program instructions executable by the processor 31, and the processor calls the program instructions to execute the methods provided by the above method embodiments, for example including: S1, constructing a quality evaluation index system for the metadata in a target document-category digital resource according to the target resource's own attributes; S2, performing each verification on each item of metadata according to each evaluation index in the quality evaluation index system; S3, calculating the total score of the metadata according to the score corresponding to each verification result and the weight of each evaluation index.
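How steps S1 to S3 fit together as program instructions can be sketched end to end. Everything below is a simplified illustration under assumptions: the `Indicator` type, the rule used in `build_system` to pick indices from the resource attributes, the weights, and the URL-prefix test standing in for a link address validity check are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Indicator:
    name: str
    weight: float
    check: Callable[[dict], float]  # returns a score for one metadata record


def build_system(resource_attrs: dict) -> list:
    """S1: choose evaluation indices from the resource's own attributes."""
    required = resource_attrs["required_fields"]
    indicators = [
        # Completeness: share of required fields that are non-empty.
        Indicator("completeness", 0.5,
                  lambda m: 100.0 * sum(1 for f in required if m.get(f)) / len(required)),
    ]
    if resource_attrs.get("has_links"):
        # Timeliness: stand-in check that the link looks like a web address.
        indicators.append(Indicator(
            "timeliness", 0.5,
            lambda m: 100.0 if str(m.get("url") or "").startswith("http") else 0.0))
    return indicators


def verify(metadata: dict, indicators: list) -> dict:
    """S2: run every check against one metadata record."""
    return {ind.name: ind.check(metadata) for ind in indicators}


def score(results: dict, indicators: list) -> float:
    """S3: weighted sum of the verification scores."""
    return sum(results[ind.name] * ind.weight for ind in indicators)
```

Keeping the three steps as separate functions mirrors the construction, verification and computing modules, so each can be replaced without touching the others.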
This embodiment provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the methods provided by the above method embodiments, for example including: S1, constructing a quality evaluation index system for the metadata in a target document-category digital resource according to the target resource's own attributes; S2, performing each verification on each item of metadata according to each evaluation index in the quality evaluation index system; S3, calculating the total score of the metadata according to the score corresponding to each verification result and the weight of each evaluation index.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
The embodiments of the quality evaluation device for document-category digital resource metadata described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, the methods of the present application are merely preferred embodiments and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A quality evaluation method for document-category digital resource metadata, characterized by comprising:
S1, constructing a quality evaluation index system for the metadata in a target document-category digital resource according to the target document-category digital resource's own attributes;
S2, performing each verification on each item of metadata according to each evaluation index in the quality evaluation index system;
S3, calculating the score of the metadata according to the score corresponding to each verification result and the weight of each evaluation index.
2. The method according to claim 1, characterized in that the quality evaluation index system includes one or more of the completeness, correctness, consistency, uniqueness and timeliness evaluation indices;
correspondingly, the step S2 specifically comprises:
according to the completeness evaluation index, verifying one or more of: whether a data entity in the metadata is missing, whether a data file is missing, whether a data record is missing, whether a data structure is missing, and whether field content in a record is missing;
according to the correctness evaluation index, verifying one or more of: the legality of the metadata, its validity, whether garbled characters are present, and whether uniform value substitution is present;
according to the consistency evaluation index, verifying the data logic consistency and/or content format consistency of the metadata;
according to the uniqueness evaluation index, verifying the data record uniqueness and/or key attribute value uniqueness of the metadata;
according to the timeliness evaluation index, verifying the data content novelty and/or link address validity of the metadata.
3. The method according to claim 2, characterized in that, between the steps S1 and S3, the method further comprises:
classifying the metadata corresponding to the resource's own attributes according to the level to which those attributes belong;
correspondingly, the step S2 further comprises:
verifying each class of metadata according to the evaluation indices corresponding to that class;
wherein each class of metadata is stored in association with its evaluation indices in advance.
4. The method according to claim 3, characterized in that the step of classifying the metadata corresponding to the resource's own attributes according to the level to which those attributes belong specifically comprises:
dividing the metadata corresponding to the resource's own attributes, according to the level to which those attributes belong, into one or more of file-level metadata, record-level metadata and field-level metadata.
5. The method according to claim 4, characterized in that the step of verifying each class of metadata according to the evaluation indices corresponding to that class specifically comprises:
according to the correctness, completeness and timeliness evaluation indices, verifying one or more of the file directory legality, file naming legality, file quantity completeness and file arrival timeliness of the file-level metadata;
according to the correctness, completeness and timeliness evaluation indices, verifying one or more of the file classification legality, file location legality, file naming legality, file quantity completeness and file generation timeliness of the record-level metadata;
according to the completeness, correctness, consistency, uniqueness and timeliness evaluation indices, verifying one or more of the following for the field-level metadata: record completeness, field completeness, field type legality, field length legality, field format legality, field business legality, field timeliness, field accuracy, whether the field contains garbled characters, whether uniform value substitution occurs in the field, data logic consistency, content format consistency, record uniqueness, key attribute value uniqueness, data content novelty and link address validity.
6. The method according to claim 3, characterized in that the step S3 further comprises:
calculating the score of each class of metadata according to the score corresponding to each verification result and the weight corresponding to each evaluation index.
7. The method according to any one of claims 1 to 6, characterized in that the step S3 specifically comprises:
for any verification result, multiplying the score corresponding to that verification result by the weight of the evaluation index corresponding to that verification result;
adding the products corresponding to all verification results to obtain the total score of the metadata.
8. A quality evaluation system for document-category digital resource metadata, characterized by comprising:
a construction module, configured to construct a quality evaluation index system for the metadata in a target document-category digital resource according to the target document-category digital resource's own attributes;
a verification module, configured to verify each item of metadata according to each evaluation index in the quality evaluation index system;
a computing module, configured to calculate the total score of the metadata according to the score corresponding to each verification result and the weight of each evaluation index.
9. A quality evaluation device for document-category digital resource metadata, characterized by comprising:
at least one processor, at least one memory and a bus; wherein,
the processor and the memory communicate with each other via the bus;
the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810707861.1A CN109299062A (en) 2018-07-02 2018-07-02 A kind of quality evaluating method and system towards document category digital resource metadata

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810707861.1A CN109299062A (en) 2018-07-02 2018-07-02 A kind of quality evaluating method and system towards document category digital resource metadata

Publications (1)

Publication Number Publication Date
CN109299062A true CN109299062A (en) 2019-02-01

Family

ID=65167844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810707861.1A Pending CN109299062A (en) 2018-07-02 2018-07-02 A kind of quality evaluating method and system towards document category digital resource metadata

Country Status (1)

Country Link
CN (1) CN109299062A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298564A (en) * 2019-06-17 2019-10-01 迪普佰奥生物科技(上海)有限公司 Biomedical product evaluation method, device, medium, terminal based on artificial intelligence
CN111026742A (en) * 2019-12-05 2020-04-17 东莞中国科学院云计算产业技术创新与育成中心 Data quality evaluation method and device, computer equipment and storage medium
CN111897803A (en) * 2020-08-17 2020-11-06 国网辽宁省电力有限公司信息通信分公司 Database integrity evaluation method for power industry business system
CN112559510A (en) * 2021-01-18 2021-03-26 山东省齐鲁大数据研究院 Method and system for evaluating open data quality
CN113127482A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Data quality analysis method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530347A (en) * 2013-10-09 2014-01-22 北京东方网信科技股份有限公司 Internet resource quality assessment method and system based on big data mining
CN104484448A (en) * 2014-12-26 2015-04-01 浙江协同数据系统有限公司 Assessment method for relational data quality
CN107368957A (en) * 2017-07-04 2017-11-21 广西电网有限责任公司电力科学研究院 A kind of construction method of equipment condition monitoring quality of data evaluation and test system
CN107748775A (en) * 2017-10-17 2018-03-02 上海计算机软件技术开发中心 A kind of data governing system based on the quality of data

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298564A (en) * 2019-06-17 2019-10-01 迪普佰奥生物科技(上海)有限公司 Biomedical product evaluation method, device, medium, terminal based on artificial intelligence
CN111026742A (en) * 2019-12-05 2020-04-17 东莞中国科学院云计算产业技术创新与育成中心 Data quality evaluation method and device, computer equipment and storage medium
CN113127482A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Data quality analysis method and device, computer equipment and storage medium
CN113127482B (en) * 2019-12-31 2024-03-26 奇安信科技集团股份有限公司 Data quality analysis method, device, computer equipment and storage medium
CN111897803A (en) * 2020-08-17 2020-11-06 国网辽宁省电力有限公司信息通信分公司 Database integrity evaluation method for power industry business system
CN111897803B (en) * 2020-08-17 2023-10-20 国网辽宁省电力有限公司信息通信分公司 Database integrity evaluation method for power industry service system
CN112559510A (en) * 2021-01-18 2021-03-26 山东省齐鲁大数据研究院 Method and system for evaluating open data quality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190201