CN113052411A - Data product quality evaluation method and device - Google Patents

Data product quality evaluation method and device

Info

Publication number
CN113052411A
Authority
CN
China
Prior art keywords
index
data
score
evaluation
indexes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911364505.5A
Other languages
Chinese (zh)
Inventor
苏静
司亚清
肖庆军
关虎
王涛
李广凯
王明月
覃思瑶
郭晓峰
郑金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201911364505.5A
Publication of CN113052411A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q10/06395 Quality analysis or management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a method and a device for evaluating the quality of a data product, wherein the method comprises the following steps: determining the weight of each secondary index in a pre-established data product quality evaluation index framework by using an analytic hierarchy process, and obtaining the total weight of objective indexes and the total weight of subjective indexes in the secondary indexes based on the weight of each secondary index; automatically collecting quality evaluation characteristic parameters of objective indexes in the data product, and calculating evaluation scores of the objective indexes of the data product; calculating the overall objective index score of the data product based on the weight of each objective index; receiving language evaluation information of each subjective index of the data product by a plurality of evaluators, converting the language evaluation information into a trapezoidal fuzzy number, and calculating the score of the data product on the current subjective index based on the weight of the evaluators; calculating the overall subjective index score of the data product based on the weight and the score of each subjective index; and obtaining a data product quality evaluation comprehensive score based on the objective index overall score and the subjective index overall score.

Description

Data product quality evaluation method and device
Technical Field
The invention relates to the technical field of data product transaction, in particular to a data product quality evaluation method and device.
Background
Commercializing data and bringing it into a data product trading market is a necessary means of realizing the maximum utilization value of the data, and the quality level of data products is an important factor in whether that market can run in an orderly and efficient manner. When low-quality data enters market circulation, trading efficiency falls, the operating cost of the data product trading market rises, and data buyers face quality problems such as data redundancy, data distortion, data loss and data inconsistency, which prevent them from obtaining the maximum utilization value of the data. At present, research on data product quality evaluation is still at an exploratory stage, and no widely recognized, effective solution exists.
Data quality evaluation approaches have been proposed, such as the DAMA data management body of knowledge guide, the Data Quality Assessment Framework (DQAF) constructed by Laura Sebastian-Coleman and others, and Total Data Quality Management (TDQM) proposed by Richard Y. However, most prior-art schemes analyze data quality qualitatively and lack a reasonable index quantification method, so the data quality evaluation results are mixed with too many subjective factors. In addition, existing data quality evaluation does not consider the characteristics that data acquire after productization.
Therefore, how to more objectively measure the quality level of the data product and reflect the quality level of the data product more intuitively in a quantitative form is a problem to be solved urgently.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for evaluating quality of a data product, so as to solve the problem in the prior art that the quality evaluation of the data product is difficult to be reasonably quantified.
One aspect of the present invention provides a data product quality evaluation method, including the steps of:
determining the weight of each secondary index in a pre-established data product quality evaluation index architecture by using an analytic hierarchy process, and obtaining the total weight of objective indexes and the total weight of subjective indexes in the secondary indexes based on the weight of each secondary index, wherein the pre-established data product quality evaluation index architecture comprises a plurality of data evaluation dimensions, each data evaluation dimension comprises at least one primary index, each primary index comprises a plurality of secondary indexes, and the secondary indexes under each data evaluation dimension are objective indexes and subjective indexes;
automatically collecting quality evaluation characteristic parameters of each objective index in the data product, and calculating the evaluation score of each objective index of the data product based on the collected quality evaluation characteristic parameters of each objective index;
calculating the overall objective index score of the data product based on the weight and the score of each objective index;
receiving language evaluation information of each subjective index of the data product by a plurality of evaluators, converting the language evaluation information into a trapezoidal fuzzy number, and calculating the score of the data product on the current subjective index based on the weight of the evaluators;
calculating the overall subjective index score of the data product based on the weight and the score of each subjective index;
and obtaining a data product quality evaluation comprehensive score based on the objective index overall score and the subjective index overall score.
In an embodiment of the present invention, the step of calculating the overall score of the subjective index of the data product based on the weight and the score of each subjective index includes:
based on the weight and the score of each subjective index, representing the subjective index comprehensive score of the data product by a trapezoidal fuzzy number;
determining the optimal point and the worst point of the quality of the data product, and calculating the distance between the subjective index comprehensive score represented by the trapezoidal fuzzy number and the optimal point and the worst point by using an ideal point method to obtain a data product quality approach degree coefficient;
and obtaining the overall score of the subjective index of the data product based on the data product quality approach degree coefficient and the total weight of the subjective index.
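The three sub-steps above may be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the formula of the disclosure: a trapezoidal fuzzy number is written as a 4-tuple (a, b, c, d) normalized to the range 0 to 1, the optimal and worst points are taken as (1, 1, 1, 1) and (0, 0, 0, 0), the distance is the commonly used vertex distance, and the overall subjective score is assumed to be the approach degree coefficient multiplied by the total subjective weight. All function names are hypothetical.

```python
import math


def fuzzy_distance(x, y):
    """Vertex distance between two trapezoidal fuzzy numbers (assumed metric)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / 4.0)


def approach_degree(score, best=(1.0, 1.0, 1.0, 1.0), worst=(0.0, 0.0, 0.0, 0.0)):
    """Ideal point (TOPSIS) coefficient: 1 at the optimal point, 0 at the worst point."""
    d_best = fuzzy_distance(score, best)
    d_worst = fuzzy_distance(score, worst)
    return d_worst / (d_best + d_worst)


def subjective_overall_score(score, total_subjective_weight):
    """Assumed combination: coefficient scaled by the total subjective weight."""
    return approach_degree(score) * total_subjective_weight


# Hypothetical composite subjective score and total subjective weight.
composite = (0.5, 0.5, 0.5, 0.5)
C = subjective_overall_score(composite, total_subjective_weight=0.4)
```

With these assumptions, a composite score lying midway between the two reference points yields a coefficient of 0.5, so C equals half of the total subjective weight.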
In an embodiment of the present invention, the step of calculating the overall score of the subjective index of the data product by using the ideal point method includes: calculating the overall initial subjective index score of the data product represented by the trapezoidal fuzzy number
In an embodiment of the present invention, the step of obtaining a data product quality evaluation comprehensive score based on the objective index overall score and the subjective index overall score includes: and taking the sum of the overall score of the objective index and the overall score of the subjective index as the comprehensive score of the quality evaluation of the data product.
In an embodiment of the present invention, the plurality of data evaluation dimensions include: a data content rating dimension, a product packaging rating dimension, and a market circulation rating dimension.
In an embodiment of the present invention, the data content evaluation dimension includes part or all of the following primary indexes: accuracy index, integrity index, timeliness index, uniqueness index and effectiveness index; the product package evaluation dimension comprises the following primary indexes: metadata normalization indexes; the market circulation evaluation dimension comprises the following first-level indexes: a service level indicator and/or a market feedback indicator.
In an embodiment of the present invention, the accuracy index includes the following secondary indexes: a grammar accuracy index and/or a semantic accuracy index; the integrity index includes some or all of the following secondary indexes: a description integrity index, a fact integrity index, a column integrity index and a reference integrity index; the timeliness index includes the following secondary indexes: a content timeliness index and/or an acquisition timeliness index; the uniqueness index includes the following secondary indexes: a breadth uniqueness index and/or a depth uniqueness index; the effectiveness index includes the following secondary indexes: a format validity index and/or a quantity validity index; the metadata normalization index includes the following secondary indexes: a metadata accuracy index and/or a metadata integrity index; the service level index includes the following secondary indexes: a seller credit scoring index and/or a buyer-to-seller goodness index; the market feedback index includes the following secondary indexes: a product sales index and/or a buyer-to-product scoring index; the column integrity index, the content timeliness index, the acquisition timeliness index, the depth uniqueness index, the format validity index, the quantity validity index, the metadata integrity index, the seller credit scoring index, the buyer-to-seller goodness index and the buyer-to-product scoring index are objective indexes; the grammar accuracy index, the semantic accuracy index, the description integrity index, the fact integrity index, the reference integrity index, the breadth uniqueness index, the metadata accuracy index and the product sales index are subjective indexes.
In an embodiment of the present invention, the step of calculating the evaluation score of each objective index of the data product based on the collected quality evaluation characteristic parameters of each objective index includes some or all of the following steps:
the evaluation score of the column integrity index is calculated using the following formula: $S_1 = 1 - \sum_i \left[ W_i \times \frac{\text{number of missing values in the } i\text{-th column}}{\text{total number of values in the } i\text{-th column}} \right]$; wherein $S_1$ represents the evaluation score of the column integrity index, and $W_i$ represents the weight of the i-th column attribute among all column attributes of the data set;
the evaluation score of the content timeliness index is calculated using the following formula: $S_2 = e^{-\Delta t} = e^{t_2 - t_1}$; wherein $S_2$ represents the evaluation score of the content timeliness index, and $t_1$ and $t_2$ respectively represent the data release time and the latest time covered by the data content;
the evaluation score of the acquisition timeliness index is calculated using the following formula: $S_3 = e^{-f}$; wherein $S_3$ represents the evaluation score of the acquisition timeliness index, and f represents the data acquisition frequency;
the evaluation score of the depth uniqueness index is calculated using the following formula: $S_4 = 1 - \frac{\text{number of repeated data rows}}{\text{total number of data rows}}$; wherein $S_4$ represents the evaluation score of the depth uniqueness index;
the evaluation score of the format validity index is calculated using the following formula: $S_5 = \frac{\text{total number of format-correct values}}{\text{total number of data record values}}$; wherein $S_5$ represents the evaluation score of the format validity index;
the evaluation score of the quantity validity index is calculated using the following formula: $S_6 = \frac{\text{total number of values with a correct quantity format}}{\text{total number of data record values}}$; wherein $S_6$ represents the evaluation score of the quantity validity index;
the evaluation score of the metadata integrity index is calculated using the following formula: $S_7 = \frac{\text{total number of non-empty metadata fields}}{\text{total number of canonical metadata fields}}$; wherein $S_7$ represents the evaluation score of the metadata integrity index;
calculating the evaluation score of the seller credit scoring index based on the seller credit rating;
the evaluation score of the buyer-to-seller goodness index is calculated using the following formula: $S_9 = \frac{\text{total number of orders rated favorably by data buyers}}{\text{total number of completed orders of the data seller}}$; wherein $S_9$ represents the evaluation score of the buyer-to-seller goodness index;
and calculating the evaluation score of the buyer scoring index by collecting the scores of the data buyer to the data product after the data product transaction is completed.
Another aspect of the present invention also provides a data product quality evaluation device comprising a processor and a memory, the memory having computer instructions stored therein and the processor being configured to execute the computer instructions stored in the memory, the device implementing the steps of the method according to any one of claims 1 to 8 when the computer instructions are executed by the processor.
A further aspect of the invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
The data product quality evaluation method and the evaluation device provided by the invention are based on the characteristics of the data product content, the data quality is considered from five dimensions of data accuracy, data integrity, data timeliness, data uniqueness and data validity, the corresponding refinement secondary index under each dimension is designed, and the quality level of the data resource is evaluated completely and comprehensively. The invention can scientifically and visually display the quality level of the data product in a digital form.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
Fig. 1 is a schematic diagram of a data product quality evaluation architecture according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of a data product quality evaluation method according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating a subjective index quantization method according to an embodiment of the present invention.
Fig. 4 is a graph of the function of a trapezoidal fuzzy number.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the terms "comprises", "comprising", "includes" and "having", when used herein, specify the presence of stated features, elements, steps or components, but do not preclude the presence or addition of one or more other features, elements, steps or components.
The invention starts from three aspects of data content, product packaging and market circulation aiming at data products in a data trading market, and constructs a data product quality evaluation index system (or called data product quality evaluation index architecture) comprising eight primary indexes and eighteen secondary indexes on the basis of the established data quality evaluation index system. Meanwhile, an index measuring and quantifying method according to a data product quality evaluation index system is designed, so that the quality level of the data product is visually displayed in a digital form.
The data quality evaluation index system is a complete and comprehensive data quality evaluation index system, the data quality is considered from five dimensions of data accuracy, data integrity, data timeliness, data uniqueness and data effectiveness from the characteristics of data contents, and corresponding refinement secondary indexes under each dimension are designed, so that the quality level of data resources can be completely and comprehensively evaluated.
The data product quality evaluation index system constructed based on the data quality evaluation index system in the embodiment of the present invention is shown in table 1 below.
Table 1. Data product quality evaluation index system:
[Table 1 is provided as images in the original publication: Figure BDA0002338063750000051, Figure BDA0002338063750000061]
As can be seen from Table 1, on the basis of the data quality evaluation index system, the data product quality evaluation index system of the present invention further describes the characteristics that data acquire after productization, from the perspective of data sellers and of the data product trading market, mainly including metadata normalization, data seller service level and market feedback. These are data productization features that can be automatically extracted by the transaction system.
As shown in table 1, each secondary index can be divided into an objective index and a subjective index, wherein:
The objective indexes may include: column integrity index, content timeliness index, acquisition timeliness index, depth uniqueness index, format validity index, quantity validity index, metadata integrity index, seller credit scoring index, buyer-to-seller goodness index and buyer-to-product scoring index.
The subjective indexes may include: grammar accuracy index, semantic accuracy index, description integrity index, fact integrity index, reference integrity index, breadth uniqueness index, metadata accuracy index and product sales index.
The invention designs different quantification methods for the two types of indexes, subjective and objective. As shown in Fig. 1, the overall index quantification concept of the invention is as follows. To minimize the influence of human subjective factors on the data product quality evaluation result and to improve the reliability of the evaluation method, the eighteen secondary indexes in the data product quality evaluation index system are divided into ten objective indexes and eight subjective indexes, which are measured and quantified by different methods. For the objective indexes, the index values are calculated from the characteristics of the data product, and the overall objective index score is then obtained based on the weights of the objective indexes. For the subjective indexes, experts evaluate each index and the evaluations are converted into trapezoidal fuzzy numbers; the ideal point method (i.e., the TOPSIS method) is then used to obtain the overall subjective index score based on the expert weights and the trapezoidal fuzzy numbers of the indexes. When data quality is evaluated using the data product quality evaluation index system, experts first determine the weights of all secondary indexes by the analytic hierarchy process, and the weights of the subjective indexes are summed to obtain the total subjective index weight; the performance scores S and C of the data product on the objective indexes and the subjective indexes are then calculated respectively, giving the comprehensive data product quality evaluation score Q = S + C.
Fig. 2 is a schematic flow chart of a data product quality evaluation method according to an embodiment of the present invention. As shown in fig. 2, the method comprises the following steps:
Step S110, determining the weight of each secondary index in the pre-established data product quality evaluation index framework by using an analytic hierarchy process, and obtaining the total weight of the objective indexes and the total weight of the subjective indexes in the secondary indexes based on the weight of each secondary index.
The pre-established data product quality evaluation index architecture comprises a plurality of data evaluation dimensions, each data evaluation dimension comprises at least one primary index, each primary index comprises a plurality of secondary indexes, and the secondary indexes under each data evaluation dimension are objective indexes and subjective indexes. Preferably, the pre-established data product quality evaluation index structure has the structure shown in table 1, but the present invention is not limited thereto, and the secondary index may be divided into more or less indexes.
The analytic hierarchy process used in this step is an existing system analysis method. It determines the weights of the indexes at each level systematically through the steps of establishing the hierarchical structure of the problem, constructing judgment matrices, determining the weight vectors and checking their consistency, and performing the overall hierarchical ranking and checking its consistency, and thereby determines the optimal scheme among the alternatives. The analytic hierarchy process has a complete theoretical basis, a rigorous structure and a simple solution procedure, and is particularly well suited to unstructured problems.
Since the specific implementation of determining the weights of different indexes based on the analytic hierarchy process belongs to the prior art, no further description is provided herein.
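For orientation only, one common way of obtaining weights from a single judgment matrix may be sketched in Python as follows. The sketch assumes the row geometric mean approximation of the principal eigenvector and Saaty's random consistency index values; the function name, the example matrix and the objective/subjective labelling are hypothetical and are not taken from the disclosure.

```python
import math

# Saaty's random consistency index for judgment matrices of order 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(judgment):
    """Return (weights, consistency_ratio) for a square pairwise judgment matrix."""
    n = len(judgment)
    # Row geometric means, normalised to sum to 1, approximate the principal eigenvector.
    row_means = [math.prod(row) ** (1.0 / n) for row in judgment]
    total = sum(row_means)
    weights = [m / total for m in row_means]
    # Estimate the largest eigenvalue from the ratios (A w)_i / w_i.
    products = [sum(a * w for a, w in zip(row, weights)) for row in judgment]
    lambda_max = sum(p / w for p, w in zip(products, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]
    ratio = consistency_index / random_index if random_index > 0 else 0.0
    return weights, ratio


# Hypothetical judgment matrix for three indexes.
judgment = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, ratio = ahp_weights(judgment)

# Hypothetical labelling of which indexes are objective; the rest are subjective.
is_objective = [True, False, True]
objective_total = sum(w for w, flag in zip(weights, is_objective) if flag)
subjective_total = sum(w for w, flag in zip(weights, is_objective) if not flag)
```

A consistency ratio below 0.1 is conventionally regarded as acceptable; the example matrix is perfectly consistent, so its ratio is zero up to rounding.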
Step S120, automatically collecting quality evaluation characteristic parameters of each objective index related to the data product, and calculating the evaluation score of each objective index of the data product based on the collected quality evaluation characteristic parameters of each objective index.
The step is used for quantifying the quality evaluation of the objective indexes, and in the step, the quality evaluation characteristic parameters of each objective index are collected from the data product or the trading platform by analyzing the content of the data product or the trading related information of the data product, so that the evaluation score of each objective index of the data product is calculated. An objective index is an index that can be directly quantified by automated means. The quantification of each objective index will be described in detail later.
In this step, it is preferable to calculate the evaluation score of each objective index of the data product based on the quality evaluation characteristic parameters of all 10 objective indexes shown in the foregoing table 1, but the present invention is not limited thereto, and more or less objective indexes may be selected for the quantification of the quality evaluation.
Step S130, calculating the overall objective index score S of the data product based on the weight and the score of each objective index.
Step S140, receiving language evaluation information on each subjective index of the data product from a plurality of evaluators, converting the language evaluation information into trapezoidal fuzzy numbers, and calculating the score of the data product on the current subjective index based on the weights of the evaluators.
This step quantifies the quality evaluation of the subjective indexes.
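The conversion and evaluator weighting in step S140 may be pictured with the following Python sketch. The five-level scale and its trapezoidal fuzzy numbers are hypothetical values chosen for illustration, and the evaluator-weighted component-wise average is an assumed aggregation rule; neither is taken from the disclosure.

```python
# Hypothetical language scale mapped to trapezoidal fuzzy numbers (a, b, c, d).
LANGUAGE_SCALE = {
    "very poor": (0.0, 0.0, 0.1, 0.2),
    "poor": (0.1, 0.2, 0.3, 0.4),
    "fair": (0.3, 0.4, 0.6, 0.7),
    "good": (0.6, 0.7, 0.8, 0.9),
    "very good": (0.8, 0.9, 1.0, 1.0),
}


def aggregate_evaluations(terms, evaluator_weights):
    """Evaluator-weighted, component-wise average of trapezoidal fuzzy numbers."""
    if len(terms) != len(evaluator_weights):
        raise ValueError("one weight is required per evaluation")
    total = sum(evaluator_weights)
    fuzzy = [LANGUAGE_SCALE[term] for term in terms]
    return tuple(
        sum(w * f[k] for w, f in zip(evaluator_weights, fuzzy)) / total
        for k in range(4)
    )


# Two hypothetical evaluators with equal weight rate one subjective index.
score = aggregate_evaluations(["poor", "very good"], [0.5, 0.5])
```

The result is again a trapezoidal fuzzy number, so it can be carried forward unchanged into the calculation of the overall subjective index score.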
Step S150, calculating the overall subjective index score C of the data product based on the weight and the score of each subjective index.
Step S160, obtaining a data product quality evaluation comprehensive score based on the obtained objective index overall score and subjective index overall score.
For example, the overall score S of the objective indexes and the overall score C of the subjective indexes are added to obtain the comprehensive score Q of the data product quality evaluation: Q = S + C.
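The final addition can be illustrated with a short Python sketch. The numeric values are hypothetical. The comment on the value range assumes that the weights of all secondary indexes sum to 1 and that S and C are each scaled by their respective total weights, which the text suggests but does not state as a formula.

```python
def comprehensive_quality_score(objective_score, subjective_score):
    """Return the comprehensive score Q = S + C."""
    return objective_score + subjective_score


# Hypothetical overall scores. Under the stated assumption, S is at most the
# total objective weight and C is at most the total subjective weight, so Q
# stays within the range 0 to 1.
S = 0.48
C = 0.27
Q = comprehensive_quality_score(S, C)
```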
The method of quantifying the objective index in step S120 is described below.
The objective indexes are measured mainly according to how the content of the data product performs on each index. The calculation method for each secondary index is explained below.
1) Column integrity
Column integrity is reflected by the ratio of missing values in each column. The score $S_1$ of the data product on column integrity is calculated as follows:
$S_1 = 1 - \sum_i \left[ W_i \times \frac{\text{number of missing values in the } i\text{-th column}}{\text{total number of values in the } i\text{-th column}} \right]$;
where $W_i$ is the weight of the i-th column attribute among all column attributes in the data set, $0 < W_i < 1$. The calculated $S_1$ typically lies between 0 and 1.
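The column integrity calculation may be sketched in Python as follows. The data layout (a mapping from column name to a list of values), the rule that `None` and the empty string count as missing, and the function name are assumptions made for this example.

```python
def column_integrity(columns, weights, is_missing=lambda v: v is None or v == ""):
    """S1 = 1 - sum_i W_i * (missing values in column i / total values in column i)."""
    penalty = 0.0
    for name, values in columns.items():
        if not values:
            continue  # an empty column contributes no missing-value ratio
        missing = sum(1 for v in values if is_missing(v))
        penalty += weights[name] * missing / len(values)
    return 1.0 - penalty


# Hypothetical two-column data set with equal column weights.
columns = {
    "id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "", "d@example.com"],
}
weights = {"id": 0.5, "email": 0.5}
s1 = column_integrity(columns, weights)
```

Here half of the values in the second column are missing, so the weighted penalty is 0.25 and the score is 0.75.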
2) Content timeliness
Content timeliness is measured by the difference $\Delta t = t_1 - t_2$ between the data release time $t_1$ and the latest time $t_2$ covered by the data content, where the latest time covered by the data content is the time point recorded in the data that is closest to the current time. The invention uses the exponential function $y = e^{-x}$ to standardize $\Delta t$ and obtain the score $S_2$ of the data product on content timeliness, calculated as follows:
$S_2 = e^{-\Delta t} = e^{t_2 - t_1}$.
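A minimal Python sketch of this score is given below. The text does not state the time unit of Δt, so the sketch assumes that Δt is expressed in years of 365 days; the function name and the `unit_days` parameter are hypothetical.

```python
import math
from datetime import date


def content_timeliness(release, latest_covered, unit_days=365.0):
    """S2 = exp(-(t1 - t2)), with the difference expressed in assumed time units."""
    delta_t = (release - latest_covered).days / unit_days
    if delta_t < 0:
        raise ValueError("the release time must not precede the latest covered time")
    return math.exp(-delta_t)


# Data released exactly 365 days after the latest time point it covers.
s2 = content_timeliness(date(2020, 1, 1), date(2019, 1, 1))
```

Because the score decays exponentially, the choice of time unit strongly affects how quickly a data product is penalized for stale content.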
3) Acquisition timeliness
The timeliness of data content acquisition can be measured by the data acquisition frequency f per unit time. As with content timeliness, the exponential function $y = e^{-x}$ is used to standardize the acquisition frequency f and obtain the score $S_3$ of the data product on the acquisition timeliness index, calculated as follows:
$S_3 = e^{-f}$.
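The formula can be written directly in Python as below; the function name is hypothetical. Taken literally, the score falls as f grows, so how f is defined and in what unit it is measured determines how the result should be read; the sketch makes no assumption beyond f being non-negative.

```python
import math


def acquisition_timeliness(frequency):
    """S3 = exp(-f) for a non-negative acquisition frequency f."""
    if frequency < 0:
        raise ValueError("the acquisition frequency must be non-negative")
    return math.exp(-frequency)


s3 = acquisition_timeliness(2.0)
```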
4) Depth uniqueness
Depth uniqueness is reflected by the proportion of repeated data rows. The score $S_4$ of the data product on depth uniqueness is calculated as follows:
$S_4 = 1 - \frac{\text{number of repeated data rows}}{\text{total number of data rows}}$.
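One possible reading of this formula is sketched in Python below. The sketch assumes that the number of repeated rows is the total number of rows minus the number of distinct rows, so the first occurrence of each row is not counted as a repeat; the function name is hypothetical.

```python
def depth_uniqueness(rows):
    """S4 = 1 - repeated rows / total rows, counting repeats beyond the first occurrence."""
    total = len(rows)
    if total == 0:
        return 1.0  # assumption: an empty data set has no repeated rows
    distinct = len({tuple(row) for row in rows})
    return 1.0 - (total - distinct) / total


# Hypothetical data set in which one of four rows repeats an earlier row.
rows = [(1, "a"), (1, "a"), (2, "b"), (3, "c")]
s4 = depth_uniqueness(rows)
```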
5) Format validity
Format validity refers to evaluating data values in terms of data type, predefined enumeration values, storage format and the like, and judging whether the value of a given attribute meets the specific requirements of that attribute's data type. For example, an ID number may contain only digits and must be 18 digits long. Format validity is reflected by the proportion of format-correct values in the data. The score $S_5$ of the data product on format validity is calculated as follows:
$S_5 = \frac{\text{total number of format-correct values}}{\text{total number of data record values}}$.
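Using the ID number example from the text, the format validity ratio may be sketched in Python as follows. The regular expression encodes only the rule stated above (exactly 18 digits); the function name and the sample values are hypothetical.

```python
import re

# Rule taken from the example in the text: digits only, exactly 18 of them.
ID_PATTERN = re.compile(r"\d{18}")


def format_validity(values, pattern=ID_PATTERN):
    """S5 = format-correct values / all data record values."""
    if not values:
        return 1.0  # assumption: no values means nothing is malformed
    correct = sum(1 for v in values if pattern.fullmatch(str(v)))
    return correct / len(values)


# Hypothetical column: two well-formed values, one too short, one with a letter.
values = ["123456789012345678", "12345", "12345678901234567X", "987654321098765432"]
s5 = format_validity(values)
```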
6) Quantity validity
Quantity validity refers to whether a data value meets the requirements on precision and value range. For example, age is a positive integer and is typically less than 130. Quantity validity is reflected by the proportion of data values whose quantity format is valid. The score $S_6$ of the data product on quantity validity is calculated as follows:
$S_6 = \frac{\text{total number of values with a correct quantity format}}{\text{total number of data record values}}$.
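The age example can be turned into a small Python sketch as follows. The validity rule mirrors the example in the text (a positive integer below 130); the function names and sample values are hypothetical.

```python
def valid_age(value):
    """Rule from the example in the text: a positive integer below 130."""
    return isinstance(value, int) and not isinstance(value, bool) and 0 < value < 130


def quantity_validity(values, is_valid):
    """S6 = values satisfying the precision and range rule / all data record values."""
    if not values:
        return 1.0  # assumption: no values means nothing is out of range
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical age column: only 25 and 42 satisfy the rule.
ages = [25, 0, 131, 42, -3, 17.5]
s6 = quantity_validity(ages, valid_age)
```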
7) Metadata integrity
Metadata integrity is reflected by the proportion of complete fields in the metadata, that is, the proportion of non-empty metadata fields among the total number of fields. The score $S_7$ of the data product on metadata integrity is calculated as follows:
$S_7 = \frac{\text{total number of non-empty metadata fields}}{\text{total number of canonical metadata fields}}$.
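A minimal Python sketch of this ratio is shown below. The list of canonical fields, the treatment of `None`, empty strings and empty containers as empty, and the function name are assumptions made for illustration.

```python
def metadata_integrity(metadata, canonical_fields):
    """S7 = non-empty metadata fields / canonical metadata fields."""
    if not canonical_fields:
        raise ValueError("at least one canonical metadata field is required")
    empty_values = (None, "", [], {})
    non_empty = sum(1 for f in canonical_fields if metadata.get(f) not in empty_values)
    return non_empty / len(canonical_fields)


# Hypothetical metadata specification and record.
canonical_fields = ["title", "description", "source", "update_time"]
metadata = {"title": "Weather records", "description": "", "source": "sensor", "update_time": "2019-12"}
s7 = metadata_integrity(metadata, canonical_fields)
```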
8) Seller credit scoring
The seller credit score is reflected by the social influence of the data seller and the historical data the data seller has left in the data transaction system. The data transaction system automatically evaluates the seller's credit grade according to the historical behavior of the data seller and gives a data seller credit score $S_8$ ($0 \le S_8 \le 1$). The credit rating of the data seller may be evaluated, for example, based on the data seller's historical behavior, contract performance capability, etc. in the data transaction system.
9) Buyer goodness
Buyer goodness, i.e. the favorable-rating rate given by buyers, is reflected by the proportion of favorably rated orders given by data buyers to the data seller among the total number of completed data product transaction orders. The score S9 of the data product on buyer goodness is calculated as follows:
S9 = total number of favorably rated orders from data buyers ÷ total number of completed orders of the data seller.
In other embodiments of the invention, an average star rating score may instead be calculated from the scores corresponding to the star ratings that data buyers give to the data seller.
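A minimal Python sketch of both variants follows. The mapping of a star rating to a score (stars divided by the maximum number of stars) is an assumption, since the text does not specify how stars correspond to scores; the function names are also hypothetical.

```python
from typing import Sequence


def buyer_goodness(favorable_orders: int, completed_orders: int) -> float:
    """S9 = favorably rated orders / completed orders."""
    if completed_orders <= 0:
        return 0.0  # assumption: no completed orders yields a score of 0
    if not 0 <= favorable_orders <= completed_orders:
        raise ValueError("favorable_orders must lie between 0 and completed_orders")
    return favorable_orders / completed_orders


def average_star_score(stars: Sequence[int], max_stars: int = 5) -> float:
    """Alternative embodiment: mean of star ratings scaled to [0, 1] (assumed mapping)."""
    if not stars:
        return 0.0
    return sum(s / max_stars for s in stars) / len(stars)


s9 = buyer_goodness(45, 50)
star_score = average_star_score([5, 4, 3])
```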
10) Buyer scoring
The buyer score is embodied by the scores that data buyers give to the data product after the data product transaction is completed. The transaction system automatically collects these scores and calculates the comprehensive score S10 of the data product on the buyer score (0 ≤ S10 ≤ 1).
In conclusion, the scores of the data products on the objective indexes are determined by the attribute characteristics of the data, and are not influenced by other human factors, so that the objectivity and the effectiveness of the quality evaluation result of the data products are ensured.
In step S130, after the scores of the data product on the individual objective indexes are obtained, they are combined with the weights of the objective indexes to calculate the score S of the data product quality on the objective indexes. The larger S is, the better the data product performs on the objective indexes and the higher its data quality level.
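The combination of scores and weights in step S130 can be sketched as a weighted sum. The text does not spell out the aggregation formula here, so the weighted sum, the index names, and the weight values below are assumptions for illustration only; in the method, the weights come from the analytic hierarchy process.

```python
from typing import Mapping


def objective_overall_score(scores: Mapping[str, float],
                            weights: Mapping[str, float]) -> float:
    """Assumed aggregation: S = sum of (index weight x index score)."""
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight for indexes: {sorted(missing)}")
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score of {name} must lie in [0, 1]")
    return sum(weights[name] * score for name, score in scores.items())


# Hypothetical weights and scores for three of the objective indexes.
weights = {"format_validity": 0.05, "quantity_validity": 0.05, "metadata_integrity": 0.10}
scores = {"format_validity": 0.9, "quantity_validity": 0.8, "metadata_integrity": 1.0}
S = objective_overall_score(scores, weights)
```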
The subjective index quantization method in step S140 is described below.
Subjective indexes are indexes that are difficult to directly quantify by an automated means, and mainly include: grammar accuracy, semantic accuracy, description completeness, fact completeness, referential completeness, breadth uniqueness, metadata accuracy, and product sales.
In the invention, as shown in fig. 3, experts give a language evaluation of the performance of the data product on each subjective index, and fuzzy mathematical theory is then used to convert the verbal evaluations into trapezoidal fuzzy numbers so that the subjective index evaluation can be quantified. The specific implementation steps of the subjective index quantification method are described below.
(I) Expert language evaluation
Invite m experts with authority in data quality evaluation to give a language evaluation of the performance of the data product on each subjective index, obtaining the expert language evaluation information Lij (i.e. the language evaluation of the i-th expert on the j-th subjective quality index of the data product, where 1 ≤ i ≤ m, 1 ≤ j ≤ 8, and i and j are integers), and determine the weight Pi of each expert. In the expert language evaluation process, different experts are allowed to use different language evaluation scales, which reduces the deviation in the data product quality evaluation result caused by inconsistent evaluation standards among experts. Here, the language scale refers to the standard scale a person uses to evaluate something in language, and the language granularity refers to how finely a set of language phrases is divided. For example, {very poor, poor, average, good, very good} can be used as a language scale for evaluating something; it is a language phrase set with a granularity of 5.
(II) converting language evaluation information into trapezoidal fuzzy number by using fuzzy theory
The trapezoidal fuzzy number is an important concept in fuzzy mathematics and an effective tool for expressing real-world uncertainty in mathematical language. In fuzzy mathematics, given a fuzzy set on a universe of discourse U, each x ∈ U is associated with a unique value u(x) ∈ [0, 1] that represents the degree of membership of x in the fuzzy set; u(x) is called the membership function of the fuzzy set.
If A is a trapezoidal fuzzy number, written A = (a, b, c, d) with a ≤ b ≤ c ≤ d, its membership function is:
u(x) = (x - a) / (b - a), x ∈ [a, b)
u(x) = 1, x ∈ [b, c]
u(x) = (d - x) / (d - c), x ∈ (c, d]
u(x) = 0, otherwise.
The graph of the membership function of the trapezoidal fuzzy number is shown in fig. 4.
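The membership function of a trapezoidal fuzzy number can be sketched in Python as follows. The rising and falling segments use the standard linear definition, (x - a)/(b - a) and (d - x)/(d - c), which is assumed here to match the formula images of the original; the function name is hypothetical.

```python
def trapezoid_membership(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership degree u(x) of x in the trapezoidal fuzzy number (a, b, c, d)."""
    if not a <= b <= c <= d:
        raise ValueError("parameters must satisfy a <= b <= c <= d")
    if b <= x <= c:
        return 1.0                   # plateau
    if a <= x < b:
        return (x - a) / (b - a)     # rising edge (only reachable when a < b)
    if c < x <= d:
        return (d - x) / (d - c)     # falling edge (only reachable when c < d)
    return 0.0                       # outside the support


rising = trapezoid_membership(1.5, 1, 2, 3, 4)
plateau = trapezoid_membership(2.5, 1, 2, 3, 4)
falling = trapezoid_membership(3.5, 1, 2, 3, 4)
outside = trapezoid_membership(5.0, 1, 2, 3, 4)
```

Checking the plateau first means degenerate shapes such as a = b (a vertical left edge) never reach a division by zero.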
The specific conversion formula for converting the language evaluation information into the trapezoidal fuzzy number is designed as follows:
Figure BDA0002338063750000113
where τ is a natural number greater than 0, 2τ + 1 is the granularity of the language phrase set, k is an integer in [0, 2τ + 1], and Ak denotes the k-th element of the language phrase set.
Using the conversion relation between language evaluations and trapezoidal fuzzy numbers, the evaluation information Aij(aij, bij, cij, dij) of each expert on the data product quality is obtained, where Aij(aij, bij, cij, dij) is the trapezoidal fuzzy number corresponding to the language evaluation information of the i-th expert on the j-th subjective index, 1 ≤ i ≤ m, 1 ≤ j ≤ 8, and i and j are integers.
(III) the comprehensive scores of all subjective indexes of the data product are expressed by trapezoidal fuzzy numbers
By combining the evaluation information Aij given by each expert on the data product quality with the weight Pi of each expert, the comprehensive score Aj of the data product on the j-th subjective index can be determined. The specific calculation method is:
Figure BDA0002338063750000121
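The expert-weighted aggregation of step (III) can be sketched as follows. Because the formula image is not reproduced in this text, the sketch assumes a component-wise weighted average of the trapezoidal fuzzy numbers Aij with the expert weights Pi normalized to sum to 1; the original formula may differ, and all names and values are hypothetical.

```python
from typing import Sequence, Tuple

Trapezoid = Tuple[float, float, float, float]


def aggregate_experts(evaluations: Sequence[Trapezoid],
                      expert_weights: Sequence[float]) -> Trapezoid:
    """Assumed form of Aj: component-wise weighted average of Aij over experts i."""
    if not evaluations or len(evaluations) != len(expert_weights):
        raise ValueError("need one weight per expert evaluation")
    total = sum(expert_weights)
    if total <= 0:
        raise ValueError("expert weights must sum to a positive value")
    normalized = [p / total for p in expert_weights]
    a, b, c, d = (
        sum(p * e[k] for p, e in zip(normalized, evaluations)) for k in range(4)
    )
    return (a, b, c, d)


# Two hypothetical experts with equal weight evaluating one subjective index.
A_j = aggregate_experts([(0.5, 0.6, 0.7, 0.8), (0.7, 0.8, 0.9, 1.0)], [0.5, 0.5])
```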
(IV) the comprehensive score of the data product on the whole subjective index is expressed by a trapezoidal fuzzy number
From Aj and the weight Ij of each subjective index, the comprehensive score A of the data product on the subjective indexes as a whole, expressed as a trapezoidal fuzzy number, can be calculated. The specific calculation formula is:
Figure BDA0002338063750000122
(V) Further, the data product quality closeness can be calculated using the ideal point method to obtain the data product quality evaluation comprehensive score
The ideal point method, i.e., the TOPSIS method, is a commonly used calculation method in multi-objective decision analysis. It calculates the distance between an evaluated object and the optimal point and the distance between the evaluated object and the worst point to obtain the closeness coefficient of the evaluated object, and determines the optimal scheme in the decision problem by ranking the closeness coefficients. An evaluated object is selected as the optimal scheme only when it is closest to the optimal point and farthest from the worst point.
In the present invention, the worst point of data quality is d0(0, 0, 0, 0) and the optimal point is d1(1, 1, 1, 1). The distances between A(a, b, c, d) and the optimal point and the worst point are calculated to obtain the data product quality closeness coefficient, which is combined with the total weight of the subjective indexes to obtain the final score C of the subjective index evaluation of the data product quality.
The specific calculation process is as follows:
Firstly, calculate the distances D1 and D0 between A(a, b, c, d) and the optimal point and the worst point, respectively:
Figure BDA0002338063750000123
Figure BDA0002338063750000124
Secondly, calculate the data product quality closeness coefficient c:
Figure BDA0002338063750000125
The larger the closeness coefficient c, the farther A(a, b, c, d) is from the worst point and the closer it is to the optimal point, and the higher the quality level of the data product in terms of the subjective indexes.
Thirdly, calculate the comprehensive score C of the data product quality on the subjective indexes:
C = c × It
where It is the total weight of the subjective indexes among all indexes.
After the overall objective index score S and the overall subjective index score C are calculated, the data product quality evaluation comprehensive score Q can be obtained in step S160: Q = S + C.
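The ideal point calculation and the final score can be sketched in Python as follows. The distance and closeness formulas appear only as images in the original, so this sketch assumes the Euclidean distance and the usual TOPSIS closeness coefficient c = D0 / (D0 + D1); a constant scaling of both distances would not change c. The values of A, It and S are hypothetical.

```python
import math
from typing import Tuple

Trapezoid = Tuple[float, float, float, float]

WORST_POINT: Trapezoid = (0.0, 0.0, 0.0, 0.0)  # d0 in the text
BEST_POINT: Trapezoid = (1.0, 1.0, 1.0, 1.0)   # d1 in the text


def closeness_coefficient(A: Trapezoid) -> float:
    """Assumed TOPSIS closeness: c = D0 / (D0 + D1) with Euclidean distances."""
    d1 = math.dist(A, BEST_POINT)   # distance to the optimal point
    d0 = math.dist(A, WORST_POINT)  # distance to the worst point
    return d0 / (d0 + d1)


def comprehensive_score(S: float, A: Trapezoid, I_t: float) -> float:
    """Q = S + C, where C = c x It."""
    C = closeness_coefficient(A) * I_t
    return S + C


# Hypothetical inputs: overall subjective fuzzy score A, subjective total
# weight It, and overall objective score S.
A = (0.5, 0.5, 0.5, 0.5)
c = closeness_coefficient(A)
Q = comprehensive_score(S=0.45, A=A, I_t=0.4)
```

Since d0 and d1 are distinct points, D0 + D1 is always positive, so the division is safe for any A.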
Based on the method, the quality level of the data product can be effectively and reasonably evaluated according to the characteristics of the data and the expert language evaluation, and the quality level of the data product can be visually displayed in a digital form.
The invention has the following advantages:
A complete and comprehensive data quality evaluation index system is established and, on that basis, a universally applicable data product quality evaluation index system is built by combining the productization characteristics of data that the trading system can extract automatically. The quality level of the data product is comprehensively measured from the aspects of data content, data product packaging, data product market circulation and the like, providing a reference basis for solving the problem of data product quality evaluation.
The invention provides a method for measuring, calculating and quantifying each secondary index in an index system aiming at a data product quality evaluation index system, can effectively and reasonably evaluate the data product quality level according to the characteristics of the data and expert language evaluation, and visually displays the data product quality level in a digital form.
The invention also provides a data product quality evaluation device, which comprises a processor and a memory, wherein the memory stores computer instructions, the processor is used for executing the computer instructions stored in the memory, and when the computer instructions are executed by the processor, the device realizes the steps of the method.
The present invention also relates to a storage medium, on which computer program code may be stored, which when executed may implement various embodiments of the method of the present invention, the storage medium may be a tangible storage medium, such as an optical disk, a U-disk, a floppy disk, a hard disk, etc.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A data product quality evaluation method is characterized by comprising the following steps:
determining the weight of each secondary index in a pre-established data product quality evaluation index architecture by using an analytic hierarchy process, and obtaining the total weight of objective indexes and the total weight of subjective indexes in the secondary indexes based on the weight of each secondary index, wherein the pre-established data product quality evaluation index architecture comprises a plurality of data evaluation dimensions, each data evaluation dimension comprises at least one primary index, each primary index comprises a plurality of secondary indexes, and the secondary indexes under each data evaluation dimension are objective indexes and subjective indexes;
automatically collecting quality evaluation characteristic parameters of each objective index in the data product, and calculating the evaluation score of each objective index of the data product based on the collected quality evaluation characteristic parameters of each objective index;
calculating the overall objective index score of the data product based on the weight and the score of each objective index;
receiving language evaluation information of each subjective index of the data product by a plurality of evaluators, converting the language evaluation information into a trapezoidal fuzzy number, and calculating the score of the data product on the current subjective index based on the weight of the evaluators;
calculating the overall subjective index score of the data product based on the weight and the score of each subjective index;
and obtaining a data product quality evaluation comprehensive score based on the objective index overall score and the subjective index overall score.
2. The method according to claim 1, wherein the step of calculating the overall score of the subjective index of the data product based on the weight and the score of each subjective index comprises:
based on the weight and the score of each subjective index, representing the subjective index comprehensive score of the data product by a trapezoidal fuzzy number;
determining the optimal point and the worst point of the quality of the data product, and calculating the distances between the subjective index comprehensive score represented by the trapezoidal fuzzy number and the optimal point and the worst point by using an ideal point method to obtain a data product quality closeness coefficient;
and obtaining the overall score of the subjective index of the data product based on the data product quality closeness coefficient and the total weight of the subjective index.
3. The method according to claim 1, wherein the step of calculating the overall score of the subjective index of the data product using the ideal point method comprises:
and calculating the overall initial subjective index score of the data product represented by the trapezoidal fuzzy number.
4. The method according to claim 1, wherein
the step of obtaining a data product quality evaluation comprehensive score based on the objective index overall score and the subjective index overall score comprises the following steps:
and taking the sum of the overall score of the objective index and the overall score of the subjective index as the comprehensive score of the quality evaluation of the data product.
5. The method of claim 1, wherein the plurality of data evaluation dimensions comprise: a data content rating dimension, a product packaging rating dimension, and a market circulation rating dimension.
6. The method of claim 5, wherein:
the data content evaluation dimension comprises part or all of the following primary indexes: accuracy index, integrity index, timeliness index, uniqueness index and effectiveness index;
the product package evaluation dimension comprises the following primary indexes: metadata normalization indexes;
the market circulation evaluation dimension comprises the following first-level indexes: a service level indicator and/or a market feedback indicator.
7. The method of claim 6, wherein:
the accuracy indexes comprise the following two-level indexes: a grammar accuracy index and/or a semantic accuracy index;
the integrity index comprises part or all of the following two-level indexes: a description integrity index, a factual integrity index, a column integrity index and a reference integrity index;
the timeliness indexes comprise the following two-level indexes: content timeliness indexes and/or acquisition timeliness indexes;
the uniqueness index comprises the following two-level indexes: a breadth uniqueness index and/or a depth uniqueness index;
the effectiveness index comprises the following two-level indexes: a format validity index and/or a quantity validity index;
the metadata normalization indexes comprise the following two-level indexes: a metadata accuracy index and/or a metadata integrity index;
the service level indicators include the following two-level indicators: the credit scoring index of the seller and/or the evaluation degree of the buyer to the seller;
the market feedback indexes comprise the following two-level indexes: a product sales index and/or a buyer score index for the product;
the column integrity index, the content timeliness index, the collection timeliness index, the depth uniqueness index, the format validity index, the quantity validity index, the metadata integrity index, the seller credit rating index, the buyer-to-seller goodness index and the buyer-to-product rating index are objective indexes;
the grammar accuracy index, the semantic accuracy index, the description integrity index, the factual integrity index, the reference integrity index, the breadth uniqueness index, the metadata accuracy index and the product sales index are subjective indexes.
8. The method according to claim 7, wherein the step of calculating the evaluation score of each objective index of the data product based on the collected quality evaluation feature parameters of each objective index comprises some or all of the following steps:
calculating the evaluation score of the column integrity index using the following formula: S1 = 1 - Σ[Wi × (number of missing values in the i-th column ÷ total number of values in the i-th column)], wherein S1 represents the evaluation score of the column integrity index, and Wi represents the weight of the i-th column attribute among all column attributes of the data set;
calculating the evaluation score of the content timeliness index using the following formula: S2 = e^(-Δt) = e^(t2 - t1), wherein S2 represents the evaluation score of the content timeliness index, and t1 and t2 represent the data release time and the latest data content coverage time, respectively;
calculating the evaluation score of the acquisition timeliness index using the following formula: S3 = e^(-f), wherein S3 represents the evaluation score of the acquisition timeliness index, and f represents the data acquisition frequency;
calculating the evaluation score of the depth uniqueness index using the following formula: S4 = 1 - number of duplicate data rows ÷ total number of data rows, wherein S4 represents the evaluation score of the depth uniqueness index;
calculating the evaluation score of the format validity index using the following formula: S5 = total number of correctly formatted values ÷ total number of data record values, wherein S5 represents the evaluation score of the format validity index;
calculating the evaluation score of the quantity validity index using the following formula: S6 = total number of values with a correct quantity format ÷ total number of data record values, wherein S6 represents the evaluation score of the quantity validity index;
calculating the evaluation score of the metadata integrity index using the following formula: S7 = total number of non-empty metadata fields ÷ total number of canonical metadata fields, wherein S7 represents the evaluation score of the metadata integrity index;
calculating the evaluation score of the seller credit rating index based on the seller credit rating;
calculating the evaluation score of the buyer goodness index using the following formula: S9 = total number of favorably rated orders from data buyers ÷ total number of completed orders of the data seller, wherein S9 represents the evaluation score of the buyer goodness index;
and calculating the evaluation score of the buyer scoring index by collecting the scores of the data buyer to the data product after the data product transaction is completed.
9. An apparatus for quality evaluation of a data product, the apparatus comprising a processor and a memory, wherein the memory stores computer instructions, the processor is configured to execute the computer instructions stored in the memory, and the apparatus performs the steps of the method of any one of claims 1 to 8 when the computer instructions are executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN201911364505.5A 2019-12-26 2019-12-26 Data product quality evaluation method and device Pending CN113052411A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911364505.5A CN113052411A (en) 2019-12-26 2019-12-26 Data product quality evaluation method and device


Publications (1)

Publication Number Publication Date
CN113052411A true CN113052411A (en) 2021-06-29

Family

ID=76505325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911364505.5A Pending CN113052411A (en) 2019-12-26 2019-12-26 Data product quality evaluation method and device

Country Status (1)

Country Link
CN (1) CN113052411A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination