CN115392811A - Method, device and equipment for evaluating quality of main data and storage medium - Google Patents


Info

Publication number
CN115392811A
Authority
CN
China
Prior art keywords
evaluation result
physical table
data
fields
evaluated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211343105.8A
Other languages
Chinese (zh)
Inventor
隋少春
邱权
张历记
王尚超
范东皖
雷霭荻
赵炜煜
罗佳丽
谭丽娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aircraft Industrial Group Co Ltd
Original Assignee
Chengdu Aircraft Industrial Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aircraft Industrial Group Co Ltd
Priority to CN202211343105.8A
Publication of CN115392811A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for evaluating the quality of main data, which solve the technical problem that the existing method for evaluating the quality of the main data is not flexible enough. The method comprises the following steps: dividing the main data to be evaluated based on a preset dividing method to obtain a plurality of objects to be evaluated; based on the fields in the first physical table, performing accuracy evaluation on the object to be evaluated to obtain an accuracy evaluation result; performing consistency evaluation on the object to be evaluated based on the data items in the second physical table and the first physical table to obtain a consistency evaluation result; obtaining a quality evaluation result of an object to be evaluated based on the number of fields in a first physical table and the number of corresponding rules of the fields in the first physical table in a data standard; and obtaining a final quality evaluation result of the object to be evaluated based on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result. The invention realizes the accurate evaluation of the main data and improves the efficiency and the objectivity of the evaluation of the main data.

Description

Method, device and equipment for evaluating quality of main data and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a method, an apparatus, a device, and a storage medium for evaluating quality of main data.
Background
Master data is the basic information of an organization that supports cross-department business collaboration and reflects the state attributes of core business entities. Compared with transaction data, master data has more stable attributes, a higher accuracy requirement, and a unique identification. By definition, master data is core business entity data that is highly shared within the enterprise, so its quality requirements are higher than those of business data. To monitor and evaluate data quality in real time, a data quality evaluation model is usually defined for evaluating and monitoring master data quality. In the prior art, master data quality is evaluated on five indexes (integrity, accuracy, validity, timeliness and consistency); such an evaluation method cannot be adjusted and updated according to the data services and is not accurate enough.
Disclosure of Invention
The application mainly aims to provide a method, a device, equipment and a storage medium for evaluating the quality of main data, and aims to solve the technical problem that the existing method for evaluating the quality of main data is not accurate enough.
In order to solve the technical problem, the application provides: a main data quality evaluation method comprises the following steps:
dividing main data to be evaluated based on a preset dividing method to obtain a plurality of objects to be evaluated; the object to be evaluated is associated with a first physical table and a plurality of second physical tables, the first physical table and the second physical table both comprise a plurality of fields, and the fields comprise a plurality of data items;
based on the fields in the first physical table, carrying out accuracy evaluation on the object to be evaluated to obtain an accuracy evaluation result;
performing consistency evaluation on the object to be evaluated based on the data items in the second physical table and the first physical table to obtain a consistency evaluation result;
obtaining a quality evaluation result of the object to be evaluated based on the number of the fields in the first physical table and the corresponding rule number of the fields in the first physical table in the data standard;
and obtaining a final quality evaluation result of the object to be evaluated based on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result.
As some optional embodiments of the present application, the performing accuracy evaluation on the object to be evaluated based on the field in the first physical table to obtain an accuracy evaluation result includes:
acquiring the number of rows and the number of fields of the first physical table and the number of corresponding rules of each field in a data standard;
acquiring an accuracy score coefficient of each field based on the number of rules of each field;
calculating an accuracy score for each of the fields based on the number of rules that each of the fields satisfies in the data standard, the corresponding accuracy score coefficient, and the number of rules;
and obtaining the accuracy evaluation result according to the accuracy score and the line number of each field.
As some optional embodiments of the present application, the obtaining the accuracy evaluation result according to the accuracy score and the number of rows of each of the fields includes:
obtaining the accuracy evaluation result according to the following formula:
C = p/(m×q)
wherein C is the accuracy evaluation result, p is the sum of the accuracy scores of all the fields, m is the number of rows, and q is the number of the fields.
As some optional embodiments of the present application, the first physical table is associated with a first MD5 tag set and the second physical table is associated with a second MD5 tag set; the performing consistency evaluation on the object to be evaluated based on the data items in the second physical table and the first physical table to obtain a consistency evaluation result comprises:
acquiring the first MD5 tag set and a plurality of second MD5 tag sets;
and obtaining the consistency evaluation result based on the first MD5 tag set and each second MD5 tag set.
As some optional embodiments of the present application, the first MD5 tag set includes MD5 tags corresponding to data items in the first physical table, and the second MD5 tag set includes MD5 tags corresponding to data items in the second physical table; the obtaining the consistency evaluation result based on the first MD5 tag set and each second MD5 tag set includes:
comparing each second MD5 tag set with the first MD5 tag set to obtain a consistency score of each second physical table;
and obtaining the consistency evaluation result based on the consistency score of each second physical table.
As some optional embodiments of the present application, the obtaining, based on the number of fields in the first physical table and the number of corresponding rules of the fields in the first physical table in the data standard, a quality evaluation result of the object to be evaluated includes:
acquiring the total number of fields in each first physical table and the corresponding rule number of each field in a data standard based on each first physical table;
calculating the average rule number of each object to be evaluated based on the total number of fields in each first physical table and the corresponding rule number of each field in the data standard;
sorting the objects to be evaluated based on the average rule number to obtain a sorting result;
and scoring each object to be evaluated based on the sorting result and the number of the objects to be evaluated to obtain a quality evaluation result.
As some optional embodiments of the present application, the obtaining a final quality evaluation result of each object to be evaluated based on each accuracy evaluation result, the consistency evaluation result, and the quality evaluation result includes:
and according to a preset weight, performing weighted operation on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result to obtain the final quality evaluation result.
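The weighted operation can be illustrated with a minimal Python sketch. The function name `final_quality_score` and the default weights are hypothetical; the source only states that preset weights are applied to the three results, and the requirement that the weights sum to 1 is an assumption made here.

```python
def final_quality_score(accuracy, consistency, quality, weights=(0.4, 0.4, 0.2)):
    """Combine the three sub-results with preset weights.

    The default weights are illustrative assumptions, not values from the source.
    """
    w_acc, w_con, w_qual = weights
    # Assumption: the preset weights are normalised so that they sum to 1.
    if abs(w_acc + w_con + w_qual - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return w_acc * accuracy + w_con * consistency + w_qual * quality


# Example: accuracy 0.9, consistency 0.8, quality 0.5 gives roughly 0.78.
print(round(final_quality_score(0.9, 0.8, 0.5), 4))
```

Keeping the weights as a parameter mirrors the text: changing the relative importance of one sub-result does not require touching how the sub-results themselves are computed.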
In order to solve the technical problem, the application further provides: a master data quality evaluation apparatus, characterized in that the apparatus comprises:
the dividing module is used for dividing the main data to be evaluated based on a preset dividing method to obtain a plurality of objects to be evaluated; the object to be evaluated is associated with a first physical table and a plurality of second physical tables, the first physical table and the second physical table both comprise a plurality of fields, and the fields comprise a plurality of data items;
the first evaluation module is used for evaluating the accuracy of the object to be evaluated based on the field in the first physical table to obtain an accuracy evaluation result;
the second evaluation module is used for evaluating the consistency of the object to be evaluated based on the data items in the second physical table and the first physical table to obtain a consistency evaluation result;
the third evaluation module is used for obtaining a quality evaluation result of the object to be evaluated based on the number of the fields in the first physical table and the corresponding rule number of the fields in the first physical table in the data standard;
and the final evaluation module is used for obtaining a final quality evaluation result of the object to be evaluated based on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result.
In order to solve the technical problem, the application further provides: an electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory which, when executed by the processor, implement the method as described above.
In order to solve the technical problem, the application further provides: a storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as described above.
In summary, the invention has the following beneficial effects:
The application provides a master data quality evaluation method. First, the master data to be evaluated is divided based on a preset division method to obtain a plurality of objects to be evaluated, which makes it convenient to count the rows and fields of each data object and the number of applicable rules in the data standard. Next, an accuracy evaluation result is obtained based on the fields in the first physical table associated with the object to be evaluated; because the evaluation works on each field of the data table, the data rules corresponding to each field in the data standard can be looked up through the field, so the master data can be evaluated accurately and objectively, and individual fields can be evaluated according to service requirements. Then, consistency evaluation is performed based on the data items of the plurality of second physical tables and of the first physical table to obtain a consistency evaluation result; because master data is highly shared, it may be referenced by a plurality of target ends, and comparing the data items of the second physical tables at the target ends that reference the master data with those of the first physical table at the source end allows the accuracy of the master data to be evaluated comprehensively. Finally, a quality evaluation result is obtained based on the number of fields in the data table corresponding to the master data to be evaluated and the number of data rules corresponding to each field; master data is constructed on a data standard system in which each field has one or more data constraint rules, and the number of these rules keeps changing as the master data is applied and the business develops.
Drawings
Fig. 1 is a schematic flow chart of a master data quality evaluation method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart illustrating accuracy evaluation of an object to be evaluated according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a master data quality evaluation apparatus according to an embodiment of the present application.
Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
In the prior art, to monitor and evaluate data quality in real time, a data quality evaluation model is usually defined for evaluating and monitoring master data quality. Data quality is generally evaluated on five indexes (integrity, accuracy, validity, timeliness and consistency): a data processing script is written according to the definition of each index, data is collected, the score of each index is calculated, and the results are finally summarized into a quality evaluation. This traditional method of monitoring data quality indexes involves many dimensions, has vague index calculation rules and a complex evaluation model, changes to a local algorithm strongly affect the whole, and it cannot be adjusted and updated dynamically in real time according to data service requirements. Moreover, the quality requirements differ from one piece of data to another, and so do the quality rules: some data has many rules and some few, some rules constrain a value range, and some rules are related to each other.
Based on this, as shown in fig. 1, an embodiment of the present application proposes a method for evaluating quality of master data, including the following steps:
S1, dividing main data to be evaluated based on a preset dividing method to obtain a plurality of objects to be evaluated; the object to be evaluated is associated with a first physical table and a plurality of second physical tables, the first physical table and the second physical table both comprise a plurality of fields, and the fields comprise a plurality of data items;
First, the master data to be evaluated is divided based on a preset dividing method to obtain a plurality of objects to be evaluated. In one embodiment, the preset dividing method divides the master data according to its data owner: master data having the same data owner is grouped into one object to be evaluated. The data owner is the owner of a database or table, who is responsible for managing the database table and controlling other users' use of it. Each object to be evaluated is associated with a first physical table and a plurality of second physical tables; both kinds of table comprise a plurality of fields, and each field comprises a plurality of data items. A physical table is a table in a specific data source: the first physical table is the physical table in which the source end stores the object to be evaluated, and a second physical table is a physical table in which a target end that references the object to be evaluated stores it. Each row of a physical table is called a "record" and contains all the information in that row, for example all the information about one person in an address book database. A record has no dedicated name in the database, so the row number is commonly used as the record number. A field is a unit smaller than a record; a set of fields forms a record, and each field describes one characteristic of the record, namely a data item;
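The division by data owner can be sketched as a simple grouping step. This is a minimal Python illustration under stated assumptions: the table descriptors, the keys `name` and `owner`, and the sample owners are all hypothetical and do not come from the source.

```python
from collections import defaultdict


def divide_by_owner(tables):
    """Group table descriptors into objects to be evaluated, one per data owner.

    `tables` is a list of dicts with the hypothetical keys "name" and "owner".
    """
    objects = defaultdict(list)
    for table in tables:
        objects[table["owner"]].append(table["name"])
    return dict(objects)


# Hypothetical master data tables and their owners.
tables = [
    {"name": "supplier", "owner": "procurement"},
    {"name": "material", "owner": "engineering"},
    {"name": "supplier_contact", "owner": "procurement"},
]
print(divide_by_owner(tables))
```

Each key of the returned dictionary then stands for one object to be evaluated, whose source-end and target-end physical tables are looked up in the later steps.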
S2, evaluating the accuracy of the object to be evaluated based on the fields in the first physical table to obtain an accuracy evaluation result;
Specifically, the accuracy evaluation result is obtained based on the fields in the first physical table, which is the physical table in which the source end stores the object to be evaluated; accuracy is therefore evaluated at the finest granularity, and the evaluation result is more accurate;
In an embodiment, referring to fig. 2, the step of performing accuracy evaluation on the object to be evaluated based on the field in the first physical table to obtain an accuracy evaluation result specifically includes:
S21, acquiring the number of rows and the number of fields of the first physical table and the number of corresponding rules of each field in a data standard;
The first physical table is the physical table in which the source end stores the object to be evaluated. This step evaluates the accuracy of the object to be evaluated, which is independent of the target ends that reference it; in this embodiment the physical table at the source end is therefore obtained and evaluated directly, which avoids the unnecessary system overhead of obtaining the physical tables at the target ends. Once the first physical table is obtained, its number of rows and number of fields can be obtained. Master data is constructed on a data standard system; a data standard is a consistent agreement on the expression, format and definition of data, and includes unified definitions of the business attributes, technical attributes and management attributes of the data. Each field in the first physical table therefore corresponds to one or more data rules in the data standard. Because the data rules corresponding to each field keep changing as the master data is applied and the business develops, the number of data rules corresponding to each field is obtained anew each time accuracy is evaluated, so that the accuracy evaluation supports dynamic changes to the data rules.
S22, acquiring an accuracy score coefficient of each field based on the rule number of each field;
In this embodiment, the number of rules of each field is denoted n, and the accuracy score coefficient is 1/n. In the subsequent accuracy scoring, a field that satisfies all of its corresponding data rules in the data standard scores 1, so setting an accuracy score coefficient for each field allows the accuracy score of each field to be calculated quickly.
S23, calculating an accuracy score of each field based on the number of rules met by each field in the data standard, the corresponding accuracy score coefficient and the number of rules;
Denote the number of rules that a field satisfies in the data standard as t and the accuracy score coefficient as 1/n; the accuracy score of the field is then calculated by the following formula:
S=t/n
The formula divides the accuracy score into n equal parts according to the number of rules of each field in the data standard, and each rule that the field satisfies earns one part. If the field satisfies all of its corresponding rules in the data standard, its accuracy score is 1; if the field has no corresponding rule in the data standard, its accuracy score is 0.
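The score S = t/n can be sketched in a few lines of Python. This is a minimal illustration, assuming that each data rule is modelled as a predicate applied to one data item; the three sample rules (non-empty, length limit, code pattern) are hypothetical and are not taken from any data standard in the source.

```python
import re


def field_accuracy_score(value, rules):
    """Return S = t / n for one data item, where n is the number of rules
    and t is the number of rules the value satisfies."""
    n = len(rules)
    if n == 0:
        # The source assigns a score of 0 when a field has no corresponding rule.
        return 0.0
    t = sum(1 for rule in rules if rule(value))
    return t / n


# Hypothetical rules for a code field.
rules = [
    lambda v: v is not None and v != "",
    lambda v: isinstance(v, str) and len(v) <= 10,
    lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z]{2}\d+", v) is not None,
]
print(field_accuracy_score("AB123", rules))  # satisfies all three rules
print(field_accuracy_score("ab123", rules))  # satisfies two of the three rules
```

Because n is read from the rule list at call time, adding or removing a rule changes the score coefficient 1/n automatically, which matches the text's point that the rule count is re-read on every evaluation.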
S24, obtaining the accuracy evaluation result based on the accuracy score and the number of rows of each field.
As an optional embodiment of the present application, the step of obtaining the accuracy evaluation result according to the accuracy score and the number of rows of each of the fields includes:
the accuracy evaluation result is calculated by the following formula:
C = p/(m×q)
wherein C is the accuracy evaluation result, p is the sum of the accuracy scores of all the fields, m is the number of rows, and q is the number of the fields;
Because the number of physical fields and the number of rows differ between objects to be evaluated, the average accuracy score of the fields is used as the data accuracy evaluation: the accuracy scores of all the fields are added and then averaged over the rows and fields, which gives the true data accuracy score of the object to be evaluated.
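A minimal Python sketch of this aggregation follows. It assumes that the accuracy evaluation result is the sum of per-item accuracy scores averaged over both rows and fields, C = p/(m×q); this reading of the formula, the row representation as dictionaries, and the sample rules are all assumptions made for illustration.

```python
def accuracy_result(rows, rules_by_field):
    """Compute C = p / (m * q) under the assumption stated above.

    rows: list of dicts, one per record of the first physical table.
    rules_by_field: field name -> list of predicates (hypothetical rule model).
    """
    m = len(rows)            # number of rows
    q = len(rules_by_field)  # number of fields
    if m == 0 or q == 0:
        return 0.0
    p = 0.0                  # sum of accuracy scores of all data items
    for row in rows:
        for field, rules in rules_by_field.items():
            if rules:  # a field without rules contributes 0
                satisfied = sum(1 for rule in rules if rule(row.get(field)))
                p += satisfied / len(rules)
    return p / (m * q)


rules_by_field = {
    "code": [lambda v: bool(v), lambda v: v is not None and len(v) == 3],
    "name": [lambda v: bool(v)],
}
rows = [
    {"code": "A01", "name": "bolt"},  # code scores 1.0, name scores 1.0
    {"code": "A1", "name": ""},       # code scores 0.5, name scores 0.0
]
print(accuracy_result(rows, rules_by_field))  # 2.5 / (2 * 2) = 0.625
```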
S3, performing consistency evaluation on the object to be evaluated based on the data items in the second physical table and the first physical table to obtain a consistency evaluation result;
Master data is data shared between systems. To ensure that the data at the source end is the same as the data at the target ends, the consistency of the master data needs to be evaluated, mainly by comparing the data in the first physical table at the source end (namely, the master data repository) with the data at the target ends that reference the master data. Because an object to be evaluated may be referenced by one or more target ends, this step performs consistency evaluation based on the first physical table and the plurality of second physical tables corresponding to the object to be evaluated, so as to obtain the consistency evaluation result.
However, because master data is used very widely, comparing the full data in a centralized manner would consume a great deal of service system resources. As some optional embodiments of the present application, the first physical table is associated with a first MD5 tag set and the second physical table is associated with a second MD5 tag set, and the step of performing consistency evaluation on the object to be evaluated based on the data items of the second physical table and the first physical table to obtain a consistency evaluation result includes:
S31, acquiring the first MD5 tag set and a plurality of second MD5 tag sets;
MD5 (Message-Digest Algorithm 5) is a widely used cryptographic hash function that generates a 128-bit (16-byte) hash value and is used to ensure the integrity and consistency of transmitted information. To avoid consuming a large amount of service system resources during consistency comparison, in this embodiment an MD5 tag set is added to the interface log of the database at the source end (namely, the master data repository) and to the database at each target end that references the object to be evaluated; these are denoted the first MD5 tag set and the second MD5 tag set respectively. The MD5 algorithm belongs to the prior art and is not described here again.
This step first obtains the first MD5 tag set and the second MD5 tag sets stored in advance in the interface logs at the source end and the target ends. The tag sets are generated automatically when a target end references the master data of the source end and can be obtained directly, which avoids extracting the master data of the source end and the target end for direct comparison and makes the consistency evaluation in the subsequent steps convenient.
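Generating an MD5 tag for a data item can be sketched with Python's standard `hashlib` module. The source does not specify how a data item is serialised before hashing, so the UTF-8 string encoding and the helper names `md5_tag` and `md5_tag_set` are assumptions.

```python
import hashlib


def md5_tag(data_item: str) -> str:
    """Return the MD5 digest of one data item as 32 hexadecimal characters
    (128 bits). UTF-8 serialisation is an assumption of this sketch."""
    return hashlib.md5(data_item.encode("utf-8")).hexdigest()


def md5_tag_set(data_items):
    """Build a tag set from an iterable of data items."""
    return {md5_tag(item) for item in data_items}


# Hypothetical data items transmitted through the interface.
first_tag_set = md5_tag_set(["A01|bolt|pcs", "A02|nut|pcs"])
print(len(first_tag_set))
print(md5_tag("A01|bolt|pcs"))
```

Comparing 32-character tags instead of full records is what lets the later steps avoid pulling the complete master data from every system.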
S32, obtaining the consistency evaluation result based on the first MD5 tag set and each second MD5 tag set;
After the first MD5 tag set and each second MD5 tag set are obtained, the consistency evaluation result of the object to be evaluated is obtained by comparing each second MD5 tag set with the first MD5 tag set. Comparing MD5 tags avoids extracting the master data of the source end and the target ends for direct comparison, reduces the heavy resource consumption of centralized calculation, and reduces the impact of data referencing on the target-end and source-end systems.
As an optional embodiment of the present application, the first physical table is associated with a first MD5 tag set, and the second physical table is associated with a second MD5 tag set;
as an optional embodiment of the present application, the step of obtaining the consistency evaluation result based on the first MD5 tag set and each second MD5 tag set includes:
S321, comparing each second MD5 tag set with the first MD5 tag set to obtain a consistency score of each second physical table;
The first MD5 tag set includes the MD5 tags corresponding to the data items in the first physical table, and each second MD5 tag set includes the MD5 tags corresponding to the data items in a second physical table. A second MD5 tag set is generated when the corresponding target end references the object to be evaluated. Specifically, when a target end references the object to be evaluated, it requests the object to be evaluated and its unique identification information from the source end through an interface. The source end records interface call log information, which includes the corresponding target end, the call time, the transmitted data content, the returned result and the like; the source end then generates an MD5 tag from the data content transmitted through the interface using the MD5 algorithm and stores the tag in the interface log. Next, according to the data range of the object to be evaluated, the interface call log records are analyzed to obtain the unique identification information and the MD5 tags they record. The target end that references the object to be evaluated obtains the referenced master data through the unique identification information and generates the second MD5 tag set. In the subsequent consistency evaluation, the first MD5 tag set and the second MD5 tag sets can therefore be obtained directly and compared to evaluate consistency.
Specifically, the number of MD5 tags in a second MD5 tag set is denoted m, and the number of MD5 tags in that second MD5 tag set that are identical to MD5 tags in the first MD5 tag set is denoted n, where m and n are both integers greater than or equal to 1. Because the MD5 tags in both sets are generated from the data items in the first physical table and the second physical table, identical MD5 tags indicate identical data items. The consistency score of each second physical table is therefore obtained by counting the identical MD5 tags, and it equals n/m.
S322, obtaining the consistency evaluation result based on the consistency score of each second physical table.
Since one object to be evaluated may correspond to one or more target systems, in this embodiment the consistency score of each second physical table is obtained, and the consistency evaluation result of the object to be evaluated is obtained by averaging these scores.
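Steps S321 and S322 can be sketched together in Python: a per-target score n/m followed by an average over all target ends. The function names and the sample tags are hypothetical, and returning 0 for an empty tag set is an assumption (the source requires m to be at least 1).

```python
def consistency_score(first_tags, second_tags):
    """Return n / m for one second physical table, where m is the number of
    tags in the second set and n the number also present in the first set."""
    first = set(first_tags)
    second = set(second_tags)
    m = len(second)
    if m == 0:
        return 0.0  # assumption: the source only considers m >= 1
    n = len(first & second)
    return n / m


def consistency_result(first_tags, second_tag_sets):
    """Average the consistency scores of all second physical tables."""
    scores = [consistency_score(first_tags, tags) for tags in second_tag_sets]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical tags: the first target matches fully, the second only by half.
source = {"t1", "t2", "t3", "t4"}
targets = [{"t1", "t2", "t3", "t4"}, {"t1", "t2", "x9", "x8"}]
print(consistency_result(source, targets))  # (1.0 + 0.5) / 2 = 0.75
```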
S4, obtaining a quality evaluation result of the object to be evaluated based on the number of the fields in the first physical table and the corresponding rule number of the fields in the first physical table in the data standard;
in this embodiment, considering that the main data is constructed based on a data standard system, that each field has one or more data constraint rules, and that the number of rules keeps changing as the main data is applied and the business develops, the object to be evaluated is evaluated based on the number of fields in the first physical table and the number of rules corresponding to each field in the data standard. As an optional embodiment of the present application, the step of obtaining a quality evaluation result of the object to be evaluated based on the number of the fields in the first physical table and the number of rules corresponding to each of the fields in the data standard includes:
S41, acquiring the total number of fields in each first physical table and the corresponding rule number of each field in a data standard based on each first physical table;
after the first physical table is obtained, the total number of its fields is counted, together with the number of rules that each field has in the data standard. Each field has one or more data constraint rules, and the number of rules keeps changing as the main data is applied and the business develops. Counting the fields and the rules anew each time the data quality of the object to be evaluated is evaluated therefore keeps the evaluation standard dynamically adjusted and up to date with the data service requirements.
S42, calculating the average rule number of each object to be evaluated based on the total number of fields in each first physical table and the corresponding rule number of each field in the data standard;
the average rule number is obtained by dividing the sum of the rule numbers of all fields in the data standard by the total number of fields of the object to be evaluated; it expresses the number of quality requirements per field. Different data have different quality requirements and therefore different quality rules: some fields have more rules and some fewer, some rules constrain a value range, and some rules are interrelated.
S43, sequencing each object to be evaluated based on the average rule number to obtain a sequencing result;
when main data is evaluated, usually more than one main data object is evaluated at a time. When several are evaluated simultaneously, they are sorted by the average rule number of each object to be evaluated to obtain a sorting result, from which the quality evaluation result of each object to be evaluated is subsequently obtained. Unlike the consistency evaluation and the accuracy evaluation, in which only the object itself needs to participate for the result to be objective, this quality evaluation requires a plurality of objects to participate to ensure the objectivity of the evaluation result.
S44, scoring each object to be evaluated based on the sequencing result and the number of the objects to be evaluated to obtain a quality evaluation result.
Specifically, in this embodiment, all the objects to be evaluated that participate in the evaluation are sorted by their average rule number per field from largest to smallest. With the number of objects to be evaluated denoted by n, the quality evaluation result of the object ranked first is n, that of the object ranked second is n-1, and so on; the quality evaluation result of the object ranked i-th is n+1-i, where n and i are integers and n ≥ i ≥ 1. When two objects have the same average rule number, the one with the greater total number of rules is ranked first.
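Steps S41 to S44 can be sketched as follows. The function name `rank_scores`, the object names, and the input layout (one list of per-field rule counts for each object to be evaluated) are assumptions made for illustration only.

```python
def rank_scores(rule_counts: dict) -> dict:
    """Map each object to its quality evaluation result n + 1 - i.

    rule_counts maps an object name to the list of rule counts of its
    fields (one entry per field). Objects are sorted by average rule
    number in descending order; on equal averages, the object with the
    greater total number of rules ranks first.
    """
    def sort_key(name):
        counts = rule_counts[name]
        total = sum(counts)
        return (total / len(counts), total)  # (average, total)

    ordered = sorted(rule_counts, key=sort_key, reverse=True)
    n = len(ordered)
    # The object ranked i-th (1-based) receives the score n + 1 - i.
    return {name: n + 1 - i for i, name in enumerate(ordered, start=1)}


# Usage with hypothetical objects: averages are 3.0, 2.0, and 2.0;
# "supplier" outranks "customer" because its total (8) exceeds 4.
print(rank_scores({
    "material": [3, 3],
    "supplier": [2, 2, 2, 2],
    "customer": [2, 2],
}))  # {'material': 3, 'supplier': 2, 'customer': 1}
```

The source does not say how to order objects whose average and total are both equal; in this sketch they keep their input order.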
S5, obtaining a final quality evaluation result of each object to be evaluated based on each accuracy evaluation result, the consistency evaluation result and the quality evaluation result.
The definition of main data determines its highly shared and highly stable character. Combined with the content of the data standard, data accuracy and data consistency are considered the most important quality indexes for main data: one ensures that the main data itself is of high quality, and the other ensures that this high-quality data is widely applied and that the application process is also of high quality, which meets the requirements of main data management. In addition, considering that the main data is constructed based on a data standard system, that each field has one or more data constraint rules, and that the number of rules keeps changing as the main data is applied and the business develops, the average number of rules per field is used as a third index for main data quality evaluation.
As an optional embodiment of the present application, the step of obtaining a final quality evaluation result of each object to be evaluated based on each accuracy evaluation result, the consistency evaluation result, and the quality evaluation result includes:
S51, according to a preset weight, carrying out weighted operation on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result to obtain a final quality evaluation result;
the accuracy evaluation result, the consistency evaluation result and the quality evaluation result represent different stages of main data governance, so the weight of each index can be configured flexibly from a management perspective. In the standard promotion stage, the weight of the average rule number score can be set to the largest; in the standard landing stage, the weight of the data accuracy score can be set to the largest; and in the main data integration and application stage, the consistency weight can be set to the largest of the three. In a specific embodiment, the weights of the accuracy evaluation result, the consistency evaluation result and the quality evaluation result are 0.5, 0.3 and 0.2, respectively.
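The weighted operation can be sketched as a plain weighted sum. The function name and the example inputs are hypothetical. The text does not state whether the three results are first brought to a common scale, so this sketch combines them exactly as given.

```python
def final_quality(accuracy: float, consistency: float, quality: float,
                  weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the three evaluation results.

    The default weights follow the 0.5 : 0.3 : 0.2 ratio of the specific
    embodiment; other stages of governance may use other weights.
    """
    w_accuracy, w_consistency, w_quality = weights
    return (w_accuracy * accuracy
            + w_consistency * consistency
            + w_quality * quality)


# Usage with made-up inputs: accuracy 0.9, consistency 0.75, rank score 3.
result = final_quality(0.9, 0.75, 3)
```

Passing a different `weights` tuple, for example one that puts the largest weight on the rule number score, models the standard promotion stage described above.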
In summary, the present application provides a method for evaluating the quality of main data. First, the main data to be evaluated is divided based on a preset division rule to obtain a plurality of objects to be evaluated, which makes it easier to count the number of rows, the number of fields, and the number of rules in the applicable standard for each data object. Then, an accuracy evaluation result is obtained based on the fields in the first physical table associated with the main data to be evaluated; because the evaluation works field by field, the data rules corresponding to each field in the data standard can be retrieved through the fields, so the main data can be evaluated accurately and objectively, and individual fields can be evaluated precisely according to business requirements. Next, consistency evaluation is performed on the main data to be evaluated based on the data items of a plurality of second physical tables and of the first physical table to obtain a consistency evaluation result; because main data is highly shared, it can be referenced by a plurality of target ends, and comparing the second physical table of each target end that references the main data to be evaluated with the data items of the first physical table at the source end allows the accuracy of the main data to be evaluated comprehensively. Finally, a quality evaluation result of the main data to be evaluated is obtained based on the number of fields in the data table corresponding to the main data to be evaluated and the number of data rules corresponding to each field, since the main data is constructed based on a data standard system, each field has one or more data constraint rules, and the number of these rules keeps changing as the main data is applied and the business develops.
In order to solve the above technical problem, as shown in fig. 3, the present application further provides a main data quality evaluation device, where the device includes:
the dividing module is used for dividing the main data to be evaluated based on a preset dividing method to obtain a plurality of objects to be evaluated;
the first evaluation module is used for evaluating the accuracy of the object to be evaluated based on the field in the first physical table to obtain an accuracy evaluation result; the first physical table is a physical table of which the source end stores the object to be evaluated;
the second evaluation module is used for carrying out consistency evaluation on the object to be evaluated based on a plurality of second physical tables corresponding to the object to be evaluated and the first physical table to obtain a consistency evaluation result; the second physical table is a physical table in a target end that references the object to be evaluated;
the third evaluation module is used for obtaining a quality evaluation result of the object to be evaluated based on the number of the fields in the first physical table and the corresponding rule number of each field in a data standard;
and the final evaluation module is used for obtaining a final quality evaluation result of the main data to be evaluated based on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result.
It should be noted that the modules of the main data quality evaluation device in this embodiment correspond one to one to the steps of the main data quality evaluation method in the foregoing embodiment. For the specific implementation and the technical effects of this embodiment, reference may therefore be made to the implementation of the foregoing method, which is not repeated here.
In addition, a method for evaluating the quality of main data according to the embodiment of the present invention described in conjunction with fig. 1 may be implemented by an electronic device. Fig. 4 shows a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
The electronic device may comprise at least one processor 301, at least one memory 302 and computer program instructions stored in the memory 302 as shown, which when executed by the processor 301, implement the method of the above described embodiments.
In particular, the processor 301 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more Integrated circuits implementing embodiments of the present invention.
Memory 302 may include mass storage for data or instructions. By way of example, and not limitation, memory 302 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 302 may include removable or non-removable (or fixed) media, where appropriate. The memory 302 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 302 is a non-volatile solid-state memory. In a particular embodiment, the memory 302 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 301 realizes any one of the main data quality evaluation methods in the above embodiments by reading and executing the computer program instructions stored in the memory 302.
In one example, the electronic device may further include a communication interface 303 and a bus 310. As shown in fig. 4, the processor 301, the memory 302, and the communication interface 303 are connected via the bus 310 and communicate with one another through it. The communication interface is mainly used for realizing communication among modules, devices, units and/or equipment in the embodiment of the invention.
A bus comprises hardware, software, or both that couple components of an electronic device to one another. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus or a combination of two or more of these. A bus may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.
In addition, in combination with the main data quality evaluation method in the foregoing embodiments, an embodiment of the present invention may provide a computer-readable storage medium. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any of the main data quality evaluation methods in the above embodiments.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an Erasable ROM (EROM), a floppy disk, a CD-ROM, an optical disk, a hard disk, an optical fiber medium, a Radio Frequency (RF) link, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (10)

1. A master data quality evaluation method is characterized by comprising the following steps:
dividing main data to be evaluated based on a preset dividing method to obtain a plurality of objects to be evaluated; the object to be evaluated is associated with a first physical table and a plurality of second physical tables, the first physical table and the second physical table both comprise a plurality of fields, and the fields comprise a plurality of data items;
based on the fields in the first physical table, carrying out accuracy evaluation on the object to be evaluated to obtain an accuracy evaluation result;
performing consistency evaluation on the object to be evaluated based on the data items in the second physical table and the first physical table to obtain a consistency evaluation result;
obtaining a quality evaluation result of the object to be evaluated based on the number of the fields in the first physical table and the corresponding rule number of the fields in the first physical table in the data standard;
and obtaining a final quality evaluation result of the object to be evaluated based on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result.
2. The method for evaluating the quality of the main data according to claim 1, wherein the performing accuracy evaluation on the object to be evaluated based on the field in the first physical table to obtain an accuracy evaluation result comprises:
acquiring the number of rows and the number of fields of the first physical table and the number of corresponding rules of each field in a data standard;
acquiring an accuracy score coefficient of each field based on the number of rules of each field;
calculating an accuracy score for each of the fields based on a number of rules that each of the fields satisfies in the data criteria, the corresponding accuracy score coefficient, and the number of rules;
and obtaining the accuracy evaluation result according to the accuracy score and the line number of each field.
3. The method of claim 2, wherein obtaining the accuracy evaluation result according to the accuracy score and the number of rows of each field comprises:
obtaining the accuracy evaluation result according to the following formula:
[Formula image: the accuracy evaluation result C expressed in terms of p, m and q]
wherein C is the accuracy evaluation result, p is the sum of the accuracy scores of all the fields, m is the number of rows, and q is the number of the fields.
4. The master data quality evaluation method according to claim 1, wherein the first physical table is associated with a first MD5 tag set, and the second physical table is associated with a second MD5 tag set; the performing consistency evaluation on the object to be evaluated based on the data items in the second physical table and the first physical table to obtain a consistency evaluation result comprises:
acquiring the first MD5 label set and a plurality of second MD5 label sets;
and obtaining the consistency evaluation result based on the first MD5 label set and each second MD5 label set.
5. The master data quality evaluation method according to claim 4, wherein the first MD5 tag set comprises MD5 tags corresponding to data items in the first physical table, and the second MD5 tag set comprises MD5 tags corresponding to data items in the second physical table; the obtaining the consistency evaluation result based on the first MD5 tag set and each second MD5 tag set includes:
comparing each second MD5 label set with the first MD5 label set to obtain a consistency score of each second physical table;
and obtaining the consistency evaluation result based on the consistency score of each second physical table.
6. The method according to claim 1, wherein obtaining the quality evaluation result of the object to be evaluated based on the number of the fields in the first physical table and the number of the corresponding rules of the fields in the first physical table in the data standard comprises:
acquiring the total number of fields in each first physical table and the corresponding rule number of each field in a data standard based on each first physical table;
calculating the average rule number of each object to be evaluated based on the total number of fields in each first physical table and the corresponding rule number of each field in the data standard;
sequencing each object to be evaluated based on the average rule number to obtain a sequencing result;
and scoring each object to be evaluated based on the sequencing result and the number of the objects to be evaluated to obtain a quality evaluation result.
7. The method for evaluating the quality of main data according to claim 1, wherein the obtaining of the final quality evaluation result of each object to be evaluated based on each accuracy evaluation result, the consistency evaluation result and the quality evaluation result comprises:
and according to a preset weight, performing weighted operation on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result to obtain the final quality evaluation result.
8. A master data quality evaluation apparatus, characterized in that the apparatus comprises:
the dividing module is used for dividing the main data to be evaluated based on a preset dividing method to obtain a plurality of objects to be evaluated; the object to be evaluated is associated with a first physical table and a plurality of second physical tables, the first physical table and the second physical table both comprise a plurality of fields, and the fields comprise a plurality of data items;
the first evaluation module is used for evaluating the accuracy of the object to be evaluated based on the field in the first physical table to obtain an accuracy evaluation result;
the second evaluation module is used for carrying out consistency evaluation on the object to be evaluated based on the data items in the second physical table and the first physical table to obtain a consistency evaluation result;
the third evaluation module is used for obtaining a quality evaluation result of the object to be evaluated based on the number of the fields in the first physical table and the corresponding rule number of the fields in the first physical table in the data standard;
and the final evaluation module is used for obtaining a final quality evaluation result of the object to be evaluated based on the accuracy evaluation result, the consistency evaluation result and the quality evaluation result.
9. An electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory that, when executed by the processor, implement the method of any of claims 1-7.
10. A storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of claims 1-7.
CN202211343105.8A 2022-10-31 2022-10-31 Method, device and equipment for evaluating quality of main data and storage medium Pending CN115392811A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211343105.8A CN115392811A (en) 2022-10-31 2022-10-31 Method, device and equipment for evaluating quality of main data and storage medium

Publications (1)

Publication Number Publication Date
CN115392811A (en) 2022-11-25

Family

ID=84114931

Country Status (1)

Country Link
CN (1) CN115392811A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699693A (en) * 2014-01-10 2014-04-02 中国南方电网有限责任公司 Metadata-based data quality management method and system
CN107229694A (en) * 2017-05-22 2017-10-03 北京红马传媒文化发展有限公司 A kind of data message consistency processing method, system and device based on big data
CN107368957A (en) * 2017-07-04 2017-11-21 广西电网有限责任公司电力科学研究院 A kind of construction method of equipment condition monitoring quality of data evaluation and test system
US20180232407A1 (en) * 2017-02-10 2018-08-16 Wipro Limited Method and system for assessing quality of incremental heterogeneous data
CN109101539A (en) * 2018-06-29 2018-12-28 东软集团股份有限公司 Business datum quality evaluating method, device, storage medium and electronic equipment
CN110033201A (en) * 2019-04-22 2019-07-19 浙江中烟工业有限责任公司 A kind of tobacco industry batch overall process quality testing and improved method and apparatus
CN110362563A (en) * 2019-07-19 2019-10-22 北京明略软件系统有限公司 The processing method and processing device of tables of data, storage medium, electronic device
CN111143623A (en) * 2019-12-31 2020-05-12 科技谷(厦门)信息技术有限公司 Data quality monitoring method in big data environment
CN113127482A (en) * 2019-12-31 2021-07-16 奇安信科技集团股份有限公司 Data quality analysis method and device, computer equipment and storage medium
CN114357032A (en) * 2022-01-06 2022-04-15 杭州隆埠科技有限公司 Data quality monitoring method and device, electronic equipment and storage medium
CN115080552A (en) * 2022-06-21 2022-09-20 北京友友天宇系统技术有限公司 Data quality evaluation method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20221125)