CN112700157A - Data asset generation method and device and electronic equipment - Google Patents

Data asset generation method and device and electronic equipment

Info

Publication number
CN112700157A
Authority
CN
China
Prior art keywords
data
abnormal
information
rule
asset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110020007.XA
Other languages
Chinese (zh)
Inventor
自建华
张延松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN202110020007.XA
Publication of CN112700157A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/103Workflow collaboration or project management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A data asset generation method, apparatus, electronic device, and machine-readable storage medium are disclosed. In the application, table information of a governed data table and data anomaly information corresponding to the data table are obtained from a docked data governance platform; table quality information of the data table is calculated based on the data anomaly information and the table information; and when the table quality information reaches a preset threshold, the data table is output to a user so that the user can determine whether the data table is a data asset. On the one hand, the process of determining a data table as a data asset is streamlined, which improves the efficiency of generating and managing data assets. On the other hand, the table quality of the data table is calculated and only data tables meeting the target quality standard are generated as data assets, which improves control over the data quality of the data assets.

Description

Data asset generation method and device and electronic equipment
Technical Field
One or more embodiments of the present application relate to the field of computer application technologies, and in particular, to a data asset generation method, apparatus, electronic device, and machine-readable storage medium.
Background
As enterprises move toward digitalization, managing data as an asset has become common. Today, enterprises rely on their own data assets to make better-informed and more efficient decisions, and use data asset management to provide better products and services, reduce costs, and control risk. However, data is often numerous and complex, and generating data assets from data is inefficient. How to quickly and efficiently convert data into corresponding trusted data assets, and to manage and control them effectively, is therefore very important for improving the use value of data assets.
Disclosure of Invention
The application provides a data asset generation method, which is applied to a data asset management platform and comprises the following steps:
acquiring, from a docked data governance platform, table information of a governed data table and data anomaly information corresponding to the data table;
calculating table quality information of the data table based on the data anomaly information and the table information;
and when the table quality information reaches a preset threshold, outputting the data table to a user so that the user can determine whether the data table is a data asset.
Optionally, the table information at least includes a table name, a table structure, and a number of table data rows; the data anomaly information at least includes a data anomaly governance rule type, and a number of abnormal data rows and a number of abnormal fields corresponding to the data anomaly governance rule type;
the calculating table quality information of the data table based on the data anomaly information and the table information comprises:
calculating a quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name, the number of abnormal data rows, and the number of abnormal fields;
and performing a weighted calculation based on preset weight coefficients respectively corresponding to the data anomaly governance rule types and the calculated quality scores to obtain a final total score, and determining the final total score as the table quality information of the data table.
Optionally, the calculating a quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name, the number of abnormal data rows, and the number of abnormal fields comprises:
dividing the product of the number of abnormal data rows and the number of abnormal fields by the product of the number of table data rows and the total number of fields of the table structure to obtain a quotient, and calculating a score from the quotient according to a preset hundred-point calculation method to obtain a hundred-point score;
and determining the obtained hundred-point score as the quality score corresponding to each data anomaly governance rule type.
Optionally, each data anomaly governance rule type includes a plurality of sub-rule types; each sub-rule type corresponds to a number of abnormal data rows and a number of abnormal fields;
the calculating a quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name, the number of abnormal data rows, and the number of abnormal fields comprises:
aggregating, for each data anomaly governance rule type, the numbers of abnormal data rows and abnormal fields respectively corresponding to its sub-rule types, to obtain an aggregated abnormal data count corresponding to that data anomaly governance rule type;
and calculating the quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name and the aggregated abnormal data count.
Optionally, the method further includes:
and responding to an operation instruction which is triggered by a user and used for determining the data table as the data asset, and storing the data table as the data asset in a local database.
Optionally, the method further includes:
and in response to a user-triggered operation instruction rejecting the data table as a data asset, when the table quality information of the data table does not reach a preset threshold, returning the data table to the data governance platform so that the data governance platform performs data governance on it again.
Optionally, the method further includes:
and in response to a user-triggered operation instruction rejecting the data table as a data asset, when the table information of the data table is incomplete, outputting the data table to the user so that the user can supplement the table information of the data table.
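The optional user-decision branches above (confirm, reject with insufficient quality, reject with incomplete table information) can be sketched as a small routing function. This is a minimal illustration only: the names `Action` and `route_table` are hypothetical and do not appear in the application, and the sketch assumes that the incompleteness check is evaluated before the quality check when both rejection conditions hold, since the application does not state an order.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of a user's decision on a candidate table."""
    STORE_AS_ASSET = "store_as_asset"
    REQUEST_TABLE_INFO = "request_table_info"
    RETURN_TO_GOVERNANCE = "return_to_governance"
    NO_ACTION = "no_action"


def route_table(user_confirms: bool,
                quality_score: float,
                threshold: float,
                table_info_complete: bool) -> Action:
    """Map a user decision and the table state to the next step."""
    if user_confirms:
        # User determined that the table is a data asset: store it.
        return Action.STORE_AS_ASSET
    if not table_info_complete:
        # Rejected and table information is incomplete: ask the user to supplement it.
        # (Checking this first is an assumption of this sketch.)
        return Action.REQUEST_TABLE_INFO
    if quality_score < threshold:
        # Rejected and quality below the threshold: send back for re-governance.
        return Action.RETURN_TO_GOVERNANCE
    # Rejected for a reason the application does not describe.
    return Action.NO_ACTION
```

Returning an enum value rather than performing the action keeps the decision logic separate from storage and platform calls, which the application leaves unspecified.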
The present application further provides a data asset generating device applied to a data asset management platform, the device includes:
an acquisition module, configured to acquire, from a docked data governance platform, table information of a governed data table and data anomaly information corresponding to the data table;
a calculation module, configured to calculate table quality information of the data table based on the data anomaly information and the table information;
and an output module, configured to output the data table to a user when the table quality information reaches a preset threshold, so that the user can determine whether the data table is a data asset.
Optionally, the table information at least includes a table name, a table structure, and a number of table data rows; the data anomaly information at least includes a data anomaly governance rule type, and a number of abnormal data rows and a number of abnormal fields corresponding to the data anomaly governance rule type;
in the process of calculating the table quality information of the data table based on the data anomaly information and the table information, the calculation module is further configured for:
calculating a quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name, the number of abnormal data rows, and the number of abnormal fields;
and performing a weighted calculation based on preset weight coefficients respectively corresponding to the data anomaly governance rule types and the calculated quality scores to obtain a final total score, and determining the final total score as the table quality information of the data table.
Optionally, in the process of calculating the quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name, the number of abnormal data rows, and the number of abnormal fields, the calculation module is further configured for:
dividing the product of the number of abnormal data rows and the number of abnormal fields by the product of the number of table data rows and the total number of fields of the table structure to obtain a quotient, and calculating a score from the quotient according to a preset hundred-point calculation method to obtain a hundred-point score;
and determining the obtained hundred-point score as the quality score corresponding to each data anomaly governance rule type.
Optionally, each data anomaly governance rule type includes a plurality of sub-rule types; each sub-rule type corresponds to a number of abnormal data rows and a number of abnormal fields;
in the process of calculating the quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name, the number of abnormal data rows, and the number of abnormal fields, the calculation module is further configured for:
aggregating, for each data anomaly governance rule type, the numbers of abnormal data rows and abnormal fields respectively corresponding to its sub-rule types, to obtain an aggregated abnormal data count corresponding to that data anomaly governance rule type;
and calculating the quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name and the aggregated abnormal data count.
Optionally, the apparatus further comprises:
a storage module, configured to store the data table as a data asset in a local database in response to a user-triggered operation instruction determining the data table to be a data asset.
Optionally, the apparatus further comprises:
a return module, configured to, in response to a user-triggered operation instruction rejecting the data table as a data asset, return the data table to the data governance platform when the table quality information of the data table does not reach a preset threshold, so that the data governance platform performs data governance on it again.
Optionally, the apparatus further comprises:
a supplementary entry module, configured to, in response to a user-triggered operation instruction rejecting the data table as a data asset, output the data table to the user when the table information of the data table is incomplete, so that the user can supplement the table information of the data table.
The application also provides an electronic device, which comprises a communication interface, a processor, a memory and a bus, wherein the communication interface, the processor and the memory are mutually connected through the bus;
the memory stores machine-readable instructions, and the processor executes the method by calling the machine-readable instructions.
The present application also provides a machine-readable storage medium having stored thereon machine-readable instructions which, when invoked and executed by a processor, perform the method described above.
Through the above embodiments, table information of a governed data table and data anomaly information corresponding to the data table are obtained from a docked data governance platform; table quality information of the data table is calculated based on the data anomaly information and the table information; and when the table quality information reaches a preset threshold, the data table is output to a user so that the user can determine whether the data table is a data asset. On the one hand, the process of determining a data table as a data asset is streamlined, which improves the efficiency of generating and managing data assets. On the other hand, the table quality of the data table is calculated and only data tables meeting the target quality standard are generated as data assets, which improves control over the data quality of the data assets.
Drawings
FIG. 1 is a flow chart of a method for data asset generation provided by an exemplary embodiment;
FIG. 2 is a hardware block diagram of an electronic device provided by an exemplary embodiment;
FIG. 3 is a block diagram of a data asset generation apparatus provided in an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
In order to make those skilled in the art better understand the technical solution in the embodiment of the present specification, a brief description will be given below of the related art for generating data assets related to the embodiment of the present specification.
Generally, in prior-art schemes for converting data tables into data assets, which data tables can serve as data assets must be recorded manually, with only simple classification management. Because there is no system for managing the data asset generation process, users cannot manage the data asset workflow quickly and efficiently; in addition, the data quality of the data assets cannot be guaranteed.
Based on this, the present application aims to provide a technical solution for generating a data table as a trusted data asset quickly and efficiently.
In implementation, the data asset management platform acquires, from a docked data governance platform, table information of a governed data table and data anomaly information corresponding to the data table.
Further, the data asset management platform calculates the table quality information of the data table based on the data exception information and the table information.
Further, when the table quality information reaches a preset threshold value, the data asset management platform outputs the data table to a user so that the user can determine whether the data table is a data asset.
In this scheme, table information of a governed data table and data anomaly information corresponding to the data table are obtained from a docked data governance platform; table quality information of the data table is calculated based on the data anomaly information and the table information; and when the table quality information reaches a preset threshold, the data table is output to a user so that the user can determine whether the data table is a data asset. On the one hand, the process of determining a data table as a data asset is streamlined, which improves the efficiency of generating and managing data assets. On the other hand, the table quality of the data table is calculated and only data tables meeting the target quality standard are generated as data assets, which improves control over the data quality of the data assets.
The present application is described below with reference to specific embodiments and specific application scenarios.
Referring to FIG. 1, FIG. 1 is a flowchart of a data asset generation method according to an embodiment of the present application. The method is applied to a data asset management platform and includes the following steps:
Step 102: acquiring, from a docked data governance platform, table information of a governed data table and data anomaly information corresponding to the data table.
Step 104: calculating table quality information of the data table based on the data anomaly information and the table information.
Step 106: when the table quality information reaches a preset threshold, outputting the data table to a user so that the user can determine whether the data table is a data asset.
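The three steps can be sketched as a simple filter over the tables fetched from the governance platform. This is a minimal sketch under stated assumptions: `GovernedTable`, `select_candidates`, and the `quality_fn` callback are hypothetical names introduced for illustration, the fetch in step 102 is represented only by the input list, and "reaches a preset threshold" is read as greater than or equal to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GovernedTable:
    """Hypothetical container for what step 102 obtains for one table."""
    table_info: Dict[str, object]
    anomaly_info: Dict[str, object]


def select_candidates(
    tables: List[GovernedTable],
    quality_fn: Callable[[GovernedTable], float],
    threshold: float,
) -> List[GovernedTable]:
    """Score each table and keep those to be shown to the user."""
    candidates = []
    for table in tables:
        score = quality_fn(table)   # step 104: compute table quality information
        if score >= threshold:      # step 106: threshold check before output
            candidates.append(table)
    return candidates
```

The returned list stands for the tables output to the user; the user's own confirmation or rejection happens afterwards and is not modelled here.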
In this specification, the data governance platform refers to a big data platform that performs data processing and data analysis on original data tables and outputs governed data tables.
For example, in practical applications, the data governance platform may be a big data platform based on architectures such as Hadoop and Spark. The data governance platform can perform ETL data processing and data analysis on a massive number of original data tables and output the governed data tables.
In this specification, the data asset management platform may be a data asset management platform of any architecture that is docked with the data governance platform, obtains the governed data tables output by the data governance platform, and manages the obtained governed data tables to generate corresponding data assets.
For example, in practical applications, the data asset management platform may be a data asset management platform with a distributed architecture, or may be a data asset management platform with a centralized architecture.
In this specification, the table information refers to table basic information of the data table after data management.
In one embodiment, the table information at least includes the table name, the table structure, and the number of table data rows of the data table, where the table structure at least includes the total number of table fields of the data table.
For example, taking data table t1 as an example, the table structure of t1 includes 20 fields, and t1 stores 10,000 rows of data according to this table structure. The table information of t1 therefore includes the table name (e.g., table1), the total number of table fields of t1 (20 fields), and the number of table data rows (10,000 rows).
Of course, in practical applications, the table information may also include other information such as table description of the data table, table data size, and the like, and the other information is not specifically limited in this specification.
In this specification, the data asset management platform acquires the table information of the data table after data management from the data management platform.
For example, in practical applications, the data asset management platform may obtain, from the data governance platform, 10,000 pieces of table information corresponding to 10,000 governed data tables.
In this specification, the data anomaly information refers to the abnormal data information obtained when the data governance platform, during data governance, performs an anomaly scan on the data table according to data anomaly governance rules.
In one embodiment, the data anomaly information at least includes a data anomaly governance rule type, and a number of abnormal data rows and a number of abnormal fields corresponding to the data anomaly governance rule type.
For example, continuing with data table t1 from the previous example, suppose that during data governance the data governance platform applies 4 data anomaly governance rules to t1: rule A, rule B, rule C, and rule D. If the anomaly scan of t1 under rule A finds 2,000 rows of abnormal data, and those 2,000 rows involve 10 abnormal fields, then the data anomaly information corresponding to data anomaly governance rule type A includes: number of abnormal data rows = 2000 and number of abnormal fields = 10. The numbers of abnormal data rows and abnormal fields corresponding to rule B, rule C, and rule D are obtained in the same way and are not described again here.
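The table information and data anomaly information for t1 can be represented with two small record types. This is a minimal sketch, assuming hypothetical class and field names (`TableInfo`, `RuleAnomaly`, and their attributes are not from the application); only the values for rule A are filled in, because the text gives no numbers for rules B, C, and D.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TableInfo:
    """Basic information about a governed data table (hypothetical field names)."""
    table_name: str
    total_fields: int   # total number of fields in the table structure
    row_count: int      # number of table data rows


@dataclass(frozen=True)
class RuleAnomaly:
    """Anomaly counts reported for one data anomaly governance rule type."""
    rule_type: str
    abnormal_rows: int
    abnormal_fields: int


# Values taken from the t1 example in the text.
t1_info = TableInfo(table_name="table1", total_fields=20, row_count=10_000)
t1_anomalies: List[RuleAnomaly] = [
    RuleAnomaly(rule_type="A", abnormal_rows=2000, abnormal_fields=10),
]
```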
In this specification, while acquiring the table information of the data table, the data asset management platform may also acquire the data anomaly information corresponding to the data table.
Continuing the above example, while obtaining the table information of data table t1, the data asset management platform may also obtain the data anomaly information corresponding to each of the 4 data anomaly governance rules (rule A, rule B, rule C, rule D) for data table t1.
In this specification, the data asset management platform calculates table quality information of the data table based on the table information of the data table and the data abnormality information.
For example, taking the data table t1 in the foregoing example as an example, the data asset management platform calculates the table quality information of the data table t1 based on the table information of the data table t1 and the data anomaly information corresponding to the 4 data anomaly governance rules, respectively.
In one embodiment, in the process of calculating the table quality information of the data table based on the data anomaly information and the table information, the data asset management platform calculates the quality score corresponding to each data anomaly governance rule type based on the table structure and the number of table data rows corresponding to the table name in the table information of the data table, and on the number of abnormal data rows and the number of abnormal fields in the data anomaly information corresponding to the data table.
For example, taking data table t1 as an example, the table information of t1 contains the table structure (with a total of 20 table fields) and the number of table data rows (10,000 rows) corresponding to the table name (table1). Each data anomaly governance rule type corresponding to t1 (rule A, rule B, rule C, rule D) corresponds to a number of abnormal data rows and a number of abnormal fields in the data anomaly information: rule A corresponds to ED_A and EF_A, rule B to ED_B and EF_B, rule C to ED_C and EF_C, and rule D to ED_D and EF_D. Based on the total number of table fields (20) and the number of table data rows (10,000) of t1, together with the numbers of abnormal data rows (ED_A, ED_B, ED_C, ED_D) and the numbers of abnormal fields (EF_A, EF_B, EF_C, EF_D), the data asset management platform calculates the quality score corresponding to each rule type: Score_A for rule A, Score_B for rule B, Score_C for rule C, and Score_D for rule D.
In this specification, after the quality scores respectively corresponding to the data abnormality management rule types are calculated, the data asset management platform performs weighted calculation based on the preset weight coefficients respectively corresponding to the data abnormality management rule types and the calculated quality scores to obtain a final total score, and determines the final total score as the table quality information of the data table.
Continuing the above example, a weight coefficient may be configured in the data asset management platform for each data anomaly governance rule type, for example: the preset weight coefficient WA corresponding to rule A is 0.3, the weight coefficient WB corresponding to rule B is 0.4, the weight coefficient WC corresponding to rule C is 0.2, and the weight coefficient WD corresponding to rule D is 0.1.
The weighted calculation based on the preset weight coefficients respectively corresponding to the data anomaly governance rule types and the calculated quality scores yields a final total score Score_Total, which can be calculated with the following formula:
Score_Total=Score_A*WA+Score_B*WB+Score_C*WC+Score_D*WD
Taking quality scores expressed on a hundred-point scale as an example: the quality score Score_A corresponding to rule A is 90, the quality score Score_B corresponding to rule B is 80, the quality score Score_C corresponding to rule C is 70, and the quality score Score_D corresponding to rule D is 60.
Substituting the values of Score_A, Score_B, Score_C, Score_D, WA, WB, WC, and WD into the formula for Score_Total gives:
Score_Total = 90*0.3 + 80*0.4 + 70*0.2 + 60*0.1 = 79 (points)
The data asset management platform determines the obtained final total score (Score_Total = 79 points) as the table quality information of data table t1.
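The weighted total can be reproduced with a few lines of code. This is a minimal sketch: the function name `total_score` and the dictionary layout are assumptions made for illustration, and the check that both dictionaries cover the same rule types is an added safeguard that the application does not describe.

```python
from typing import Dict


def total_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-rule quality scores (Score_Total in the text)."""
    if scores.keys() != weights.keys():
        # Added safeguard: every rule type needs both a score and a weight.
        raise ValueError("scores and weights must cover the same rule types")
    return sum(scores[rule] * weights[rule] for rule in scores)


# Values from the worked example: Score_A..Score_D and WA..WD.
scores = {"A": 90, "B": 80, "C": 70, "D": 60}
weights = {"A": 0.3, "B": 0.4, "C": 0.2, "D": 0.1}
score_total = total_score(scores, weights)
```

With these inputs `score_total` agrees with the 79 points of the worked example, up to floating-point rounding.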
For ease of understanding, the following describes how the quality scores corresponding to the respective data anomaly governance rule types are calculated; see the examples below.
In one embodiment, in the process of calculating the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, the number of abnormal data rows, and the number of abnormal fields, the data asset management platform divides the product of the number of abnormal data rows and the number of abnormal fields by the product of the number of table data rows and the total number of fields of the table structure, converts the resulting quotient into a score according to a preset score calculation method, and determines the obtained score as the quality score corresponding to each data anomaly governance rule type.
For example, taking the calculation of the quality score Score_A corresponding to rule A (expressed on a 100-point scale) as an example, the data asset management platform divides the product of the number of abnormal data rows ED_A and the number of abnormal fields EF_A corresponding to rule A by the product of the number of table data rows TL of the data table t1 and the total number of fields TF of the table structure, and converts the resulting quotient into a 100-point score according to a preset calculation method; Score_A is calculated based on the following formula:
Score_A=(1-(ED_A*EF_A)/(TL*TF))*100
For example, when ED_A is 2000, EF_A is 10, TL is 10000, and TF is 20, the values of ED_A, EF_A, TL, and TF are substituted into the formula for calculating Score_A to obtain Score_A as follows:
Score_A=(1-(2000*10)/(10000*20))*100=90 (points)
The data asset management platform determines the obtained score Score_A = 90 as the quality score corresponding to the data anomaly governance rule A.
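The per-rule formula above can be sketched in Python as follows. The function and parameter names are hypothetical and chosen only for illustration; the input values are those of the Score_A example.

```python
def quality_score(abnormal_rows, abnormal_fields, table_rows, total_fields):
    """Quality score on a 100-point scale for one rule type:
    (1 - (ED * EF) / (TL * TF)) * 100.
    """
    total_cells = table_rows * total_fields
    if total_cells <= 0:
        # Guard added for the sketch only; the specification does not
        # discuss empty tables.
        raise ValueError("table must have at least one row and one field")
    return (1 - (abnormal_rows * abnormal_fields) / total_cells) * 100


# ED_A = 2000, EF_A = 10, TL = 10000, TF = 20, as in the example.
score_a = quality_score(abnormal_rows=2000, abnormal_fields=10,
                        table_rows=10000, total_fields=20)
print(round(score_a, 2))
```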
Similarly, the quality Score _ B corresponding to the rule B, the quality Score _ C corresponding to the rule C, and the quality Score _ D corresponding to the rule D may be obtained through calculation, and the specific process is not repeated.
It should be noted that the above example, in which the data asset management platform calculates the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure and the number of table data rows corresponding to the table name in the table information of the data table, and on the number of abnormal data rows and the number of abnormal fields in the data anomaly information corresponding to the data table, only illustrates the case where each data anomaly governance rule type directly corresponds to a number of abnormal data rows and a number of abnormal fields.
In one embodiment, each of the data anomaly governance rule types may include a plurality of sub-rule types. In the process of calculating the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, the number of abnormal data rows, and the number of abnormal fields, the data asset management platform aggregates, for each data anomaly governance rule type, the numbers of abnormal data rows and the numbers of abnormal fields respectively corresponding to its sub-rule types, to obtain an aggregated abnormal data count for each data anomaly governance rule type; the platform then calculates the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, and the aggregated abnormal data counts.
For example, take the case where the data anomaly governance rule types include rule A, rule B, rule C, and rule D:
rule A includes 2 sub-rule types (sub-rule A1 and sub-rule A2); sub-rule A1 corresponds to 3000 abnormal data rows and 6 abnormal fields, and sub-rule A2 corresponds to 3000 abnormal data rows and 4 abnormal fields;
rule B includes 3 sub-rule types (sub-rule B1, sub-rule B2, and sub-rule B3); sub-rule B1 corresponds to 2000 abnormal data rows and 10 abnormal fields, sub-rule B2 corresponds to 4000 abnormal data rows and 5 abnormal fields, and sub-rule B3 corresponds to 1000 abnormal data rows and 10 abnormal fields;
rule C includes 1 sub-rule type (sub-rule C1), which corresponds to 3000 abnormal data rows and 10 abnormal fields;
rule D includes 1 sub-rule type (sub-rule D1), which corresponds to 4000 abnormal data rows and 10 abnormal fields;
for rule A, the data asset management platform aggregates the numbers of abnormal data rows and the numbers of abnormal fields respectively corresponding to sub-rule A1 and sub-rule A2 to obtain the aggregated abnormal data count corresponding to rule A. The aggregated abnormal data count corresponding to rule A is the sum of the abnormal data counts of sub-rule A1 and sub-rule A2, where the abnormal data count of each sub-rule is the product of its number of abnormal data rows and its number of abnormal fields. That is, the aggregated abnormal data count corresponding to rule A is 3000*6+3000*4=30000.
For rules B, C, and D, the process is similar to the calculation of the aggregated abnormal data count of rule A and is not described again. The calculation results are as follows:
The aggregated abnormal data count corresponding to rule B is 2000*10+4000*5+1000*10=50000
The aggregated abnormal data count corresponding to rule C is 3000*10=30000
The aggregated abnormal data count corresponding to rule D is 4000*10=40000
After the aggregated abnormal data count EN corresponding to each data anomaly governance rule type is obtained, the data asset management platform calculates the quality score corresponding to each data anomaly governance rule type based on EN, combined with the number of table data rows TL of the data table t1 and the total number of fields TF of the table structure; the quality score corresponding to each data anomaly governance rule type is calculated as shown in the following formula:
Quality score corresponding to each data anomaly governance rule type=(1-EN/(TL*TF))*100
Taking rule A, which includes sub-rules, as an example, the formula for calculating its quality score is similar to the formula for Score_A described above, the main difference being that EN replaces ED_A*EF_A. The calculation process is similar for the other rules that include sub-rules and is not described again.
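The aggregation over sub-rules and the resulting quality score can be sketched in Python as follows, using rule B from the example. The function names and the (rows, fields) tuple layout are hypothetical, and the sketch assumes that TL = 10000 and TF = 20 from the earlier Score_A example still apply to the data table t1.

```python
def aggregated_abnormal_count(sub_rules):
    """EN: sum over sub-rules of (abnormal rows * abnormal fields)."""
    return sum(rows * fields for rows, fields in sub_rules)


def quality_score_from_count(abnormal_count, table_rows, total_fields):
    """Quality score on a 100-point scale: (1 - EN / (TL * TF)) * 100."""
    return (1 - abnormal_count / (table_rows * total_fields)) * 100


# Rule B: sub-rules B1, B2, B3 as (abnormal rows, abnormal fields).
rule_b = [(2000, 10), (4000, 5), (1000, 10)]
en_b = aggregated_abnormal_count(rule_b)

# TL and TF are assumed to be the same values as in the Score_A example.
score_b = quality_score_from_count(en_b, table_rows=10000, total_fields=20)
print(en_b, round(score_b, 2))
```

Because several sub-rules may flag the same rows and fields, EN is not bounded by TL*TF in this sketch, so the score is not clamped to the range 0 to 100; how such overlap should be handled is not specified here.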
It should be noted that, in the above examples, the quality scores respectively corresponding to the data anomaly governance rule types, and the final total score obtained by weighted calculation based on those quality scores, are all expressed on a 100-point scale; in practical applications, they may also be expressed and calculated on other scoring scales (for example, a 5-point scale).
In this specification, after the table quality information of the data table is obtained through calculation, the data asset management platform monitors the obtained table quality information of the data table, and when the table quality information reaches a preset threshold, the data table is output to a user, so that the user can determine whether the data table is a data asset.
For example, the data asset management platform monitors the obtained table quality information of the data table t1, and when the table quality information reaches a preset threshold (for example, 60 points), outputs the data table t1 to the user, so that the user can determine whether or not the data table t1 is a data asset.
In practical applications, the preset threshold may comprise a plurality of graded thresholds, for example: a preset threshold of 60 points indicates that the data quality of the data table meets the basic requirement, and a preset threshold of 80 points indicates that the data quality of the data table is good. The data asset management platform may directly determine data tables with good data quality as data assets.
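The graded thresholds described above can be sketched in Python as follows. The constant names, the function `next_step`, and the returned labels are hypothetical; the sketch also assumes that "reaching" a threshold means being greater than or equal to it.

```python
BASIC_THRESHOLD = 60  # data quality meets the basic requirement
GOOD_THRESHOLD = 80   # data quality is good


def next_step(total_score):
    """Map a table's final total score to the platform's next step."""
    if total_score >= GOOD_THRESHOLD:
        # Good quality: the platform may determine the table as an asset directly.
        return "determine_as_asset"
    if total_score >= BASIC_THRESHOLD:
        # Basic quality: output the table so the user can decide.
        return "output_to_user"
    # Threshold not reached: the table is not output at this stage.
    return "below_threshold"


print(next_step(79))
```

With the total score of 79 points calculated for the data table t1, the table would be output to the user rather than determined as a data asset directly.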
It should be noted that by calculating the table quality of the data table and generating the data table meeting the target quality standard into the data asset, the management and control of the data quality of the data asset are improved, manual screening is avoided, and the data asset generation efficiency is improved.
In one embodiment, in response to a user-triggered operation instruction for determining the data table as a data asset, the data asset management platform stores the data table as a data asset in a local database.
For example, the data asset management platform receives, through a Web interface or a CLI command line, a user-triggered operation instruction for determining the data table t1 as a data asset, and in response stores the data table t1 as a data asset in an asset table in a local database.
In another embodiment, in response to a user-triggered operation instruction for rejecting the data table as a data asset, and when the table quality information of the data table does not reach the preset threshold, the data asset management platform returns the data table to the data governance platform so that the data governance platform performs data governance again.
For example, when the table quality information of the data table t1 does not reach the preset threshold (for example, does not reach 60 points), if the data asset management platform receives, through a Web interface or a CLI command line, a user-triggered operation instruction for rejecting the data table t1 as a data asset, the data asset management platform returns the data table t1 to the docked data governance platform so that the data governance platform performs data governance again. The data asset management platform may then obtain the data table t1 after the renewed data governance, monitor its data quality, and output it to the user, who finally determines whether the re-governed data table t1 is a data asset.
In one embodiment, in response to a user-triggered operation instruction for rejecting the data table as a data asset, and when the table information of the data table is incomplete, the data asset management platform outputs the data table to the user so that the user can supplement the table information of the data table.
For example, when the table quality information of the data table t1 has reached the preset threshold (for example, has reached 60 points) but the acquired table information of the data table t1 is incomplete (for example, the table information lacks asset description information that needs to be supplemented and cannot be acquired from the data governance platform, such as the department to which the table belongs, the source business system, the business classification, and the update cycle), the data asset management platform outputs the data table t1 to the user, so that the user can supplement the missing table information of the data table t1 through an interface provided by the data asset management platform.
It should be noted that, when the user rejects the data table as a data asset through the data asset management platform, the data asset management platform may further store the rejected data table in an asset return table in the local database, to facilitate tracking and management by the user.
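The handling of user decisions described in the embodiments above can be sketched in Python as follows. The function `handle_decision` and the action labels are hypothetical. The sketch assumes a single threshold of 60 points and assumes that, when a rejected table is both below the threshold and missing table information, both follow-up actions apply; the embodiments above do not state how these cases combine.

```python
def handle_decision(accepted, total_score, table_info_complete, threshold=60):
    """Return the list of follow-up actions for a user's decision on a table."""
    if accepted:
        # User determined the table as a data asset.
        return ["save_to_asset_table"]

    # User rejected the table: keep a record for tracking and management.
    actions = ["record_rejected_table"]
    if total_score < threshold:
        # Quality below threshold: send back for renewed data governance.
        actions.append("return_to_governance_platform")
    if not table_info_complete:
        # Missing asset description information: ask the user to supplement it.
        actions.append("prompt_user_to_supplement_table_info")
    return actions


print(handle_decision(accepted=False, total_score=55, table_info_complete=True))
```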
In the above technical solution, the table information of a data table that has undergone data governance, and the data anomaly information corresponding to the data table, are acquired from the docked data governance platform; the table quality information of the data table is calculated based on the data anomaly information and the table information; and when the table quality information reaches a preset threshold, the data table is output to a user so that the user can determine whether the data table is a data asset. On the one hand, the process of determining a data table as a data asset is streamlined, which improves the efficiency of generating and managing data assets. On the other hand, the table quality of the data table is calculated and only data tables meeting the target quality standard are generated as data assets, which improves control over the data quality of the data assets.
Corresponding to the method embodiments, this specification also provides an embodiment of a data asset generation apparatus. The embodiments of the data asset generation apparatus of this specification can be applied to an electronic device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical apparatus, is formed by the processor of the electronic device in which it is located reading corresponding computer program instructions from the nonvolatile memory into the memory and running them. In terms of hardware, fig. 2 shows a hardware structure diagram of the electronic device in which the data asset generation apparatus of this specification is located; in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 2, the electronic device in which the apparatus is located may also include other hardware according to the actual functions of the electronic device, which is not described again.
Fig. 3 is a block diagram of a data asset generation apparatus shown in an exemplary embodiment of the present specification.
Referring to fig. 3, the data asset generating apparatus 30 may be applied to a data asset management platform, and the apparatus includes:
an acquisition module 301, configured to acquire, from the docked data governance platform, the table information of a data table that has undergone data governance and the data anomaly information corresponding to the data table;
a calculating module 302, configured to calculate, based on the data anomaly information and the table information, table quality information of the data table;
and the output module 303 is configured to output the data table to a user when the table quality information reaches a preset threshold, so that the user determines whether the data table is a data asset.
In this embodiment, the table information at least includes a table name, a table structure, and a number of table data rows; the data anomaly information at least includes data anomaly governance rule types, and the number of abnormal data rows and the number of abnormal fields corresponding to each data anomaly governance rule type;
in the process of calculating the table quality information of the data table based on the data anomaly information and the table information, the calculation module 302 is further configured to:
calculate the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, the number of abnormal data rows, and the number of abnormal fields;
and perform a weighted calculation based on the preset weight coefficients respectively corresponding to the data anomaly governance rule types and the calculated quality scores to obtain a final total score, and determine the obtained final total score as the table quality information of the data table.
In this embodiment, in the process of calculating the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, the number of abnormal data rows, and the number of abnormal fields, the calculation module 302 is further configured to:
divide the product of the number of abnormal data rows and the number of abnormal fields by the product of the number of table data rows and the total number of fields of the table structure, and convert the resulting quotient into a 100-point score according to a preset calculation method;
and determine the obtained 100-point score as the quality score corresponding to each data anomaly governance rule type.
In this embodiment, each data anomaly governance rule type includes a plurality of sub-rule types; each sub-rule type corresponds to a number of abnormal data rows and a number of abnormal fields;
in the process of calculating the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, the number of abnormal data rows, and the number of abnormal fields, the calculation module 302 is further configured to:
aggregate, for each data anomaly governance rule type, the numbers of abnormal data rows and the numbers of abnormal fields respectively corresponding to its sub-rule types, to obtain an aggregated abnormal data count for each data anomaly governance rule type;
and calculate the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, and the aggregated abnormal data counts.
In this embodiment, the apparatus further includes:
a saving module 304 (not shown in fig. 3), configured to store the data table as a data asset in a local database in response to a user-triggered operation instruction for determining the data table as a data asset.
In this embodiment, the apparatus further includes:
a returning module 305 (not shown in fig. 3), configured to return the data table to the data governance platform, in response to a user-triggered operation instruction for rejecting the data table as a data asset and when the table quality information of the data table does not reach the preset threshold, so that the data governance platform performs data governance again.
In this embodiment, the apparatus further includes:
a supplementary recording module 306 (not shown in fig. 3), configured to output the data table to the user, in response to a user-triggered operation instruction for rejecting the data table as a data asset and when the table information of the data table is incomplete, so that the user can supplement the table information of the data table.
The apparatuses, modules, or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (10)

1. A data asset generation method is applied to a data asset management platform and comprises the following steps:
acquiring, from a docked data governance platform, table information of a data table that has undergone data governance and data anomaly information corresponding to the data table;
calculating to obtain the table quality information of the data table based on the data abnormality information and the table information;
and when the table quality information reaches a preset threshold value, outputting the data table to a user so that the user can determine whether the data table is a data asset.
2. The method of claim 1, wherein the table information comprises at least a table name, a table structure, and a number of table data rows; the data anomaly information comprises at least data anomaly governance rule types, and the number of abnormal data rows and the number of abnormal fields corresponding to each data anomaly governance rule type;
the calculating the table quality information of the data table based on the data anomaly information and the table information comprises:
calculating quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, the number of abnormal data rows, and the number of abnormal fields;
and performing a weighted calculation based on preset weight coefficients respectively corresponding to the data anomaly governance rule types and the calculated quality scores to obtain a final total score, and determining the obtained final total score as the table quality information of the data table.
3. The method according to claim 2, wherein the calculating quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, the number of abnormal data rows, and the number of abnormal fields comprises:
dividing the product of the number of abnormal data rows and the number of abnormal fields by the product of the number of table data rows and the total number of fields of the table structure, and converting the resulting quotient into a 100-point score according to a preset calculation method;
and determining the obtained 100-point score as the quality score corresponding to each data anomaly governance rule type.
4. The method of claim 2, wherein each data anomaly governance rule type comprises a plurality of sub-rule types; each sub-rule type corresponds to a number of abnormal data rows and a number of abnormal fields;
the calculating quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, the number of abnormal data rows, and the number of abnormal fields comprises:
aggregating, for each data anomaly governance rule type, the numbers of abnormal data rows and the numbers of abnormal fields respectively corresponding to its sub-rule types, to obtain an aggregated abnormal data count for each data anomaly governance rule type;
and calculating the quality scores respectively corresponding to the data anomaly governance rule types based on the table structure corresponding to the table name, the number of table data rows, and the aggregated abnormal data counts.
5. The method of claim 1, further comprising:
in response to a user-triggered operation instruction for determining the data table as a data asset, storing the data table as a data asset in a local database.
6. The method of claim 1, further comprising:
in response to a user-triggered operation instruction for rejecting the data table as a data asset, and when the table quality information of the data table does not reach the preset threshold, returning the data table to the data governance platform so that the data governance platform performs data governance again.
7. The method of claim 1, further comprising:
in response to a user-triggered operation instruction for rejecting the data table as a data asset, and when the table information of the data table is incomplete, outputting the data table to the user so that the user can supplement the table information of the data table.
8. A data asset generating device applied to a data asset management platform comprises:
an acquisition module, configured to acquire, from a docked data governance platform, table information of a data table that has undergone data governance and data anomaly information corresponding to the data table;
the calculation module is used for calculating the table quality information of the data table based on the data abnormal information and the table information;
and the output module is used for outputting the data table to a user when the table quality information reaches a preset threshold value so that the user can determine whether the data table is a data asset.
9. An electronic device comprises a communication interface, a processor, a memory and a bus, wherein the communication interface, the processor and the memory are connected with each other through the bus;
the memory has stored therein machine-readable instructions, the processor executing the method of any of claims 1 to 7 by calling the machine-readable instructions.
10. A machine readable storage medium having stored thereon machine readable instructions which, when invoked and executed by a processor, carry out the method of any of claims 1 to 7.
CN202110020007.XA 2021-01-07 2021-01-07 Data asset generation method and device and electronic equipment Pending CN112700157A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110020007.XA CN112700157A (en) 2021-01-07 2021-01-07 Data asset generation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112700157A true CN112700157A (en) 2021-04-23

Family

ID=75513232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110020007.XA Pending CN112700157A (en) 2021-01-07 2021-01-07 Data asset generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112700157A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976120A (en) * 2016-05-17 2016-09-28 全球能源互联网研究院 Electric power operation monitoring data quality assessment system and method
CN109241107A (en) * 2018-08-03 2019-01-18 北京邮电大学 Big data controlling device based on Hadoop
CN109299083A (en) * 2018-10-16 2019-02-01 全球能源互联网研究院有限公司 A kind of data governing system
CN109325018A (en) * 2018-08-10 2019-02-12 山东超越数控电子股份有限公司 A kind of data assets management method and device based on block number evidence and distributed account book technology
CN109344133A (en) * 2018-08-27 2019-02-15 成都四方伟业软件股份有限公司 A kind of data administer driving data and share exchange system and its working method
CN111967988A (en) * 2020-07-10 2020-11-20 广州汇通国信科技有限公司 Smart power grid data governance framework based on block chain technology
CN112181955A (en) * 2020-09-01 2021-01-05 西南交通大学 Data standard management method for information sharing of heavy haul railway comprehensive big data platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
荀华等: ""基于规则的电力数据指标检查系统设计与实现"", 《东北电力科技》, vol. 41, no. 5, pages 5 - 9 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination