CN111597177A - Data governance method for improving data quality - Google Patents


Publication number
CN111597177A
Authority
CN
China
Prior art keywords
data
metadata
current metadata
current
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010406901.6A
Other languages
Chinese (zh)
Inventor
侯政斌
委中原
谭小勰
刘引
黄康圣
秦邱川
周期律
张雅琦
李云
税萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Rural Commercial Bank Co ltd
Original Assignee
Chongqing Rural Commercial Bank Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Rural Commercial Bank Co ltd
Priority to CN202010406901.6A
Publication of CN111597177A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/909 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data governance method, apparatus, device and storage medium for improving data quality. The method comprises: collecting metadata of a specified database table as current metadata, and comparing the current metadata with a preset validity rule, wherein the current metadata comprises technical metadata and business metadata; if the current metadata conforms to the validity rule, determining that the current metadata is valid; and if the current metadata does not conform to the validity rule, instructing the corresponding responsible person to correct the current metadata, taking the corrected metadata as the current metadata, and returning to the step of comparing the current metadata with the preset validity rule, until the number of times the current metadata has been determined not to conform to the validity rule reaches a count threshold. Data quality is thereby improved and effective data governance is achieved.

Description

Data governance method for improving data quality
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a data governance method, apparatus, device and storage medium for improving data quality.
Background
In the data era, for enterprises such as banks, any metadata management scheme is destined to fail without data governance. Specifically, metadata management can be an important function that lets an IT department manage change in a complex data integration environment while still delivering trusted and secure data. The advantages become more convincing when business stakeholders take part in the process and accept responsibility for the data reference framework; the enterprise can then associate business metadata with the underlying technical metadata and provide information such as vocabularies and context for company-wide collaboration. Data governance is therefore important for enterprises such as banks, but the prior art offers no technical solution that achieves it effectively.
Disclosure of Invention
The invention aims to provide a data governance method, apparatus, device and storage medium for improving data quality, which can achieve effective governance of data.
In order to achieve the above purpose, the invention provides the following technical scheme:
a data governance method for improving data quality, comprising:
collecting metadata of a specified database table as current metadata, and comparing the current metadata with a preset validity rule; wherein the current metadata comprises technical metadata and business metadata;
if the current metadata conforms to the validity rule, determining that the current metadata has validity;
and if the current metadata does not conform to the validity rule, instructing the corresponding responsible person to correct the current metadata, taking the corrected metadata as the current metadata, and returning to the step of comparing the current metadata with the preset validity rule, until the number of times the current metadata has been determined not to conform to the validity rule reaches a count threshold.
Preferably, after collecting the metadata of the specified database table as the current metadata, the method further includes:
judging whether the instance data corresponding to the current metadata is free of data loss and whether it satisfies the dependency relationships among different instance data; if both judgment results are yes, determining that the instance data corresponding to the current metadata is valid; otherwise, instructing the corresponding responsible person to correct the instance data corresponding to the current metadata.
Preferably, after collecting the metadata of the specified database table as the current metadata, the method further includes:
judging whether the instance data corresponding to the current metadata is preset sensitive information; if so, desensitizing the sensitive information according to a preset desensitization program; if not, determining that the instance data corresponding to the current metadata does not need to be desensitized.
Preferably, the method further comprises the following steps:
checking whether duplicate instance data exists in the specified database table; if so, taking any one of the duplicate instance data as target data and deleting the other instance data that duplicates the target data; if not, determining that no instance data needs to be deleted.
Preferably, the method further comprises the following steps:
judging whether the instance data in the specified database table was acquired at a specified time through a specified interface and lies within a specified range; if so, determining that the instance data is available; otherwise, determining that the instance data is unavailable.
Preferably, after collecting the metadata of the specified database table as the current metadata, the method further includes:
acquiring map information corresponding to the current metadata and the instance data, and displaying the map information visually in a pre-drawn data map; the map information comprises the flow direction, the reference relationships and the organization rules of the current metadata and the corresponding instance data.
Preferably, the method further comprises the following steps:
determining the unique data source corresponding to each system, analyzing the relationships among the data provided by the unique data sources, obtaining a relationship map representing those relationships, and displaying the relationship map.
A data governance apparatus for improving data quality, comprising:
a comparison module for: collecting metadata of a specified database table as current metadata, and comparing the current metadata with a preset validity rule; wherein the current metadata comprises technical metadata and business metadata;
a first processing module for: if the current metadata conforms to the validity rule, determining that the current metadata is valid;
a second processing module for: if the current metadata does not conform to the validity rule, instructing the corresponding responsible person to correct the current metadata, taking the corrected metadata as the current metadata, and returning to the step of comparing the current metadata with the preset validity rule, until the number of times the current metadata has been determined not to conform to the validity rule reaches a count threshold.
A data governance device for improving data quality, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data governance method for improving data quality as described in any one of the above when executing the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of a data governance method for improving data quality as described in any one of the preceding claims.
The invention provides a data governance method, apparatus, device and storage medium for improving data quality. The method comprises: collecting metadata of a specified database table as current metadata, and comparing the current metadata with a preset validity rule, wherein the current metadata comprises technical metadata and business metadata; if the current metadata conforms to the validity rule, determining that the current metadata is valid; and if the current metadata does not conform to the validity rule, instructing the corresponding responsible person to correct the current metadata, taking the corrected metadata as the current metadata, and returning to the step of comparing the current metadata with the preset validity rule, until the number of times the current metadata has been determined not to conform to the validity rule reaches a count threshold. In this technical solution, after the metadata of the specified database table is collected, if the metadata does not conform to the validity rule, the corresponding responsible person is instructed to correct it, and the corrected metadata is checked against the validity rule again, until the number of times the metadata has been determined not to conform reaches the corresponding count threshold. In this way, metadata comprising technical metadata and business metadata can be ensured to be valid, so that the corresponding data has both business validity and technical validity; data quality is thereby improved and effective data governance is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a data governance method for improving data quality according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of data governance in a data governance method for improving data quality according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a data map in a data governance method for improving data quality according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data governance apparatus for improving data quality according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of a data governance method for improving data quality according to an embodiment of the present invention is shown, where the method includes:
S11: Collecting metadata of a specified database table as current metadata, and comparing the current metadata with a preset validity rule; wherein the current metadata includes technical metadata and business metadata.
The data governance method for improving data quality provided by the embodiment of the invention may be executed by a corresponding data governance apparatus for improving data quality. Metadata is data that describes data: it mainly describes data attribute (property) information and supports functions such as indicating storage location, historical data, resource lookup and file records, and it may include the table name, field type, field interpretation, code value meaning and the like. Instance data is the data, stored in the database table, that the metadata describes. The specified database table may be any table in the database for which data governance is required. Once the specified database table has been determined, its metadata can be collected and compared with the preset validity rule to judge whether the metadata is valid; the metadata may be collected, for example, by connecting directly to a backup of the database and extracting the metadata of the database table. The metadata may include technical metadata and business metadata, which have the same meaning as the corresponding concepts in the prior art: in brief, the metadata of technical instance data is technical metadata, and the metadata of business instance data is business metadata. In this way the validity of both the business metadata and the technical metadata of the specified database table, that is, the business validity and technical validity of the database table, can be ensured.
In addition, the validity rules can be configured through a configuration page and set according to actual needs, for example: the certificate number in the client information table cannot be null; naming conventions for tables and fields; field format specifications; or rules specific to the industry or internal to the bank.
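The rule types listed above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `TableMeta` and `FieldMeta` structures, the lowercase snake_case naming convention and the `cert_no` field name are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed naming convention: lowercase letters, digits and underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

@dataclass
class FieldMeta:
    name: str
    type: str
    comment: str = ""       # business interpretation of the field
    nullable: bool = True

@dataclass
class TableMeta:
    name: str
    fields: List[FieldMeta] = field(default_factory=list)

def check_metadata(table: TableMeta, required_not_null=("cert_no",)) -> List[str]:
    """Return the rule violations; an empty list means the metadata is valid."""
    problems = []
    if not NAME_PATTERN.fullmatch(table.name):
        problems.append(f"table name {table.name!r} breaks the naming rule")
    for f in table.fields:
        if not NAME_PATTERN.fullmatch(f.name):
            problems.append(f"field name {f.name!r} breaks the naming rule")
        if not f.comment.strip():
            problems.append(f"field {f.name!r} has no interpretation")
        if f.name in required_not_null and f.nullable:
            problems.append(f"field {f.name!r} must be declared NOT NULL")
    return problems

# Hypothetical metadata collected for a client basic information table.
table = TableMeta("cust_basic_info", [
    FieldMeta("cust_no", "varchar(32)", "client number", nullable=False),
    FieldMeta("cert_no", "varchar(32)", "certificate number", nullable=True),
    FieldMeta("CustName", "varchar(64)", ""),
])
for problem in check_metadata(table):
    print(problem)
```

Each rule is a plain check that appends a readable message, so a rule configured on a page could map onto one such check.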
S12: if the current metadata meets the validity rule, the current metadata is determined to be valid.
If the current metadata conforms to the validity rule, the current metadata has the validity that the rule represents.
S13: If the current metadata does not conform to the validity rule, instructing the corresponding responsible person to correct the current metadata, taking the corrected metadata as the current metadata, and returning to the step of comparing the current metadata with the preset validity rule, until the number of times the current metadata has been determined not to conform to the validity rule reaches a count threshold.
If the current metadata does not conform to the validity rule, the corresponding data needs to be corrected. The data to be corrected usually includes the metadata that does not conform to the validity rule, and may also include the instance data corresponding to that metadata; both cases fall within the protection scope of the present invention. In addition, when the current metadata is determined not to conform to the validity rule, the situation can be shown on a preset metadata issue board in the form of a metadata issue list. At the same time, the person responsible for the data to be corrected is determined according to a preset data governance process specification, and a work order is issued online (that is, the corresponding responsible person is instructed to correct the data). After the responsible person corrects the data and reports the result in the work order system, the corrected data is checked against the validity rule again, so that the issue can be closed.
In addition, the count threshold can be set according to actual needs, for example 3 or 5. If the number of consecutive corrections, or the number of consecutive determinations that the metadata does not conform to the validity rule, reaches the count threshold and the non-conformance still has not been resolved, then, to avoid excessive useless work, no further correction steps are performed; instead, alarm information can be sent directly to the corresponding management terminal of an administrator to raise an alarm about the problem.
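The compare, correct and re-compare loop of steps S11 to S13, including the count threshold, can be sketched as follows. This is a minimal illustration under assumptions: `validate` and `request_correction` are hypothetical callables standing in for the rule check and the work order round trip, and the statuses `"valid"` and `"alarm"` are names chosen here.

```python
from typing import Callable, List, Tuple

def govern(metadata: dict,
           validate: Callable[[dict], List[str]],
           request_correction: Callable[[dict, List[str]], dict],
           count_threshold: int = 3) -> Tuple[str, dict, int]:
    """Re-check metadata after each correction until it is valid or the
    number of failed checks reaches count_threshold."""
    failures = 0
    current = metadata
    while True:
        problems = validate(current)
        if not problems:
            return "valid", current, failures
        failures += 1
        if failures >= count_threshold:
            # Stop correcting and let the caller raise an alarm instead.
            return "alarm", current, failures
        # Stands in for issuing a work order and receiving the corrected data.
        current = request_correction(current, problems)

# Hypothetical rule: the metadata must record an owner and a comment.
def validate(meta: dict) -> List[str]:
    return [key for key in ("owner", "comment") if not meta.get(key)]

# Hypothetical responsible person who fixes one problem per work order.
def fix_one(meta: dict, problems: List[str]) -> dict:
    fixed = dict(meta)
    fixed[problems[0]] = "filled in by responsible person"
    return fixed

status, final, failures = govern({"table": "cust_basic_info"}, validate, fix_one)
print(status, failures)
```

Counting failed checks rather than loop iterations keeps the stopping condition aligned with the wording of step S13.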
In this technical solution, after the metadata of the specified database table is collected, if the metadata does not conform to the validity rule, the corresponding responsible person is instructed to correct it, and the corrected metadata is checked against the validity rule again, until the number of times the metadata has been determined not to conform reaches the corresponding count threshold. In this way, metadata comprising technical metadata and business metadata can be ensured to be valid, so that the corresponding data has both business validity and technical validity; data quality is thereby improved and effective data governance is achieved.
It should be noted that, after the metadata of the database table has been collected, access rights can be determined and assigned based on the description information of the collected metadata in order to keep data management effective. That is, the persons entitled to access the metadata are determined from its description information, and only those persons are allowed to access the metadata and the corresponding instance data; all other persons are denied, which prevents unauthorized access. For example, the administrator of the client basic information table in the core system should be the person responsible for the core system or the creator of the table; if collection shows that the administrator recorded in the table's remarks is Zhang San, the right to query the client basic information table and its metadata can be granted to Zhang San. In addition, the responsible manager, the credibility and the business verification rules of the data (data in this application includes metadata and the corresponding instance data) can be agreed through offline communication and then managed, which helps ensure good data quality. For example, some business tables, such as the client basic information table of the core system, were created without recording a creator or responsible person; the responsible person for such tables can be identified through offline investigation and added to the metadata system.
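How access rights might be derived from the description information of collected metadata can be sketched as below. The dictionary keys `creator` and `administrator` are assumptions; the source only says that the entitled persons are determined from the metadata description.

```python
def authorized_users(table_meta: dict) -> set:
    """Collect the persons named in the metadata description (assumed keys)."""
    candidates = (table_meta.get("creator"), table_meta.get("administrator"))
    return {name for name in candidates if name}

def can_query(user: str, table_meta: dict) -> bool:
    """Only persons named in the metadata may query the table and its metadata."""
    return user in authorized_users(table_meta)

# Hypothetical collected metadata: the table remark names Zhang San as administrator.
cust_table_meta = {
    "table": "cust_basic_info",
    "creator": None,                 # not recorded when the table was built
    "administrator": "Zhang San",
}
print(can_query("Zhang San", cust_table_meta))
print(can_query("Li Si", cust_table_meta))
```

A table with neither person recorded yields an empty set, which corresponds to the case where the responsible person must first be identified offline and added to the metadata system.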
The data management method for improving data quality provided by the embodiment of the invention can also comprise the following steps after collecting the metadata of the specified database table as the current metadata:
judging whether the instance data corresponding to the current metadata is free of data loss and whether it satisfies the dependency relationships among different instance data; if both judgment results are yes, determining that the instance data corresponding to the current metadata is valid; otherwise, instructing the corresponding responsible person to correct the instance data corresponding to the current metadata.
To ensure data integrity, this embodiment can judge from the metadata level, by setting corresponding keyword rules, whether the corresponding instance data has data loss. If it does, details of the missing data (including the information, fields and the like) can be shown at a preset issue display location and sent as a work order to the corresponding responsible person, who is required to verify and resolve the data loss and to report back to the data governance apparatus once it is resolved. The data governance apparatus then returns to the step of judging whether data loss exists, so as to verify that the problem has been resolved; when the number of times data loss has been determined reaches the count threshold, an alarm can be raised. For example, the client basic information table of the core system contains a client number field cust_no, and the description of the field shows that the client number should be non-empty, so a rule such as "cust_no is null or cust_no! ..." can be specified. The instance data in the table is verified against this rule, and if a client number is empty, the missing-data information is returned to the platform that implements metadata management and control (that is, the data governance apparatus), so that quality is improved by means of data governance.
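A missing-data check of this kind can be sketched with Python's built-in `sqlite3` module. The table layout and sample rows are hypothetical, and treating an empty or blank string as missing is an assumption made here, since the source only clearly states the `is null` part of the rule.

```python
import sqlite3

def find_missing(conn: sqlite3.Connection, table: str, column: str):
    """Return (rowid, value) for rows whose column is NULL or blank.

    table and column are taken from collected metadata, not from user input,
    so they are interpolated into the SQL text directly in this sketch.
    """
    sql = (
        f"SELECT rowid, {column} FROM {table} "
        f"WHERE {column} IS NULL OR TRIM({column}) = '' "
        f"ORDER BY rowid"
    )
    return conn.execute(sql).fetchall()

# Hypothetical client basic information table with two defective rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_basic_info (cust_no TEXT, cust_name TEXT)")
conn.executemany(
    "INSERT INTO cust_basic_info VALUES (?, ?)",
    [("C001", "Zhang San"), (None, "Li Si"), ("", "Wang Wu")],
)
missing = find_missing(conn, "cust_basic_info", "cust_no")
print(missing)
```

The returned row identifiers are the kind of detail that could be attached to the work order sent to the responsible person.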
To ensure data consistency, this embodiment can judge from the metadata level whether the data is consistent, which mainly involves dependency consistency and reference consistency. The dependency and/or association relationships between fields within a table, between tables and so on are determined from the metadata, and the corresponding instance data is checked against those relationships, so that the quality of the data is determined from the result. For example, in the client basic information table of the core system, the province field and the city field of a client must have a containment relationship, such as Shenzhen in Guangdong Province; a record that gives Chongqing as a city of Guangdong Province violates the consistency rule and is flagged as a problem. In addition, when instance data is determined not to conform to a consistency rule (in this embodiment, consistency may be dependency and/or association), details of the non-conforming data (including the information, fields and the like) can be shown at a preset issue display location and sent as a work order to the corresponding responsible person, who is required to verify and resolve the inconsistency and report back to the data governance apparatus once it is resolved. The data governance apparatus then returns to the consistency check to verify that the problem has been resolved; when the number of times the data has been determined to be inconsistent reaches the count threshold, an alarm can be raised, so that quality is improved by means of data governance.
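The province and city containment check from the example above can be sketched as follows. The `PROVINCE_CITIES` reference data and the record layout are assumptions for illustration; a real deployment would presumably take the relationship from metadata or a reference table.

```python
# Hypothetical reference data: province -> cities it contains.
PROVINCE_CITIES = {
    "Guangdong": {"Shenzhen", "Guangzhou"},
    "Chongqing": {"Chongqing"},
}

def inconsistent_rows(rows):
    """Return the rows whose city is not contained in the recorded province."""
    return [
        row for row in rows
        if row["city"] not in PROVINCE_CITIES.get(row["province"], set())
    ]

rows = [
    {"cust_no": "C001", "province": "Guangdong", "city": "Shenzhen"},
    {"cust_no": "C002", "province": "Guangdong", "city": "Chongqing"},
]
for row in inconsistent_rows(rows):
    print("consistency violation:", row)
```

A province missing from the reference data is treated as containing no cities, so such rows are also reported instead of being silently accepted.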
It should be noted that, to evaluate data accuracy, this embodiment can also judge whether data is accurate by collecting both metadata and instance data and combining them with the business meaning of the data. Generally, if the metadata conforms to the validity rule and the instance data is not missing and satisfies the corresponding dependency relationships, the data is considered accurate; otherwise its accuracy is considered low. The accuracy of the data is stored in a feature library of the data and gives users a basis for judging data quality when using it. For example, accuracy is judged through rules of technical validity and business validity: whether the client number field of the client basic information table conforms to the corresponding naming specification is a matter of technical validity, while whether the distribution of client classification code values and the interpretation of the code value mapping (1: VIP client; 2: ordinary client) are correct is a matter of business validity. Data that satisfies both technical validity and business validity can be judged to be of better quality, that is, of higher accuracy.
The data management method for improving data quality provided by the embodiment of the invention can also comprise the following steps after collecting the metadata of the specified database table as the current metadata:
judging whether the instance data corresponding to the current metadata is preset sensitive information; if so, desensitizing the sensitive information according to a preset desensitization program; if not, determining that the instance data corresponding to the current metadata does not need to be desensitized.
It should be noted that, through metadata collection, whether a table or field is sensitive information is judged by keywords, and the sensitive information so identified is stored in the feature library. Judging by keywords means that if a table or field contains a keyword associated with sensitive information it is treated as sensitive, and otherwise it is not. Moreover, desensitization rules derived from experience can be continuously received from outside and stored, so that the desensitization feature library that stores these rules is continuously improved; a downstream desensitization program then desensitizes data based on these rules, improving data quality with respect to data sensitivity. Desensitization can replace part or all of the sensitive information with special characters, letters or the like, or hide the real information in other ways. For example, if the certificate number is an identity card number, a desensitization rule can be entered into the desensitization feature library that keeps the first three and the last three digits of the number and replaces the digits in between with asterisks; if other types of certificate number exist, the corresponding rules need to be added to the desensitization feature library and a new desensitization method defined for them.
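The keyword-based detection and the "keep the first three and last three digits" rule described above can be sketched as follows. The keyword list, the field names and the sample number are hypothetical; only the masking rule itself comes from the text.

```python
# Hypothetical keywords that mark a field as sensitive.
SENSITIVE_KEYWORDS = ("cert_no", "id_no", "phone", "certificate number")

def is_sensitive(field_name: str, comment: str = "") -> bool:
    """A field is sensitive if its name or interpretation contains a keyword."""
    text = f"{field_name} {comment}".lower()
    return any(keyword in text for keyword in SENSITIVE_KEYWORDS)

def mask_middle(value: str, keep: int = 3, mask_char: str = "*") -> str:
    """Keep the first and last `keep` characters and mask everything between.

    Values too short to leave anything in the middle are masked completely.
    """
    if len(value) <= 2 * keep:
        return mask_char * len(value)
    return value[:keep] + mask_char * (len(value) - 2 * keep) + value[-keep:]

def desensitize(row: dict, comments: dict) -> dict:
    """Mask every sensitive field of a record; leave other fields unchanged."""
    return {
        name: mask_middle(str(value)) if is_sensitive(name, comments.get(name, "")) else value
        for name, value in row.items()
    }

comments = {"cust_no": "client number", "cert_no": "certificate number"}
row = {"cust_no": "C001", "cert_no": "123456789012345678"}
print(desensitize(row, comments))
```

Masking short values completely is a design choice of this sketch, made so that a value with six or fewer characters is never shown in full.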
The data management method for improving data quality provided by the embodiment of the invention can further comprise the following steps:
checking whether duplicate instance data exists in the specified database table; if so, taking any one of the duplicate instance data as target data and deleting the other instance data that duplicates the target data; if not, determining that no instance data needs to be deleted.
To ensure data uniqueness, this embodiment checks all instance data in the specified database table, so that when duplicate instance data exists only one copy is kept. In addition, data can be collected by connecting to databases, file systems and the like, so the check for duplicate instance data can search by the primary key of the database table, file system or the like, which allows duplicates to be confirmed quickly.
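The uniqueness check can be sketched as a single pass that keeps the first record seen for each primary key. Keeping the first occurrence is an assumption made here; the source allows any one of the duplicates to be retained. The record layout is hypothetical.

```python
def deduplicate(rows, key_fields):
    """Split rows into (kept, removed), keeping the first row per primary key."""
    seen = set()
    kept, removed = [], []
    for row in rows:
        key = tuple(row[name] for name in key_fields)
        if key in seen:
            removed.append(row)
        else:
            seen.add(key)
            kept.append(row)
    return kept, removed

rows = [
    {"cust_no": "C001", "cust_name": "Zhang San"},
    {"cust_no": "C002", "cust_name": "Li Si"},
    {"cust_no": "C001", "cust_name": "Zhang San"},
]
kept, removed = deduplicate(rows, key_fields=("cust_no",))
print(len(kept), "kept,", len(removed), "removed")
```

Looking keys up in a set is what makes the check quick: each record is examined once regardless of table size.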
The data management method for improving data quality provided by the embodiment of the invention can further comprise the following steps:
judging whether the instance data in the specified database table was acquired at the specified time through the specified interface and lies within the specified range; if so, determining that the instance data is available; otherwise, determining that the instance data is unavailable.
To ensure data availability, this embodiment can check the time, interface and range of the instance data. Specifically, if the instance data was acquired at the valid time for acquiring it (the specified time) through the valid interface for acquiring it (the specified interface), and its range is the preset range in which it should lie, the instance data can be determined to be available; otherwise it is determined to be unavailable. In addition, the instance data can be tagged with labels for the time and interface of acquisition, and availability can be judged from those labels. For example, the loan balance indicator is computed as of the end of each month, and the data is produced by an offline T+1 run on the 1st of the following month; the indicator for the current month is therefore unavailable at month end and only passes the availability check from the 1st of the next month.
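The availability check on time, interface and range can be sketched for the loan balance example. The `LabelledValue` structure, the interface name `offline_batch` and the value range are assumptions; the rule that monthly data becomes available on the 1st of the following month follows the example in the text.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class LabelledValue:
    value: float
    acquired_on: date   # time label attached to the instance data
    interface: str      # interface label attached to the instance data

def first_of_next_month(year: int, month: int) -> date:
    """Date on which the monthly indicator for (year, month) is produced."""
    return date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)

def is_available(record: LabelledValue, stat_year: int, stat_month: int,
                 allowed_interface: str, low: float, high: float) -> bool:
    """Available only if acquired on time, via the right interface, in range."""
    on_time = record.acquired_on >= first_of_next_month(stat_year, stat_month)
    right_interface = record.interface == allowed_interface
    in_range = low <= record.value <= high
    return on_time and right_interface and in_range

# Hypothetical loan balance indicator for May 2020.
ok = LabelledValue(1.2e9, date(2020, 6, 1), "offline_batch")
early = LabelledValue(1.2e9, date(2020, 5, 31), "offline_batch")
print(is_available(ok, 2020, 5, "offline_batch", 0, 1e12))
print(is_available(early, 2020, 5, "offline_batch", 0, 1e12))
```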
After collecting the metadata of the specified database table as the current metadata, the data governance method for improving data quality provided by the embodiment of the invention may further comprise:
acquiring map information corresponding to the current metadata and its instance data, and displaying the map information visually in a pre-drawn data map; the map information comprises the flow direction, the reference relationships and the organization rules of the current metadata and the corresponding instance data.
The data map mainly serves data developers and business personnel who need to inspect the organization rules, sources and the like of data, so that they can find out where the data comes from and how it is processed. Specifically, a Java program developed on the data management and control platform parses the data processing programs and, by means of keywords, finds the flow direction of the data (the change in where the data is located, which can be drawn as lines), its references (where the data is used), table and field information (such as table names and field names), and the rule mapping logic (the organization rules the data follows at different locations, including the data type, the standard the data adopts and so on), and displays them visually. Finding the information by keywords means obtaining it by tracing specified keywords. Thus, when developing a data indicator, the name of the indicator can be looked up on the data map and lineage analysis can be performed: the indicator is linked node by node, with lines, from the data source node to the target node, so that the flow of the data can be seen clearly and the related indicator mapping and processing logic can also be inspected, which facilitates development work.
In addition, the data map supports upstream and downstream impact analysis. A change to a database table or field in an upstream system can affect downstream application systems; impact analysis on the data map shows clearly which downstream consumers reference the field and which departments, systems and applications a change to the field would affect, so that the downstream systems can be notified to make matching modifications, thereby avoiding production problems caused by upstream field changes.
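Impact analysis over such a data map can be illustrated with a breadth-first walk of the lineage edges. The sketch assumes the lineage has already been extracted into a dictionary mapping each node to the nodes that read from it; the node names and the function downstream_impact are hypothetical, and the source describes a Java program rather than this Python code.

```python
from collections import deque
from typing import Dict, List


def downstream_impact(lineage: Dict[str, List[str]], start: str) -> List[str]:
    """Return every node reachable downstream of `start`, in breadth-first order."""
    impacted: List[str] = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in visited:  # guard against cycles and repeated paths
                visited.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Hypothetical lineage: upstream field -> downstream consumers.
lineage = {
    "core.cust_info.cust_no": ["dw.cust_dim.cust_no"],
    "dw.cust_dim.cust_no": ["mart.loan_balance", "report.customer_360"],
    "mart.loan_balance": ["report.monthly_loans"],
}
impacted = downstream_impact(lineage, "core.cust_info.cust_no")
```

The resulting list is what would be used to notify downstream owners before an upstream field is changed.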
The data governance method for improving data quality provided by the embodiment of the invention may further comprise:
determining the unique data source corresponding to each system, analyzing the relationships among the data provided by the unique data sources, obtaining a relationship map representing those relationships, and displaying the relationship map.
To achieve data uniqueness and avoid problems such as data conflicts caused by data coming from different data sources, in this embodiment each system corresponds to one unique data source, so that for a given system its unique data source is taken as authoritative and adverse effects from other data sources on that system are avoided. Here a system is an organization, model or the like that implements a business function. For example, the customer number may exist in both the core system and the credit system; marking the unique data source of the business data, on the principle that each piece of data has a single source, reduces data conflicts.
In addition, the unique data source of a system can be determined through business research. The data provided by the unique data source is master data, so the subject model data is tagged with a master data label to form the corresponding subject library, and a map of the master-slave relationships and association relationships among the master data of the subject libraries is formed for display. For example, the organization subject records all kinds of information about organizations in a data system centered on the organization, and the product subject records information about all kinds of products centered on the product; products have certain association relationships with organizations, such as the relationship between Yukuai Loan and the financial innovation department, which is of great significance for the performance assessment of organizations.
In a specific application scenario, data governance according to the invention may be implemented as shown schematically in fig. 2, on the basis of which data quality determination and data governance are carried out. Data quality mainly involves the following aspects: business validity, technical validity, management validity, integrity, consistency, accuracy, sensitivity, uniqueness and availability.
1. Business validity and technical validity:
Information about databases and tables is collected through the metadata management and control platform (which may belong to the data governance device and is referred to below as the platform). Collection may be done by connecting directly to a backup database and extracting the metadata of the database tables; the extracted metadata may specifically include table names, field types, field descriptions, code value meanings and so on. Business validity may include whether the business meanings of systems, tables and fields are correct, whether the code value category to which a field belongs is correct, and the like. Technical validity may include whether system, table and field names comply with the corresponding naming specifications, whether field formats (field type, field length, field precision) comply with the field format specifications, whether key values (primary keys, indexes) meet requirements, whether statistical information (data distribution) meets requirements, and the like.
The following is an example of technical validity:
The platform collects the metadata of a database table (such as the field names, types, lengths and comments of the customer information table of the core system), and rules are set for the table through page configuration, for example that the certificate number in the customer information table must not be null. The collected metadata is compared with the rules to determine whether the table complies with the bank's internal rules or industry rules, that is, whether data with an empty certificate number can exist; if the table does not comply, the problem is shown on the metadata problem board as an entry in the metadata problem list. The person responsible for the data problem and the person responsible for resolving it are determined through the standard data governance process, and a work order is issued online. The problem is passed to the responsible person to be resolved and repaired, and the result is reported back in the work order system. Rule matching is then run again to check whether the problem has been solved.
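Comparing collected metadata against configured rules can be sketched as below. The FieldMeta structure, the lower-snake-case naming pattern and the three rules shown are assumptions chosen for illustration; the source does not specify the platform's rule format.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class FieldMeta:
    """Technical metadata collected for one field of a database table."""
    name: str
    data_type: str
    length: int
    nullable: bool
    comment: str


# Assumed naming convention: lower-case snake case.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


def find_metadata_problems(
    fields: Iterable[FieldMeta], non_nullable: Set[str]
) -> List[Tuple[str, str]]:
    """Return (field name, problem) pairs for the metadata problem list."""
    problems: List[Tuple[str, str]] = []
    for f in fields:
        if not NAME_PATTERN.match(f.name):
            problems.append((f.name, "name violates naming convention"))
        if not f.comment.strip():
            problems.append((f.name, "missing field comment"))
        if f.name in non_nullable and f.nullable:
            problems.append((f.name, "field must be NOT NULL"))
    return problems


# Hypothetical metadata for a customer information table.
fields = [
    FieldMeta("cust_no", "varchar", 20, False, "customer number"),
    FieldMeta("cert_no", "varchar", 18, True, "certificate number"),
    FieldMeta("CustName", "varchar", 60, True, ""),
]
problems = find_metadata_problems(fields, {"cust_no", "cert_no"})
```

Each returned pair corresponds to one entry that would be shown on the problem board and routed to a responsible person by work order.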
2. Management validity:
1) Management and access permissions are determined and assigned according to the descriptive information in the collected metadata. For example, for the customer basic information table of the core system, the administrator should be the person in charge of the core system or the creator of the table; if the collected remarks name Zhang San as the administrator, Zhang San is granted permission to query the metadata and the corresponding instance data;
2) Data quality is safeguarded by communicating and agreeing offline on the data's administrator, its credibility (which can be verified against the data source and other rules), its version information (whether the version information meets requirements) and its business verification rules. For example, for some business tables, such as the customer basic information table of the core system, no remarks on the creator or the responsible person were recorded when the table was created, so the administrators of these tables can be filled in through offline investigation.
3. Integrity:
Data is collected through the metadata management and control platform, and relevant keyword rules are set so that, starting from the metadata level, it can be judged whether the instance data has missing data. For example, for the customer number field cust_no of the core system's customer basic information table, the metadata confirms that the customer number of this table should be a non-empty field, so a rule of the form cust_no is null or cust_no! ... can be specified, and the specific instance data is verified against this rule. If a customer number is empty, the missing information is returned to the platform, and quality is improved through data governance measures. The data governance measures here are: displaying the details of the missing information, the missing field and so on in the platform's problem details, then sending the problem to the relevant responsible person as a work order requiring that the empty customer number be verified and resolved, and, once it has been resolved, reporting back and verifying whether the problem is solved.
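The integrity rule on cust_no can be illustrated with a small check over instance rows. This is a sketch under assumptions: the function find_missing is hypothetical, and treating blank strings as missing (in addition to null values) is an assumed reading, since the second half of the rule is not fully legible in the source.

```python
from typing import Any, Dict, List


def find_missing(rows: List[Dict[str, Any]], field: str) -> List[int]:
    """Return the indexes of rows whose `field` is absent, None or blank."""
    missing = []
    for index, row in enumerate(rows):
        value = row.get(field)
        # Assumption: whitespace-only strings count as missing.
        if value is None or str(value).strip() == "":
            missing.append(index)
    return missing


# Hypothetical instance data from a customer basic information table.
rows = [
    {"cust_no": "C001"},
    {"cust_no": None},
    {"cust_no": "  "},
    {},
]
missing_rows = find_missing(rows, "cust_no")
```

The indexes returned here stand in for the missing-information details that would be sent back to the platform and attached to a work order.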
4. Consistency:
1) Consistency mainly concerns dependency consistency and reference consistency at the metadata level: the dependency and association relationships of fields within and between tables are determined from the metadata, and the instance data is checked against those relationships to judge data quality. For example, in the customer basic information table of the core system, the province field and the city field of a customer must stand in a containment relationship, such as Guangdong Province and Shenzhen City; if a record pairs Shenzhen City with a province that does not contain it, the data does not satisfy the consistency rule, which indicates that the data has a problem.
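The province and city containment rule can be sketched as a lookup against a reference table. The PROVINCE_CITIES table below is a tiny hypothetical excerpt and is_consistent is an illustrative name; a real check would load the full reference data.

```python
from typing import Dict, Set

# Hypothetical excerpt of the province -> cities reference data.
PROVINCE_CITIES: Dict[str, Set[str]] = {
    "Guangdong": {"Shenzhen", "Guangzhou"},
    "Sichuan": {"Chengdu"},
}


def is_consistent(province: str, city: str) -> bool:
    """True when the city belongs to the province in the reference data."""
    return city in PROVINCE_CITIES.get(province, set())


valid_pair = is_consistent("Guangdong", "Shenzhen")
invalid_pair = is_consistent("Sichuan", "Shenzhen")
```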
5. Accuracy:
Whether data is accurate is defined by collecting both metadata and instance data and combining this with the business meaning of the data; the credibility and accuracy of the data are extracted into a feature library to give users a basis for judging data quality. For example, accuracy is judged through the technical validity and business validity rules: the naming specification of the customer number field in the customer basic information table is a matter of technical validity, while the description of the customer classification code values (1: VIP customer, 2: ordinary customer) and whether the code value mapping is correct are matters of business validity; data that satisfies both technical validity and business validity can be judged to be of good quality, that is, of high accuracy.
6. Sensitivity:
Through metadata collection, whether a data table or field contains sensitive information is judged by keywords; the sensitive information is extracted into a feature library, the desensitization feature rules are refined step by step, and the rules are provided to a downstream desensitization program to desensitize the data. For example, if the certificate number is an identity card number, the relevant desensitization rule is entered into the desensitization feature library: the first three and last three characters of the identity card number are kept and the characters in between are replaced with asterisks. If other kinds of certificate number exist, a sensitivity rule must be added to the desensitization feature library and a new desensitization method formulated for them.
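The identity card masking rule (keep the first three and last three characters, replace the rest with asterisks) can be sketched as follows. The function name mask_id_number and the handling of values too short to mask are assumptions; the sample number is made up.

```python
def mask_id_number(value: str, keep_head: int = 3, keep_tail: int = 3,
                   mask_char: str = "*") -> str:
    """Keep the first and last few characters and mask everything in between."""
    if len(value) <= keep_head + keep_tail:
        # Assumption: values too short to keep both ends are fully masked.
        return mask_char * len(value)
    middle = len(value) - keep_head - keep_tail
    tail_start = len(value) - keep_tail
    return value[:keep_head] + mask_char * middle + value[tail_start:]


# A made-up 18-character number, not a real identity card number.
masked = mask_id_number("123456789012345678")
```

Other certificate types would get their own rule, for example different keep_head and keep_tail values, registered in the desensitization feature library.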
7. Uniqueness:
1) Data is collected by connecting to databases, file systems and the like, and the uniqueness of the instance data is checked by primary key to see whether duplicate data exists. 2) Uniqueness of data sources: for example, the customer number may exist in both the core system and the credit system; marking the unique data source of the business data, on the principle that each piece of data has a single source, reduces data conflicts.
8. Availability:
Data is tagged according to the time at which it can be used and the time at which the interface for acquiring it can be used; the time range in which the data is usable is determined in combination with the batch runs, and the business coverage of the data is marked on the basis of business understanding. This constitutes the availability check. For example, the loan balance indicator counts the loan balance at the end of each month, and the data is produced by an offline T+1 batch run on the 1st of each month; at the end of a month, that month's value of the indicator is unavailable, and it becomes available only on the 1st of the following month, at which point it passes the availability check.
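The T+1 timing in the loan balance example reduces to a small date calculation, sketched here. The function names are hypothetical, and the sketch assumes the batch run always completes on the 1st of the following month.

```python
from datetime import date


def first_available_date(year: int, month: int) -> date:
    """Date on which a month-end indicator for (year, month) becomes usable."""
    if month == 12:
        return date(year + 1, 1, 1)  # December rolls over to January
    return date(year, month + 1, 1)


def indicator_is_available(year: int, month: int, today: date) -> bool:
    """True once the T+1 batch for the statistics month has run."""
    return today >= first_available_date(year, month)


may_ready_on = first_available_date(2020, 5)
```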
9. Data map (which may be as shown in fig. 3 in a specific application scenario) and master data:
The data map mainly serves data developers and business personnel who need to inspect the organization rules, sources and the like of data. A Java program developed on the data management and control platform parses the data processing programs, finds the flow direction and references of the data, table and field information, and rule mapping logic by means of keywords, and displays them visually. The unique data source of each system is determined through business research, and the subject model data is tagged with a master data label to form a master data subject library; a map of the master-slave relationships and association relationships of the master data among the subjects is then formed. For example, the organization subject records all kinds of information about organizations in a data system centered on the organization, and the product subject records information about all kinds of products centered on the product; product attributes have certain association relationships with organizations, such as the relationship between Yukuai Loan and the financial innovation department, which is of great significance for the performance assessment of organizations.
An embodiment of the present invention further provides a data governance device for improving data quality; as shown in fig. 4, the data governance device may include:
a comparison module 11 for: collecting metadata of a specified database table as current metadata, and comparing the current metadata with a preset validity rule; wherein the current metadata comprises technical metadata and business metadata;
a first processing module 12 configured to: if the current metadata conforms to the validity rule, determining that the current metadata has validity;
a second processing module 13, configured to: if the current metadata does not conform to the validity rule, instruct the corresponding responsible person to correct the current metadata, take the corrected metadata as the current metadata, and return to the step of comparing the current metadata with the preset validity rule, until the number of times the current metadata has been determined not to conform to the validity rule reaches a count threshold.
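The compare, correct and re-compare loop carried out by the three modules above can be sketched as follows. The function govern_metadata and its callbacks are hypothetical; in particular, stopping without a further correction request once the threshold is reached is an assumed reading of the text.

```python
from typing import Callable, Dict, Tuple

Metadata = Dict[str, object]


def govern_metadata(
    metadata: Metadata,
    is_valid: Callable[[Metadata], bool],
    request_correction: Callable[[Metadata], Metadata],
    max_failures: int,
) -> Tuple[Metadata, bool, int]:
    """Re-check metadata after each correction until valid or the threshold is hit."""
    current = metadata
    failures = 0
    while True:
        if is_valid(current):
            return current, True, failures
        failures += 1
        if failures >= max_failures:  # count threshold reached: stop looping
            return current, False, failures
        # Stands in for the responsible person fixing the metadata via work order.
        current = request_correction(current)


# Hypothetical rule: the certificate number column must not be nullable.
original = {"cert_no_nullable": True}
fixed, ok, failures = govern_metadata(
    original,
    is_valid=lambda m: not m["cert_no_nullable"],
    request_correction=lambda m: {**m, "cert_no_nullable": False},
    max_failures=3,
)
```

Capping the number of failed checks keeps a problem that is never repaired from cycling through the work order process indefinitely.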
The data governance device for improving data quality provided by the embodiment of the invention may further comprise:
a first judging module, configured to: after the metadata of the specified database table is collected as the current metadata, judge whether the instance data corresponding to the current metadata is free of missing data and whether it satisfies the dependency relationships among different instance data; if both judgments are affirmative, determine that the instance data corresponding to the current metadata is valid; otherwise, instruct the corresponding responsible person to correct the instance data corresponding to the current metadata.
The data governance device for improving data quality provided by the embodiment of the invention may further comprise:
a second judging module, configured to: after the metadata of the specified database table is collected as the current metadata, judge whether the instance data corresponding to the current metadata is preset sensitive information; if so, desensitize the sensitive information according to a preset desensitization program; if not, determine that the instance data corresponding to the current metadata does not need to be desensitized.
The data governance device for improving data quality provided by the embodiment of the invention may further comprise:
a checking module, configured to: check whether duplicate instance data exists in the specified database table; if so, take any one of the duplicate instance data as target data and delete the other instance data that duplicates the target data; if not, determine that no instance data needs to be deleted.
The data governance device for improving data quality provided by the embodiment of the invention may further comprise:
a third judging module, configured to: judge whether the instance data in the specified database table is data that was acquired at the specified time through the specified interface and lies within the specified range; if so, determine that the instance data is available; otherwise, determine that the instance data is unavailable.
The data governance device for improving data quality provided by the embodiment of the invention may further comprise:
a map module, configured to: after the metadata of the specified database table is collected as the current metadata, acquire map information corresponding to the current metadata and its instance data, and display the map information visually in a pre-drawn data map; the map information comprises the flow direction, the reference relationships and the organization rules of the current metadata and the corresponding instance data.
The data governance device for improving data quality provided by the embodiment of the invention may further comprise:
a map module, configured to: determine the unique data source corresponding to each system, analyze the relationships among the data provided by the unique data sources, obtain a relationship map representing those relationships, and display the relationship map.
An embodiment of the present invention further provides data governance equipment for improving data quality, which may include:
a memory for storing a computer program;
a processor for implementing, when executing the computer program, the steps of the data governance method for improving data quality according to any one of the above.
The embodiment of the invention also provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program realizes the steps of the data governance method for improving the data quality.
It should be noted that for the description of the relevant parts in the data governance device, the equipment and the storage medium for improving the data quality provided in the embodiment of the present invention, reference is made to the detailed description of the corresponding parts in the data governance method for improving the data quality provided in the embodiment of the present invention, and details are not repeated here. In addition, parts of the technical solutions provided in the embodiments of the present invention that are consistent with the implementation principles of the corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data governance method for improving data quality, comprising:
collecting metadata of a specified database table as current metadata, and comparing the current metadata with a preset validity rule; wherein the current metadata comprises technical metadata and business metadata;
if the current metadata conforms to the validity rule, determining that the current metadata has validity;
if the current metadata does not conform to the validity rule, instructing a corresponding responsible person to correct the current metadata, taking the corrected metadata as the current metadata, and returning to the step of comparing the current metadata with the preset validity rule, until the number of times the current metadata has been determined not to conform to the validity rule reaches a count threshold.
2. The method of claim 1, wherein after collecting the metadata of the specified database table as the current metadata, the method further comprises:
judging whether the instance data corresponding to the current metadata is free of missing data and whether it satisfies the dependency relationships among different instance data; if both judgments are affirmative, determining that the instance data corresponding to the current metadata is valid; otherwise, instructing a corresponding responsible person to correct the instance data corresponding to the current metadata.
3. The method of claim 2, wherein after collecting the metadata of the specified database table as the current metadata, the method further comprises:
judging whether the instance data corresponding to the current metadata is preset sensitive information; if so, desensitizing the sensitive information according to a preset desensitization program; if not, determining that the instance data corresponding to the current metadata does not need to be desensitized.
4. The method of claim 3, further comprising:
checking whether duplicate instance data exists in the specified database table; if so, taking any one of the duplicate instance data as target data and deleting the other instance data that duplicates the target data; if not, determining that no instance data needs to be deleted.
5. The method of claim 4, further comprising:
judging whether the instance data in the specified database table is data that was acquired at a specified time through a specified interface and lies within a specified range; if so, determining that the instance data is available; otherwise, determining that the instance data is unavailable.
6. The method of claim 5, wherein after collecting the metadata of the specified database table as the current metadata, the method further comprises:
acquiring map information corresponding to the current metadata and its instance data, and displaying the map information visually in a pre-drawn data map; the map information comprises the flow direction, the reference relationships and the organization rules of the current metadata and the corresponding instance data.
7. The method of claim 6, further comprising:
determining the unique data source corresponding to each system, analyzing the relationships among the data provided by the unique data sources, obtaining a relationship map representing those relationships, and displaying the relationship map.
8. A data governance device for improving data quality, comprising:
a comparison module for: collecting metadata of a specified database table as current metadata, and comparing the current metadata with a preset validity rule; wherein the current metadata comprises technical metadata and business metadata;
a first processing module to: if the current metadata conforms to the validity rule, determining that the current metadata has validity;
a second processing module configured to: if the current metadata does not conform to the validity rule, instruct a corresponding responsible person to correct the current metadata, take the corrected metadata as the current metadata, and return to the step of comparing the current metadata with the preset validity rule, until the number of times the current metadata has been determined not to conform to the validity rule reaches a count threshold.
9. A data governance device for improving data quality, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data governance method for improving data quality according to any one of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of a data governance method for improving data quality according to any one of claims 1 to 7.
CN202010406901.6A 2020-05-14 2020-05-14 Data governance method for improving data quality Pending CN111597177A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010406901.6A CN111597177A (en) 2020-05-14 2020-05-14 Data governance method for improving data quality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010406901.6A CN111597177A (en) 2020-05-14 2020-05-14 Data governance method for improving data quality

Publications (1)

Publication Number Publication Date
CN111597177A true CN111597177A (en) 2020-08-28

Family

ID=72187357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010406901.6A Pending CN111597177A (en) 2020-05-14 2020-05-14 Data governance method for improving data quality

Country Status (1)

Country Link
CN (1) CN111597177A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183952A (en) * 2020-09-08 2021-01-05 支付宝(杭州)信息技术有限公司 Index quality supervision processing method and device and electronic equipment
CN112509653A (en) * 2020-10-29 2021-03-16 望海康信(北京)科技股份公司 Medical record data processing method and system, corresponding equipment and storage medium
CN112632556A (en) * 2020-12-18 2021-04-09 北京明朝万达科技股份有限公司 Endpoint security response method and device based on data classification and classification
CN112633782A (en) * 2021-03-09 2021-04-09 发明之家(北京)科技有限公司 Enterprise data management method and system based on Internet of things
CN113918555A (en) * 2021-10-29 2022-01-11 桂林航天工业学院 Data management method for improving data quality
CN113918774A (en) * 2021-10-28 2022-01-11 中国平安财产保险股份有限公司 Data management method, device, equipment and storage medium
CN114971140A (en) * 2022-03-03 2022-08-30 北京计算机技术及应用研究所 Service data quality evaluation method oriented to data exchange
CN115309789A (en) * 2022-10-11 2022-11-08 浩鲸云计算科技股份有限公司 Method for generating associated data graph in real time based on intelligent dynamic business object

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595563A (en) * 2018-04-13 2018-09-28 林秀丽 A kind of data quality management method and device
CN109522746A (en) * 2018-11-07 2019-03-26 平安医疗健康管理股份有限公司 A kind of data processing method, electronic equipment and computer storage medium
CN110262926A (en) * 2019-06-05 2019-09-20 世纪龙信息网络有限责任公司 Metadata restorative procedure, device, system and the computer equipment of server
CN110851539A (en) * 2019-10-25 2020-02-28 东软集团股份有限公司 Metadata verification method and device, readable storage medium and electronic equipment

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183952A (en) * 2020-09-08 2021-01-05 支付宝(杭州)信息技术有限公司 Index quality supervision processing method and device and electronic equipment
CN112509653A (en) * 2020-10-29 2021-03-16 望海康信(北京)科技股份公司 Medical record data processing method and system, corresponding equipment and storage medium
CN112632556A (en) * 2020-12-18 2021-04-09 北京明朝万达科技股份有限公司 Endpoint security response method and device based on data classification and classification
CN112633782A (en) * 2021-03-09 2021-04-09 发明之家(北京)科技有限公司 Enterprise data management method and system based on Internet of things
CN112633782B (en) * 2021-03-09 2021-06-01 发明之家(北京)科技有限公司 Enterprise data management method and system based on Internet of things
CN113918774A (en) * 2021-10-28 2022-01-11 中国平安财产保险股份有限公司 Data management method, device, equipment and storage medium
CN113918555A (en) * 2021-10-29 2022-01-11 桂林航天工业学院 Data management method for improving data quality
CN113918555B (en) * 2021-10-29 2024-05-10 桂林航天工业学院 Data management method for improving data quality
CN114971140A (en) * 2022-03-03 2022-08-30 北京计算机技术及应用研究所 Service data quality evaluation method oriented to data exchange
CN115309789A (en) * 2022-10-11 2022-11-08 浩鲸云计算科技股份有限公司 Method for generating associated data graph in real time based on intelligent dynamic business object

Similar Documents

Publication Publication Date Title
CN111597177A (en) Data governance method for improving data quality
US8341131B2 (en) Systems and methods for master data management using record and field based rules
CN109522746A (en) A kind of data processing method, electronic equipment and computer storage medium
US20160275148A1 (en) Database query method and device
CN111506559B (en) Data storage method, device, electronic equipment and storage medium
US20080109419A1 (en) Computer apparatus, computer program and method, for calculating importance of electronic document on computer network, based on comments on electronic document included in another electronic document associated with former electronic document
CN107844588B (en) Data dictionary processing method and device, storage medium and processor
CN114153962A (en) Data matching method and device and electronic equipment
CN113420057A (en) Account checking data processing method and related device
CN112307052A (en) Data management method, service system, terminal and storage medium
Shahbaz Data mapping for data warehouse design
CN111143421A (en) Data sharing method and device, electronic equipment and storage medium
CN109947797B (en) Data inspection device and method
Zealand Data integration manual
CN112667619B (en) Method, device, terminal equipment and storage medium for auxiliary checking data
CN114490692A (en) Data checking method, device, equipment and storage medium
CN112799868B (en) Root cause determination method and device, computer equipment and storage medium
CN110705816B (en) Task allocation method and device based on big data
CN111754131A (en) Enterprise information dynamic monitoring method, equipment and medium
CN109597828B (en) Offline data checking method, device and server
CN109377391B (en) Information tracking method, storage medium and server
CN109144999B (en) Data positioning method, device, storage medium and program product
CN109033469B (en) Ranking method and device of search results, terminal and computer storage medium
CN113609407B (en) Regional consistency verification method and device
CN113934729A (en) Data management method based on knowledge graph, related equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200828