CN110119395B - Method for realizing association processing of data standard and data quality based on metadata in big data management - Google Patents

Info

Publication number
CN110119395B
Authority
CN
China
Prior art keywords
data
metadata
standard
quality
standards
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910446036.5A
Other languages
Chinese (zh)
Other versions
CN110119395A (en
Inventor
滑少鹏
王克强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Primeton Information Technology Co ltd
Original Assignee
Primeton Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Primeton Information Technology Co ltd
Priority to CN201910446036.5A
Publication of CN110119395A
Application granted
Publication of CN110119395B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention relates to a method for realizing the association processing of data standards and data quality based on metadata in big data management, comprising the following steps: (1) collecting metadata; (2) importing enterprise data standards; (3) classifying the metadata according to the data standards and storing it with the data standard number as a key field; (4) formulating data quality standards according to the data standards; (5) writing quality rules according to the data quality standards; and (6) checking the metadata according to the quality rules. The method breaks down the barrier between business requirements and technical requirements in enterprise data management and can give rectification opinions according to the requirements of the data standards. With business as the goal and technology as the means, it forms a complete closed loop in enterprise big data management. It is of significance for improving data quality, standardizing data definitions and ensuring effective management of data assets, and has good value for popularization and application.

Description

Method for realizing association processing of data standard and data quality based on metadata in big data management
Technical Field
The invention relates to the field of computer software, in particular to the field of big data management, and specifically relates to a method for realizing association processing of data standards and data quality based on metadata in big data management.
Background
With the rapid development of big data technology, more and more enterprises have begun to pay attention to their own data problems and to apply specific means of managing and controlling data in enterprise data management and data planning. Examples include using a metadata system to manage enterprise metadata, using a data quality system to identify problem data and thereby improve data quality, or engaging a consulting company to help sort out data standards. These means can improve data quality to a certain extent and achieve some of the effect of data management. However, as enterprise informatization accelerates, enterprises face more and more data problems, and managing data from a single perspective can no longer meet their data management requirements. It is therefore necessary to break down the barriers between the dimensions of metadata, data standards and data quality: quality rules are formulated from the data standards, metadata is checked with the quality rules, and the corresponding data standard is found through the metadata, so that every data problem can be traced to a data standard and judged against it. In this way data quality is improved, data definitions are standardized, effective management of data assets is ensured, and a benign closed-loop data management and control system is built.
The related technology of the existing big data management is as follows:
(1) A data lineage visualization graphic system in data governance (application number: 201711383801.0). The system comprises information nodes and the following modules: a data flow line, which is the path along which data flows; and at least one of an extraction strategy node, a cleaning rule node, a conversion rule node, a loading rule node and a processing rule node. The extraction strategy node explains how data is extracted; the cleaning rule node represents the screening standard applied to data as it flows; the conversion rule node represents the transformation standard applied to data as it flows; the loading rule node describes how data is loaded into storage; and the processing rule node represents the archiving or destruction of data. According to that application, lineage at different levels makes the migration and flow of data clear and provides a basis for evaluating data value.
(2) A data standard processing method, a device and a storage medium thereof (application number: 201811356788.4), relating to the technical field of big data processing. The data standard processing method comprises the following steps: collecting metadata from a business database storing production source data; abstracting N data standards from the metadata, wherein each data standard comprises at least a standard name and N is a positive integer; selecting M of the N data standards to form a data standard set, wherein M is a positive integer smaller than N; and generating a check result table based on the data standard set. The method forms a data standard set based on data standards derived from the metadata and improves the correlation of the data standards.
With the technique of the data lineage visualization graphic system described above, a data flow line and at least one of the extraction strategy node, cleaning rule node, conversion rule node, loading rule node and processing rule node are collected, so that the lineage of metadata can be established, the migration and flow of data understood, and a basis provided for evaluating data value. However, it lacks any association with data standards: metadata cannot be quickly traced to a data standard, and an enterprise's problem data cannot be found by means of metadata, so a benign closed loop of enterprise big data management cannot be achieved.
The technique of the data standard processing method, device and storage medium comprises: collecting metadata from a business database storing production source data; abstracting N data standards from the metadata, wherein each data standard comprises at least a standard name and N is a positive integer; selecting M of the N data standards to form a data standard set, wherein M is a positive integer smaller than N; and generating a check result table based on the data standard set. Here the data standards are derived from metadata, and the metadata comes from the databases of the individual business systems. The databases of every business system must therefore have been built entirely in accordance with enterprise standards beforehand. Otherwise, once the metadata deviates from what is correct, the extracted data standards become meaningless, and the corresponding data quality results lack authenticity and usability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for realizing the association processing of data standards and data quality based on metadata in big data management with high data quality, high authenticity and good availability.
In order to achieve the above object, the method of the invention for realizing the association processing of data standards and data quality based on metadata in big data management is mainly characterized by comprising the following steps:
(1) Collecting metadata;
(2) Importing enterprise data standards;
(3) Classifying the metadata according to the data standard, and storing the metadata by taking the data standard number as a key field;
(4) Formulating a data quality standard according to the data standard;
(5) Writing a quality rule according to a data quality standard;
(6) Checking the metadata according to the quality rules.
Preferably, the step (1) specifically includes the following steps:
(1.1) acquiring data source configuration, and scanning database information in a data source through a metadata adapter;
(1.2) converting the data and writing the data to the metadata system.
Preferably, the database information in the step (1.1) includes organization and structure of the database, table name, field name, view, relationship, primary key and foreign key.
Preferably, the step (2) specifically includes the following steps:
(2.1) sorting the enterprise data standard into a file template with identifiable metadata;
(2.2) importing the data standard into the metadata system in a metadata acquisition mode;
(2.3) managing the data standard as independent metadata.
Preferably, the data standard in the step (3) is applicable to a plurality of metadata, and a single metadata corresponds to only a single data standard.
Preferably, the step (4) specifically includes the following steps:
(4.1) importing the data quality standard into the metadata system and managing it as independent metadata.
Preferably, the data standard in the step (4) corresponds to a plurality of data quality standards, and a single data quality standard corresponds to only a single data standard.
Preferably, the data quality standard in the step (5) corresponds to a plurality of quality rules, and a single quality rule is only from a single data quality standard.
Preferably, the quality rule in the step (5) includes a detection range, a detection attribute and a detection rule.
Preferably, the step (6) specifically includes the following steps:
(6.1) executing the quality rule and collecting problem data generated during the execution;
(6.2) looking up the corresponding metadata according to the field name and table name of the problem data, and obtaining the data standard corresponding to that metadata;
(6.3) sorting the check information and forming a data quality report.
By associating metadata, data standards and data quality, the method of the invention breaks down the barrier between business requirements and technical requirements in enterprise data management. Data quality standards are formulated from the data standards and used to check the metadata, so that data quality is brought under control. When problem data is found, the business basis corresponding to the problem can be provided, and rectification opinions can be given according to the requirements of the data standards. With business as the goal and technology as the means, the method forms a complete closed loop in enterprise big data management. It is of significance for improving data quality, standardizing data definitions and ensuring effective management of data assets, and has good value for popularization and application.
Drawings
FIG. 1 is a flow chart of a method for realizing data standard and data quality association processing based on metadata in big data management according to the invention.
FIG. 2 is a diagram of the relationship among metadata, data standards, quality standards, and quality rules of the method for implementing data standard and data quality association processing based on metadata in big data governance of the present invention.
FIG. 3 is a functional architecture diagram of the modules of the data asset platform used in an embodiment of the method for realizing data standard and data quality association processing based on metadata in big data management according to the invention.
FIG. 4 is a quality rule checking flow chart of a method for realizing the association processing of data standards and data quality based on metadata in big data management.
Detailed Description
In order to more clearly describe the technical contents of the present invention, a further description will be made below in connection with specific embodiments.
The method for realizing the association processing of the data standard and the data quality based on the metadata in the big data management comprises the following steps:
(1) Collecting metadata;
(1.1) acquiring data source configuration, and scanning database information in a data source through a metadata adapter;
(1.2) converting the data and writing the data into a metadata system;
(2) Importing enterprise data standards;
(2.1) sorting the enterprise data standard into a file template with identifiable metadata;
(2.2) importing the data standard into the metadata system in a metadata acquisition mode;
(2.3) managing the data standard as independent metadata;
(3) Classifying the metadata according to the data standard, and storing the metadata by taking the data standard number as a key field;
(4) Formulating a data quality standard according to the data standard;
(4.1) importing the data quality standard into the metadata system and managing it as independent metadata;
(5) Writing a quality rule according to a data quality standard;
(6) Checking the metadata according to the quality rules;
(6.1) executing the quality rule and collecting problem data generated during the execution;
(6.2) looking up the corresponding metadata according to the field name and table name of the problem data, and obtaining the data standard corresponding to that metadata;
(6.3) sorting the check information and forming a data quality report.
As a preferred embodiment of the present invention, the database information in the step (1.1) includes organization and structure of the database, table name, field name, view, relationship, primary key and foreign key.
As a preferred embodiment of the present invention, the data standard in the step (3) is applicable to a plurality of metadata, and a single metadata corresponds to only a single data standard.
As a preferred embodiment of the present invention, the data standard in the step (4) corresponds to a plurality of data quality standards, and a single data quality standard corresponds to only a single data standard.
As a preferred embodiment of the present invention, the data quality standard in the step (5) corresponds to a plurality of quality rules, and a single quality rule is derived from only a single data quality standard.
As a preferred embodiment of the present invention, the quality rule in the step (5) includes a detection range, a detection attribute, and a detection rule.
In a specific embodiment of the invention, to address the defects described in the background, the invention provides a method that associates data standards with metadata, creates quality standards according to the data standards, then configures quality rules, and finally checks the metadata according to the quality rules. This removes the barrier between business and technology: the real requirements of the enterprise serve as the standard, metadata as the support and quality rules as the means, so that every data problem can be checked and traced to a basis. Data quality is thereby improved, data definitions are standardized, effective management of data assets is ensured, and a benign closed-loop data management and control system is built.
The invention discloses a method for linking data standards and data quality based on metadata in big data management, comprising: collecting system metadata, importing enterprise data standards, associating the metadata with the data standards, creating quality standards according to the data standards, then configuring quality rules, and finally checking the metadata according to the quality rules. With this method, quality differences in the metadata of enterprise information systems can be identified quickly. The association of data standards with data quality breaks down the barrier between business requirements and technical requirements in enterprise data management, so that data problems can be checked and traced to a basis. The method is of significance for improving data quality, standardizing data definitions and ensuring effective management of data assets, and has good value for popularization and application.
The invention aims to provide a method for linking data standards and data quality based on metadata in big data management. The method can quickly identify quality differences in the metadata of enterprise information systems, breaks down the barrier between business requirements and technical requirements in big data management by associating data standards with data quality, identifies metadata that does not meet the quality standards, and ensures from the source that enterprise data is authentic and valid, thereby achieving effective management of enterprise data assets. The specific operation steps are as follows:
Step 1, metadata acquisition: obtain the data source configuration, then scan the database information in the data source through a metadata adapter, for example the schema, table names, field names, views, relationships, primary keys and foreign keys, where the schema refers to the organization and structure of the database. The data is then converted and finally written into the metadata system. The acquisition as a whole can be divided into a client and a server: the client covers configuration of the adapter, the data source, the acquisition task and so on, while the server is responsible for actually collecting the data, converting it and persisting it to storage. A common metadata model generally includes, but is not limited to, three elements: package, class and data type. A package is a container that groups the classes and data types of a metadata model by specific metadata source. A class defines a type of metadata object, such as a database type or an ETL type; classes have attributes, and classes are related to one another through composition, dependency and inheritance relationships. A data type is used to define an attribute; for example, the "description" attribute of a database class has a text data type, so that the metadata system knows how to present this attribute to the user.
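The scanning part of step 1 can be sketched as follows. This is a minimal illustration only, assuming a SQLite data source read through Python's standard sqlite3 module; the function name collect_metadata, the record layout and the sample tables are hypothetical and are not taken from the patent, whose metadata adapter is not specified at this level of detail.

```python
import sqlite3


def collect_metadata(conn):
    """Scan tables, views, fields, primary keys and foreign keys (hypothetical adapter)."""
    records = []
    objects = conn.execute(
        "SELECT name, type FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for name, obj_type in objects:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = {
            row[3]: (row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{name}")')
        }
        for _cid, column, column_type, _notnull, _default, pk in columns:
            records.append({
                "object": name,
                "object_type": obj_type,
                "field": column,
                "data_type": column_type,
                "primary_key": pk > 0,
                "foreign_key": foreign_keys.get(column),
            })
    return records


# Hypothetical source database used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORG (ORGID INTEGER PRIMARY KEY, ORGNAME TEXT);
CREATE TABLE EMP_TABLE (
    EMPCODE TEXT PRIMARY KEY,
    EMPNAME TEXT,
    ORGID INTEGER REFERENCES ORG(ORGID)
);
""")
metadata = collect_metadata(conn)
```

In a real deployment the conversion and write-back to the metadata system would follow; here the records are simply returned as dictionaries.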
Step 2, importing enterprise data standards: the enterprise data standards are organized into a file template that the metadata system can recognize, such as Excel or XML, and imported into the metadata system by way of metadata acquisition, where they are managed as independent metadata. The data standard template needs to include, but is not limited to:
1) Data standard number
2) Standard first-level classification
3) Standard second-level classification
4) Standard Chinese name
5) Standard alias
6) Business definition
7) Definition basis
8) Data type
9) Value range
10) Data length
11) Data precision
12) Data presentation format
13) Authoritative system
14) Data standard status
15) Entry date.
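The template above could be represented and loaded roughly as follows. This sketch assumes a CSV export as a stand-in for the Excel or XML template; the class name DataStandard, the English column names, the loader load_standards and the sample row are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class DataStandard:
    # One attribute per template item 1) to 15); the names are illustrative.
    number: str
    level1_class: str
    level2_class: str
    standard_name: str
    alias: str
    business_definition: str
    definition_basis: str
    data_type: str
    value_range: str
    data_length: str
    data_precision: str
    display_format: str
    authoritative_system: str
    status: str
    entry_date: str


def load_standards(text):
    """Parse template text into {data standard number: DataStandard}."""
    reader = csv.DictReader(io.StringIO(text))
    expected = [f.name for f in fields(DataStandard)]
    missing = [name for name in expected if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"template is missing columns: {missing}")
    return {
        row["number"]: DataStandard(**{name: row[name] for name in expected})
        for row in reader
    }


# Hypothetical one-row template used only to exercise the loader.
header = ",".join(f.name for f in fields(DataStandard))
sample = header + "\n" + (
    "DS001,Personnel,Employee,Employee code,EMPCODE,"
    "Unique code of an employee,HR policy,string,,10,0,text,"
    "HR system,active,2019-05-27\n"
)
standards = load_standards(sample)
```

Keying the result by data standard number matches the role that number plays as the key field in step 3.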
Step 3, associating metadata with data standards: determine which data standard each piece of metadata belongs to, and store the data standard number as a key field. The relationship between data standards and metadata is 1:N: one data standard may apply to multiple pieces of metadata, while one piece of metadata can correspond to only one data standard.
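The 1:N constraint of step 3 can be illustrated with a small in-memory structure. This is a sketch under the assumption that a piece of metadata is identified by its table name and field name; the class StandardAssociations and its method names are hypothetical.

```python
class StandardAssociations:
    """Keeps the 1:N link between data standards and metadata (hypothetical helper)."""

    def __init__(self):
        # (table, field) -> data standard number; one standard per piece of metadata
        self._by_metadata = {}

    def associate(self, table, field, standard_number):
        key = (table, field)
        existing = self._by_metadata.get(key)
        if existing is not None and existing != standard_number:
            raise ValueError(
                f"{table}.{field} is already associated with {existing}"
            )
        self._by_metadata[key] = standard_number

    def standard_for(self, table, field):
        """Return the data standard number for a piece of metadata, or None."""
        return self._by_metadata.get((table, field))

    def metadata_for(self, standard_number):
        """Return every (table, field) associated with one data standard."""
        return sorted(
            key for key, number in self._by_metadata.items()
            if number == standard_number
        )


assoc = StandardAssociations()
assoc.associate("EMP_TABLE", "EMPCODE", "DS001")
assoc.associate("PAYROLL", "EMPCODE", "DS001")  # one standard, several metadata
```

Storing the standard number on the metadata side is what makes the later lookup from problem data back to a data standard a single dictionary access.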
Step 4, formulating quality standards according to the data standards: quality standards are compiled according to the requirements that the data standards place on the integrity, consistency, uniqueness, normalization, timeliness and accuracy of data. Compilation can be done online or offline; if done offline, the data quality standards can be imported into the metadata system in the manner of step 2 and managed as independent metadata. The relationship between data standards and data quality standards is 1:N: one data standard may correspond to multiple quality standards, while one quality standard can correspond to only one data standard. The content of a quality standard includes, but is not limited to:
1) Corresponding data standard number
2) Corresponding data standard name
3) Quality standard number
4) Data quality dimension
5) Data quality dimension code
6) Data quality standard description
7) Reference object standard number
8) Reference object standard name
9) Explanation of reasons.
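A quality standard record with the items listed above might be modelled as below. This is only a sketch: the class names, the use of an enum value as the dimension code and the sample entries are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    # The six quality dimensions named in step 4; the values double as codes here.
    INTEGRITY = "integrity"
    CONSISTENCY = "consistency"
    UNIQUENESS = "uniqueness"
    NORMALIZATION = "normalization"
    TIMELINESS = "timeliness"
    ACCURACY = "accuracy"


@dataclass(frozen=True)
class QualityStandard:
    data_standard_number: str
    data_standard_name: str
    number: str
    dimension: Dimension
    description: str
    reference_standard_number: str = ""
    reference_standard_name: str = ""
    reason: str = ""


def group_by_data_standard(quality_standards):
    """Group quality standards under their single parent data standard (1:N)."""
    grouped = defaultdict(list)
    seen = set()
    for qs in quality_standards:
        if qs.number in seen:
            raise ValueError(f"duplicate quality standard number: {qs.number}")
        seen.add(qs.number)
        grouped[qs.data_standard_number].append(qs)
    return dict(grouped)


# Hypothetical quality standards derived from one data standard.
quality_standards = [
    QualityStandard("DS001", "Employee code", "QS001", Dimension.INTEGRITY,
                    "Employee code must not be null"),
    QualityStandard("DS001", "Employee code", "QS002", Dimension.UNIQUENESS,
                    "Employee code must be unique"),
]
grouped = group_by_data_standard(quality_standards)
```

Because each record carries exactly one data standard number, a quality standard cannot belong to two data standards in this model.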
Step 5, writing quality rules according to the quality standards: a data quality rule is the technical implementation of a data quality standard. It is generally an executable SQL statement, and can also be produced through configuration in a dedicated data quality system. The relationship between data quality standards and quality rules is 1:N: multiple quality rules may be written for one data quality standard, while one quality rule can derive from only one data quality standard.
A quality rule comprises at least three parts: a detection range, a detection attribute and a detection rule.
The detection range is the basic scope element involved in defining and maintaining data quality detection rules. A detection range can be defined as specific data items, as values obtained from SQL statements, or as a combination built from other attributes. Its purpose is to delimit the scope that the standard specification checks and to make basic rules easier to define and maintain. A detection range includes a name, a description, a value, a creation time, a creator and so on. Common detection ranges include: registration certification date, current system date, legal representative name and ID card number.
The detection attribute defines the basic data quality judgment rule according to data quality control requirements. Through detection attribute management, detection attributes are combined with detection ranges to define detection rules flexibly. Detection attributes include, but are not limited to, null value checking, value range checking, specification checking, duplicate data checking, missing record checking, referential integrity checking, result set comparison, SQL script checking, outlier checking, balance checking, fluctuation checking, timeliness checking, logic checking and other quality rules.
The detection rule is the logic rule for judging whether data is abnormal. Based on the detection range and the detection attribute, it defines whether the detected result set counts as correct or as incorrect.
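The three parts of a quality rule can be sketched as a small structure that renders SQL. The names DetectionRange and QualityRule, the string "null_check" and the boolean used as the detection rule are assumptions for illustration; only the null value check is implemented, and identifiers are interpolated directly, so they must come from trusted configuration.

```python
from dataclasses import dataclass


@dataclass
class DetectionRange:
    schema: str
    table: str
    fields: tuple


@dataclass
class QualityRule:
    name: str
    detection_range: DetectionRange
    detection_attribute: str  # only "null_check" is handled in this sketch
    # Detection rule: whether rows matched by the attribute count as problem data.
    matched_rows_are_problems: bool = True

    def _table(self):
        r = self.detection_range
        return f"{r.schema}.{r.table}"

    def total_sql(self):
        """SQL counting every row in the detection range."""
        return f"SELECT COUNT(*) AS TOTAL FROM {self._table()}"

    def problem_sql(self):
        """SQL counting the rows that the detection rule classifies as problems."""
        if self.detection_attribute != "null_check":
            raise NotImplementedError(self.detection_attribute)
        matched = " OR ".join(
            f"{self._table()}.{field} IS NULL"
            for field in self.detection_range.fields
        )
        condition = (
            f"({matched})" if self.matched_rows_are_problems else f"NOT ({matched})"
        )
        return f"SELECT COUNT(*) AS PROBLEMS FROM {self._table()} WHERE {condition}"


rule = QualityRule(
    name="personnel integrity",
    detection_range=DetectionRange("TEST", "EMP_TABLE", ("EMPCODE", "EMPNAME", "ORGID")),
    detection_attribute="null_check",
)
```

Keeping range, attribute and rule separate means the same null check can be reused on another table by swapping only the detection range.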
Step 6, checking the metadata according to the quality rules: execute the quality rules and collect the problem data generated during execution. The problem data includes, but is not limited to, the field name, field description, data value, data type and the name of the table it belongs to. From the field name and table name of the problem data, the corresponding metadata can be looked up, and from it the data standard corresponding to that metadata. This information is collated into a data quality report, which includes, but is not limited to: metadata name, data problem rate, data details, problem cause, standard value, target problem rate after rectification, rectification opinion, corresponding data standard name and standard basis. The report can be submitted to the person responsible for the business system or to the data management group, and provides a solid basis for enterprise big data management.
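The lookup from problem data to metadata to data standard can be sketched as follows. The function build_report, the dictionary layouts and the sample values are hypothetical; a real report would also carry the remaining columns listed above, such as problem cause and rectification opinion.

```python
def build_report(problems, totals, associations, standards):
    """Collate problem data into report rows.

    problems:     list of dicts with 'table', 'field' and 'value'
    totals:       {(table, field): number of rows checked}
    associations: {(table, field): data standard number}
    standards:    {data standard number: {'name': ..., 'basis': ...}}
    """
    grouped = {}
    for problem in problems:
        key = (problem["table"], problem["field"])
        grouped.setdefault(key, []).append(problem["value"])

    report = []
    for key in sorted(grouped):
        values = grouped[key]
        number = associations.get(key)          # metadata -> data standard number
        standard = standards.get(number, {})    # data standard number -> standard
        report.append({
            "metadata_name": f"{key[0]}.{key[1]}",
            "problem_rate": len(values) / totals[key],
            "problem_values": values,
            "data_standard_number": number,
            "data_standard_name": standard.get("name"),
            "standard_basis": standard.get("basis"),
        })
    return report


# Hypothetical inputs used only to exercise the sketch.
report = build_report(
    problems=[
        {"table": "EMP_TABLE", "field": "EMPNAME", "value": None},
        {"table": "EMP_TABLE", "field": "EMPNAME", "value": None},
    ],
    totals={("EMP_TABLE", "EMPNAME"): 4},
    associations={("EMP_TABLE", "EMPNAME"): "DS002"},
    standards={"DS002": {"name": "Employee name", "basis": "HR policy"}},
)
```

Metadata without an associated data standard still appears in the report, with the standard columns left empty, which makes gaps in step 3 visible.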
Through the above six steps, metadata, data standards and data quality are associated, breaking down the barrier between business requirements and technical requirements in enterprise data management. Data quality standards are formulated from the data standards and used to check the metadata, so that data quality is brought under control. When problem data is found, the business basis corresponding to the problem can be provided, and rectification opinions can be given according to the requirements of the data standards. With business as the goal and technology as the means, a complete closed loop is formed in enterprise big data management, which is of significance for improving data quality, standardizing data definitions and ensuring effective management of data assets, and has good value for popularization and application.
By associating metadata, data standards and data quality, the method of the invention for linking data standards and data quality based on metadata in big data management breaks down the barrier between business requirements and technical requirements in enterprise data management. Data quality standards are formulated from the data standards and used to check the metadata, so that data quality is managed and controlled. When problem data is found, the business basis corresponding to the problem can be provided, and rectification opinions can be given according to the requirements of the data standards. With business as the goal and technology as the means, the method forms a complete closed loop in enterprise big data management. It is of significance for improving data quality, standardizing data definitions and ensuring effective management of data assets, and has good value for popularization and application.
Embodiments of the technical solution of the present invention will be specifically described with reference to fig. 1 to 4:
The invention provides a method for linking data standards and data quality based on metadata in big data management. The specific implementation steps are shown in FIG. 1, and FIG. 3 shows the functional architecture of the data asset platform used in the embodiment:
Step 1, metadata acquisition: in a specific implementation, this step can be completed with the metadata collection module. First, the data sources of each business system are collected: the table names, field names, views, relationships, primary keys, foreign keys and so on in the business databases are collected into the metadata system and stored as metadata of the database type. Second, the ETL processes that exchange data between databases within the enterprise are collected and stored as metadata of the ETL type, for example PowerCenter, stored procedures, Kettle, DataStage, SQL Server Integration Services, SQL Server Analysis Services and Perl scripts. Finally, the source database and target database of each ETL process are mounted onto the corresponding database metadata to form a lineage map of the metadata.
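Mounting ETL sources and targets onto database metadata yields a directed graph. A minimal sketch of such a lineage map, assuming tables are identified by plain strings, is given below; the class LineageMap, its methods and the job names are hypothetical and do not describe the platform's actual storage.

```python
from collections import defaultdict, deque


class LineageMap:
    """Directed lineage between tables, built from ETL metadata (hypothetical)."""

    def __init__(self):
        # target table -> set of (source table, ETL process name)
        self._upstream = defaultdict(set)

    def add_etl(self, process, source, target):
        """Mount one ETL process between its source and target tables."""
        self._upstream[target].add((source, process))

    def upstream_of(self, table):
        """Return every table that feeds `table`, nearest sources first."""
        seen, order, queue = set(), [], deque([table])
        while queue:
            current = queue.popleft()
            for source, _process in sorted(self._upstream.get(current, ())):
                if source not in seen:
                    seen.add(source)
                    order.append(source)
                    queue.append(source)
        return order


lineage = LineageMap()
lineage.add_etl("load_ods_emp", "SRC.EMP", "ODS.EMP")
lineage.add_etl("load_dw_emp", "ODS.EMP", "DW.EMP")
```

A breadth-first walk is used so that direct sources are listed before more distant ones, and the `seen` set keeps the walk finite even if the ETL metadata contains a cycle.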
Step 2, importing enterprise data standards: in a specific implementation, the data standard deliverables of the enterprise or of a consulting company are organized into Excel format according to the metadata acquisition template, and collected into the metadata database through the metadata Excel collector, where they are managed as independent metadata. The following table shows the data standard template defined by an enterprise during implementation:
Step 3, associating metadata with data standards: in a specific implementation, the metadata management interface provides a function for associating data standards. Based on attributes such as the metadata name, the standard classification in the system, the data standard and the authoritative system, the system automatically recommends the data standard most likely to correspond to the metadata, and also allows other data standards to be searched for and associated. Associated metadata is displayed differently so that it can be distinguished from metadata that has not been associated. The step ends when all metadata has been associated with data standards.
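The text does not say how the recommendation is computed. One simple possibility, shown purely as an assumption, is to score the metadata name against each standard's names and aliases with a string similarity ratio from Python's difflib; the function recommend_standard and the sample standards are hypothetical.

```python
from difflib import SequenceMatcher


def recommend_standard(field_name, standards):
    """Return (data standard number, score) for the best-matching standard.

    standards: {data standard number: [names and aliases of the standard]}
    The similarity measure is an illustrative choice, not the patent's method.
    """
    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    scored = (
        (number, max(similarity(field_name, name) for name in names))
        for number, names in standards.items()
    )
    # Raises ValueError if `standards` is empty.
    return max(scored, key=lambda item: item[1])


# Hypothetical standards used only to exercise the sketch.
candidates = {
    "DS001": ["EMPCODE", "employee code"],
    "DS002": ["ORGID", "department id"],
}
best_number, best_score = recommend_standard("EMP_CODE", candidates)
```

A production system would presumably also weigh the standard classification and the authoritative system, as the paragraph above describes, and leave the final choice to the user.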
And 4, formulating a quality standard according to the data standard: in particular, a data standard may derive a plurality of quality standards from the data integrity, consistency, uniqueness, normalization, timeliness and accuracy requirements, and the quality standards are stored as a slave table of data standards. After the compiling is completed, the quality standard is collected into a metadata base through metadata analysis and is used as independent metadata for management. The following table shows the data quality criteria established by a certain enterprise during the implementation process:
Step 5, writing quality rules according to the quality standards: in a specific implementation, quality rules are written in the data quality management module according to the requirements of a quality standard. A quality rule comprises at least three parts: a detection range, a detection attribute and a detection rule. For example, suppose the integrity quality standard for personnel information is that the employee code, employee ID and department must not be null. The specific database user name, table name and field names are configured in the detection range, a null-value check is added as the detection attribute, and the detection rule is configured to treat null values as problem data. The following are some of the SQL statements run when the quality rule is executed:
(1) Null check, total count SQL:
SELECT COUNT(*) AS COUNT FROM TEST.EMP_TABLE WHERE 1=1;
(2) Null check, problem count SQL:
SELECT COUNT(*) AS COUNT FROM TEST.EMP_TABLE WHERE 1=1 AND (TEST.EMP_TABLE.EMPCODE IS NULL OR TEST.EMP_TABLE.EMPNAME IS NULL OR TEST.EMP_TABLE.ORGID IS NULL);
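To show how statements of this kind could be generated from a rule configuration, the following Python sketch builds the total-count and problem-count queries from a detection range and runs them against an in-memory SQLite table. The class `NullCheckRule`, the function `run_rule` and the sample rows are assumptions made for illustration; the schema prefix `TEST.` and the `AS COUNT` alias are omitted to keep the sketch portable, and the table and field names are treated as trusted configuration rather than user input.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class NullCheckRule:
    """Hypothetical quality rule: a detection range plus a null-value check."""
    table: str      # detection range: the table to scan
    fields: tuple   # detection range: the fields that must not be null

    def total_sql(self):
        # Counts every record in the detection range.
        return f"SELECT COUNT(*) FROM {self.table} WHERE 1=1"

    def problem_sql(self):
        # Counts records where at least one checked field is null.
        condition = " OR ".join(f"{name} IS NULL" for name in self.fields)
        return f"SELECT COUNT(*) FROM {self.table} WHERE 1=1 AND ({condition})"


def run_rule(conn, rule):
    """Execute both queries and return the counts recorded for the rule."""
    total = conn.execute(rule.total_sql()).fetchone()[0]
    problems = conn.execute(rule.problem_sql()).fetchone()[0]
    return {"total": total, "problems": problems}


# Sample data invented for this sketch: two of the three rows contain a null.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP_TABLE (EMPCODE TEXT, EMPNAME TEXT, ORGID INTEGER)")
conn.executemany(
    "INSERT INTO EMP_TABLE VALUES (?, ?, ?)",
    [("E001", "Alice", 10), ("E002", None, 10), (None, "Carol", None)],
)

rule = NullCheckRule(table="EMP_TABLE", fields=("EMPCODE", "EMPNAME", "ORGID"))
result = run_rule(conn, rule)
```

Keeping `WHERE 1=1` as a fixed prefix, as the statements above do, lets further conditions be appended with `AND` without special-casing the first one.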
Step 6, checking the metadata according to the quality rules: in a specific implementation, the quality rules are added to an execution task and an execution cycle is set, for example running the task at 22:00 every night. After execution, the system records the execution results of each rule, such as the number of problem records, the total number of records and the execution time, and collects the problem data generated during execution. The problem data includes the field name, field description, data value, data type and table name. The corresponding metadata can be looked up by the field name and table name of the problem data, which in turn yields the data standard associated with that metadata. This information is collated into a data quality report, which can be submitted to the person responsible for the business system or to the data management group.
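The lookup from problem data back to metadata and then to the data standard might be sketched as follows. The function `build_quality_report`, the dictionary layouts, the standard code `STD-002` and the sample problem record are all hypothetical; the specification only states which pieces of information are combined, not how they are stored.

```python
def build_quality_report(total, problem_records, metadata_index, standards):
    """Join problem records to metadata and data standards and summarize them."""
    details = []
    for record in problem_records:
        # Metadata is located by the table name and field name of the problem data.
        key = (record["table_name"], record["field_name"])
        metadata = metadata_index.get(key)
        standard = standards.get(metadata["standard_code"]) if metadata else None
        details.append({
            "table_name": record["table_name"],
            "field_name": record["field_name"],
            "data_value": record["data_value"],
            "standard_code": metadata["standard_code"] if metadata else None,
            "standard_name": standard["name"] if standard else None,
        })
    problems = len(problem_records)
    return {
        "total": total,
        "problems": problems,
        "problem_rate": problems / total if total else 0.0,
        "details": details,
    }


# Hypothetical inputs: metadata keyed by (table, field) and standards keyed by code.
metadata_index = {("EMP_TABLE", "EMPNAME"): {"standard_code": "STD-002"}}
standards = {"STD-002": {"name": "Employee name"}}
problem_records = [
    {
        "table_name": "EMP_TABLE",
        "field_name": "EMPNAME",
        "field_description": "employee name",
        "data_value": None,
        "data_type": "TEXT",
    }
]

report = build_quality_report(4, problem_records, metadata_index, standards)
```

Because each detail row carries the code of the data standard, the recipient of the report can see which business definition the problem record violates.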
The following table shows a data quality report produced for an enterprise during implementation:
Through the above six steps, associating metadata, data standards and data quality breaks down the barrier between business requirements and technical requirements in enterprise data management. Data quality standards are formulated from the data standards, and the metadata is checked against them, so that the enterprise gains control over its data quality. When problem data is found, the business basis corresponding to the problem can be provided, and correction suggestions can be given according to the requirements of the data standard. The method thus takes the business as the goal and technology as the means, and realizes a complete closed loop in enterprise big data management. It is of significance for enterprises in improving data quality and standardizing data definitions, ensures effective management of data assets, and has good value for popularization and application.
With the method for realizing association processing of data standard and data quality based on metadata in big data management, associating metadata, data standards and data quality breaks down the barrier between business requirements and technical requirements in enterprise data management. The metadata is checked against the data standards, so that the enterprise gains control over its data quality. When problem data is found, the business basis corresponding to the problem is provided, and correction suggestions can be given according to the requirements of the data standard. The method thus takes the business as the goal and technology as the means, realizes a complete closed loop in enterprise big data management, improves data quality and standardizes data definitions, ensures effective management of data assets, and has significance and good value for popularization and application.
In this specification, the invention has been described with reference to specific embodiments thereof. It will be apparent, however, that various modifications and changes may be made without departing from the spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (4)

1. A method for realizing association processing of data standards and data quality based on metadata in big data processing, which is characterized by comprising the following steps:
(1) Collecting metadata;
(2) Importing enterprise data standards;
(3) Classifying the metadata according to the data standard, and storing the metadata by taking the data standard number as a key field;
(4) Formulating a data quality standard according to the data standard; the method specifically comprises the following steps:
(4.1) importing data quality criteria into the metadata system for management as independent metadata;
wherein the data standard corresponds to a plurality of data quality standards, and the single data quality standard corresponds to a single data standard only;
(5) Writing a quality rule according to a data quality standard; wherein the data quality standard corresponds to a plurality of quality rules, and a single quality rule is only from a single data quality standard; the quality rules comprise detection ranges, detection attributes and detection rules;
(6) Checking the metadata according to the quality rules;
the step (2) specifically comprises the following steps:
(2.1) sorting the enterprise data standard into a file template with identifiable metadata;
(2.2) importing the data standard into the metadata system in a metadata acquisition mode;
(2.3) managing the data standard as independent metadata;
the data standard in the step (3) is applicable to a plurality of metadata, and single metadata only corresponds to a single data standard.
2. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, wherein the step (1) specifically comprises the following steps:
(1.1) acquiring data source configuration, and scanning database information in a data source through a metadata adapter;
(1.2) converting the data and writing the data to the metadata system.
3. The method for associating data standards with data quality based on metadata in big data governance according to claim 2, wherein the database information in step (1.1) includes organization and structure of database, table name, field name, view, relationship, primary key and foreign key.
4. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, wherein the step (6) specifically comprises the following steps:
(6.1) executing the quality rule and collecting problem data generated during the execution;
(6.2) looking up the corresponding metadata according to the field names and table names of the problem data, and obtaining the data standards corresponding to the metadata;
(6.3) sorting the check information and forming a data quality report.
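The correspondence constraints recited in claim 1 (one data standard to many metadata items, one data standard to many quality standards, one quality standard to many quality rules, with each child belonging to exactly one parent) can be illustrated with a small in-memory registry. This is a sketch only: the class `GovernanceRegistry`, its method names and the identifiers used in the example are hypothetical and are not part of the claimed method.

```python
class GovernanceRegistry:
    """Hypothetical store enforcing the one-to-many links recited in claim 1."""

    def __init__(self):
        self.metadata_to_standard = {}          # metadata id -> data standard code
        self.quality_standard_to_standard = {}  # quality standard id -> data standard code
        self.rule_to_quality_standard = {}      # quality rule id -> quality standard id

    @staticmethod
    def _link(mapping, child, parent):
        # A child may be linked to one parent only; re-linking elsewhere is rejected.
        existing = mapping.get(child)
        if existing is not None and existing != parent:
            raise ValueError(f"{child} is already linked to {existing}")
        mapping[child] = parent

    def associate_metadata(self, metadata_id, standard_code):
        self._link(self.metadata_to_standard, metadata_id, standard_code)

    def add_quality_standard(self, quality_standard_id, standard_code):
        self._link(self.quality_standard_to_standard, quality_standard_id, standard_code)

    def add_rule(self, rule_id, quality_standard_id):
        self._link(self.rule_to_quality_standard, rule_id, quality_standard_id)

    def rules_for_metadata(self, metadata_id):
        """Follow metadata -> data standard -> quality standards -> quality rules."""
        standard = self.metadata_to_standard.get(metadata_id)
        quality_standards = {
            qs for qs, std in self.quality_standard_to_standard.items() if std == standard
        }
        return sorted(
            rule for rule, qs in self.rule_to_quality_standard.items()
            if qs in quality_standards
        )


registry = GovernanceRegistry()
registry.associate_metadata("EMP_TABLE.EMPCODE", "STD-001")
registry.add_quality_standard("QS-INTEGRITY-001", "STD-001")
registry.add_quality_standard("QS-UNIQUENESS-001", "STD-001")
registry.add_rule("RULE-NULL-CHECK", "QS-INTEGRITY-001")
registry.add_rule("RULE-DUPLICATE-CHECK", "QS-UNIQUENESS-001")

applicable_rules = registry.rules_for_metadata("EMP_TABLE.EMPCODE")
```

Storing each link on the child side makes the "single parent" constraint a simple key lookup, while the one-to-many direction is recovered by filtering.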
CN201910446036.5A 2019-05-27 2019-05-27 Method for realizing association processing of data standard and data quality based on metadata in big data management Active CN110119395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910446036.5A CN110119395B (en) 2019-05-27 2019-05-27 Method for realizing association processing of data standard and data quality based on metadata in big data management

Publications (2)

Publication Number Publication Date
CN110119395A CN110119395A (en) 2019-08-13
CN110119395B true CN110119395B (en) 2023-09-15

Family

ID=67523306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910446036.5A Active CN110119395B (en) 2019-05-27 2019-05-27 Method for realizing association processing of data standard and data quality based on metadata in big data management

Country Status (1)

Country Link
CN (1) CN110119395B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125075A (en) * 2019-12-17 2020-05-08 国网天津市电力公司电力科学研究院 Data management method and system for non-computable region
CN111078780A (en) * 2019-12-23 2020-04-28 北京中创信测科技股份有限公司 AI optimization data management method
CN111177134B (en) * 2019-12-26 2021-04-02 上海科技发展有限公司 Data quality analysis method, device, terminal and medium suitable for mass data
CN112131264A (en) * 2020-09-15 2020-12-25 杭州城市大数据运营有限公司 Method, device and system for recommending different source difference information
US11669682B2 (en) * 2020-12-22 2023-06-06 Collibra Belgium Bv Bespoke transformation and quality assessment for term definition
CN112800046A (en) * 2021-02-26 2021-05-14 上海帕科信息科技有限公司 Artificial intelligence platform applied to field data management
CN112905329A (en) * 2021-03-24 2021-06-04 武汉众邦银行股份有限公司 Full life cycle management and control method for improving standard falling rate of data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748775A (en) * 2017-10-17 2018-03-02 上海计算机软件技术开发中心 A kind of data governing system based on the quality of data
CN108717456A (en) * 2018-05-22 2018-10-30 浪潮软件股份有限公司 A kind of data lifecycle management platform that data source is unrelated and method
CN109034532A (en) * 2018-06-20 2018-12-18 江苏网域科技有限公司 A kind of data managing and control system based on big data
CN109272155A (en) * 2018-09-11 2019-01-25 郑州向心力通信技术股份有限公司 A kind of corporate behavior analysis system based on big data
CN109344133A (en) * 2018-08-27 2019-02-15 成都四方伟业软件股份有限公司 A kind of data administer driving data and share exchange system and its working method
CN109408502A (en) * 2018-11-14 2019-03-01 成都四方伟业软件股份有限公司 A kind of data standard processing method, device and its storage medium
CN109523423A (en) * 2018-11-28 2019-03-26 中国海洋石油集团有限公司 A kind of application system generation method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10585875B2 (en) * 2016-04-06 2020-03-10 International Business Machines Corporation Data warehouse model validation
US10318501B2 (en) * 2016-10-25 2019-06-11 Mastercard International Incorporated Systems and methods for assessing data quality
US10585864B2 (en) * 2016-11-11 2020-03-10 International Business Machines Corporation Computing the need for standardization of a set of values

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
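Design and Implementation of a Data Quality Assessment and Management Tool; Li Jingjing et al.; Information Technology & Standardization; 61-65 *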

Also Published As

Publication number Publication date
CN110119395A (en) 2019-08-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant