CN110119395A - Method for implementing associated processing of data standards and data quality based on metadata in big data governance - Google Patents

Method for implementing associated processing of data standards and data quality based on metadata in big data governance

Info

Publication number
CN110119395A
Authority
CN
China
Prior art keywords
data
standard
metadata
quality
rule
Prior art date
Legal status
Granted
Application number
CN201910446036.5A
Other languages
Chinese (zh)
Other versions
CN110119395B (en)
Inventor
滑少鹏
王克强
Current Assignee
PRIMETON INFORMATION TECHNOLOGY Co Ltd
Original Assignee
PRIMETON INFORMATION TECHNOLOGY Co Ltd
Priority date
Filing date
Publication date
Application filed by PRIMETON INFORMATION TECHNOLOGY Co Ltd
Priority to CN201910446036.5A
Publication of CN110119395A
Application granted
Publication of CN110119395B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method for implementing associated processing of data standards and data quality based on metadata in big data governance, comprising: (1) collecting metadata; (2) importing business data standards; (3) classifying the metadata according to the data standards and storing it with the data standard number as the key field; (4) formulating data quality standards according to the data standards; (5) writing quality rules according to the data quality standards; and (6) checking the metadata according to the quality rules. The method breaks down the barrier between business requirements and technical requirements that enterprises face in data governance, and can provide rectification suggestions based on the requirements of the data standards. Taking business as the goal and technology as the means, it closes the loop of enterprise big data governance. It is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, and has good application value.

Description

Method for implementing associated processing of data standards and data quality based on metadata in big data governance
Technical field
The present invention relates to the field of computer software, in particular to big data governance, and specifically to a method for implementing associated processing of data standards and data quality based on metadata in big data governance.
Background art
With the rapid development of big data technology, more and more enterprises are paying attention to their own data problems and have begun to govern the data in their core business systems and data architectures by various means: for example, managing enterprise metadata with a metadata system, identifying enterprise data problems with a data quality system in order to improve data quality, or engaging a consulting firm to help sort out data standards. These means can improve data quality and achieve a degree of data governance. However, as enterprise informatization accelerates, enterprises face more and more data problems, and managing data from a single perspective can no longer meet their governance needs. It is therefore necessary to break down the barriers between the three dimensions of metadata, data standards, and data quality: quality rules are formulated from data standards, metadata is checked by quality rules, and the corresponding data standard is found through the metadata, so that every data problem has a documented basis and a rule to follow. This improves data quality, standardizes data definitions, ensures effective management of data assets, and builds a data governance system that forms a virtuous closed loop.
Existing technologies related to big data governance are as follows:
(1) Data lineage visualization graph system in data governance (application number: 201711383801.0). It provides a data lineage visualization graph system for data governance that includes information nodes and the following modules: a data flow route, i.e., the path along which the data flows; and at least one of an extraction rule node, a cleansing rule node, a transformation rule node, a loading rule node, and a processing rule node. The extraction rule node describes how data is extracted; the cleansing rule node indicates the screening criteria for data during the flow; the transformation rule node indicates how data is changed during the flow; the loading rule node describes how data is stored; and the processing rule node indicates archiving or destruction of data. Through lineage relationships at different levels, this application makes the migration and flow of data clearly understandable and provides a basis for assessing the value of data.
(2) Data standard processing method, device, and storage medium (application number: 201811356788.4), relating to the field of big data processing technology. The data standard processing method includes: collecting metadata from the business database that stores production source data; extracting N data standards from the metadata, each including at least a name, N being a positive integer; selecting M of the N data standards to form a data standard set, M being a positive integer less than N; and generating a check result table based on the data standard set. By forming the data standard set from metadata-based data standards, the method improves the relevance of the data standards.
The data lineage visualization graph system above collects the data flow route and uses at least one of the extraction, cleansing, transformation, loading, and processing rule nodes to establish the lineage of metadata, making the migration and flow of data understandable and providing a basis for assessing data value. However, it lacks any association with data standards: it cannot quickly trace metadata back to a data standard, let alone discover enterprise data problems through metadata, so it cannot achieve a virtuous closed loop in enterprise big data governance.
The data standard processing method, device, and storage medium above collect metadata from the business database that stores production source data, extract N data standards (each including at least a name) from the metadata, select M of them (M less than N) to form a data standard set, and generate a check result table from that set. Its data standards are derived from metadata, and the metadata comes from the databases of the individual business systems. It must therefore be guaranteed in advance that every business system database has been built strictly according to company standards; otherwise, once the metadata deviates from those standards, the data standards extracted from it become meaningless, and the corresponding data quality checks also lack authenticity and usability.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the above prior art and to provide a method for implementing associated processing of data standards and data quality based on metadata in big data governance that offers high data quality, high authenticity, and good usability.
To achieve the above purpose, the method of the present invention for implementing associated processing of data standards and data quality based on metadata in big data governance is as follows:
The method is mainly characterized in that it comprises the following steps:
(1) collecting metadata;
(2) importing business data standards;
(3) classifying the metadata according to the data standards, and storing it with the data standard number as the key field;
(4) formulating data quality standards according to the data standards;
(5) writing quality rules according to the data quality standards;
(6) checking the metadata according to the quality rules.
Preferably, step (1) specifically comprises the following steps:
(1.1) obtaining the data source configuration, and scanning the database information in the data source through a metadata adapter;
(1.2) converting the data and writing it into the metadata system.
Preferably, the database information in step (1.1) includes the organization and structure (schema) of the database, table names, field names, views, relationships, primary keys, and foreign keys.
Preferably, step (2) specifically comprises the following steps:
(2.1) organizing the business data standards into a document template that the metadata system can recognize;
(2.2) importing the data standards into the metadata system in the same way metadata is collected;
(2.3) managing the data standards as an independent type of metadata.
Preferably, in step (3), one data standard may apply to multiple metadata items, while a single metadata item corresponds to only one data standard.
Preferably, step (4) specifically comprises the following step:
(4.1) importing the data quality standards into the metadata system and managing them as an independent type of metadata.
Preferably, in step (4), one data standard corresponds to multiple data quality standards, while a single data quality standard corresponds to only one data standard.
Preferably, in step (5), one data quality standard corresponds to multiple quality rules, while a single quality rule derives from only one data quality standard.
Preferably, the quality rule in step (5) includes a detection scope, a detection attribute, and a detection rule.
Preferably, step (6) specifically comprises the following steps:
(6.1) executing the quality rules, and collecting the problem data produced during execution;
(6.2) finding the corresponding metadata from the field name and table name of the data, and obtaining the data standard corresponding to that metadata;
(6.3) organizing the check information into a data quality report.
By associating metadata, data standards, and data quality, the method of the present invention breaks down the barrier between business requirements and technical requirements in enterprise data governance. Data quality is formulated from data standards and metadata is checked against data quality, so that data quality control always has a legitimate basis. When problem data is found, the method can provide the business basis for the problem, and it can also offer rectification suggestions based on the requirements of the data standards. Taking business as the goal and technology as the means, it closes the loop of enterprise big data governance. It is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, and has good application value.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the present invention for implementing associated processing of data standards and data quality based on metadata in big data governance.
Fig. 2 is a diagram of the relationships between metadata, data standards, quality standards, and quality rules in the method of the present invention.
Fig. 3 is a functional architecture diagram of the modules of the data asset platform used in an embodiment of the method of the present invention.
Fig. 4 is a flowchart of the quality rule check in the method of the present invention.
Specific embodiment
To describe the technical content of the present invention more clearly, it is further described below with reference to specific embodiments.
The method of the present invention for implementing associated processing of data standards and data quality based on metadata in big data governance comprises the following steps:
(1) collecting metadata;
(1.1) obtaining the data source configuration, and scanning the database information in the data source through a metadata adapter;
(1.2) converting the data and writing it into the metadata system;
(2) importing business data standards;
(2.1) organizing the business data standards into a document template that the metadata system can recognize;
(2.2) importing the data standards into the metadata system in the same way metadata is collected;
(2.3) managing the data standards as an independent type of metadata;
(3) classifying the metadata according to the data standards, and storing it with the data standard number as the key field;
(4) formulating data quality standards according to the data standards;
(4.1) importing the data quality standards into the metadata system and managing them as an independent type of metadata;
(5) writing quality rules according to the data quality standards;
(6) checking the metadata according to the quality rules;
(6.1) executing the quality rules, and collecting the problem data produced during execution;
(6.2) finding the corresponding metadata from the field name and table name of the data, and obtaining the data standard corresponding to that metadata;
(6.3) organizing the check information into a data quality report.
As a preferred embodiment of the present invention, the database information in step (1.1) includes the organization and structure (schema) of the database, table names, field names, views, relationships, primary keys, and foreign keys.
As a preferred embodiment of the present invention, in step (3), one data standard may apply to multiple metadata items, while a single metadata item corresponds to only one data standard.
As a preferred embodiment of the present invention, in step (4), one data standard corresponds to multiple data quality standards, while a single data quality standard corresponds to only one data standard.
As a preferred embodiment of the present invention, in step (5), one data quality standard corresponds to multiple quality rules, while a single quality rule derives from only one data quality standard.
As a preferred embodiment of the present invention, the quality rule in step (5) includes a detection scope, a detection attribute, and a detection rule.
In a specific embodiment of the invention, to address the shortcomings of the background art above, the present invention proposes a method that associates data standards with metadata, creates quality standards according to the data standards, then configures quality rules, and finally checks the metadata according to the quality rules. This breaks down the barrier between business and technology: taking the real needs of the enterprise as the standard, metadata as the support, and quality rules as the means, it ensures that every data problem has a documented basis and a rule to follow, thereby improving data quality, standardizing data definitions, ensuring effective management of data assets, and building a data governance system with a virtuous closed loop.
The invention discloses a method for connecting data standards and data quality based on metadata in big data governance, comprising: collecting system metadata; associating the metadata with data standards by importing business data standards; creating quality standards according to the data standards; configuring quality rules; and finally checking the metadata according to the quality rules. With the invention, quality deficiencies of metadata in enterprise information systems can be identified quickly. Through the association of data standards and data quality, the barrier between business requirements and technical requirements in enterprise data governance is broken down, so that every data problem has a documented basis and a rule to follow. This is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, and has good application value.
The purpose of the present invention is to provide a method for connecting data standards and data quality based on metadata in big data governance that can quickly identify quality deficiencies of metadata in enterprise information systems. Through the association of data standards and data quality, it breaks down the barrier between business requirements and technical requirements in enterprise big data governance, identifies metadata that does not meet the quality standards, and ensures the authenticity and validity of business data at the source, thereby achieving effective governance of enterprise data assets. The specific operating steps are as follows:
Step 1, metadata collection: obtain the data source configuration, scan the database information in the data source through a metadata adapter, for example the schema (the organization and structure of the database), table names, field names, views, relationships, primary keys, and foreign keys, convert the data, and finally write it into the metadata system. Overall, collection is divided into a client side and a server side: the client side covers the adapters, data sources, and collection task configuration, while the server side performs the actual collection, conversion, and persistence of the data. A common metamodel generally includes, but is not limited to, three kinds of elements: packages, classes, and data types. A package is a container that groups the classes and data types of the metamodel according to the specific metadata source. A class defines a type of metadata object, such as the database type or the ETL type; classes have attributes, and classes can be related to one another through composition, dependency, and inheritance relationships. A data type defines an attribute; for example, the "description" attribute of the database class has the data type text, which tells the metadata system how to display that attribute to the user.
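The adapter-based scan of Step 1 can be sketched as follows. This is a minimal illustration, assuming SQLite as a stand-in for a business database: its built-in catalog (`sqlite_master`, `PRAGMA table_info`, `PRAGMA foreign_key_list`) plays the role of the source being scanned. The `TableMeta` and `ColumnMeta` classes and the `scan_sqlite` function are hypothetical names, not part of the patent or of any metadata product; they correspond roughly to one "database" class of the metamodel, and views, ETL metadata, and the client/server split are left out.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ColumnMeta:
    name: str
    data_type: str
    is_primary_key: bool


@dataclass
class TableMeta:
    schema: str
    name: str
    columns: List[ColumnMeta] = field(default_factory=list)
    # Each entry: (local column, referenced table, referenced column)
    foreign_keys: List[Tuple[str, str, str]] = field(default_factory=list)


def scan_sqlite(conn: sqlite3.Connection, schema: str = "main") -> List[TableMeta]:
    """Hypothetical adapter: read table, column, primary-key and foreign-key
    metadata from SQLite's own catalog."""
    tables = []
    names = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in names:
        meta = TableMeta(schema=schema, name=table_name)
        # Each row: (cid, name, type, notnull, dflt_value, pk)
        for _cid, col, col_type, _notnull, _default, pk in conn.execute(
            f"PRAGMA table_info('{table_name}')"
        ):
            meta.columns.append(ColumnMeta(col, col_type, pk > 0))
        # Each row: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f"PRAGMA foreign_key_list('{table_name}')"):
            meta.foreign_keys.append((fk[3], fk[2], fk[4]))
        tables.append(meta)
    return tables


# Demo on a throwaway in-memory database standing in for a business system.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (empcode TEXT PRIMARY KEY, empname TEXT,
                      orgid INTEGER REFERENCES dept(id));
    """
)
catalog = scan_sqlite(conn)
```

A real adapter would issue the equivalent catalog queries for each supported database type and hand the results to the server side for conversion and storage.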
Step 2, import business data standards: organize the business data standards into a document template that the metadata system can recognize, such as Excel or XML, import the data standards into the metadata system in the same way metadata is collected, and manage them as an independent type of metadata. The data standard template should include, but is not limited to:
1) data standard number
2) standard first-level classification
3) standard second-level classification
4) standard Chinese name
5) standard alias
6) business definition
7) definition basis
8) data type
9) value range
10) data length
11) data precision
12) data display format
13) authoritative system
14) data standard status
15) filling date.
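A minimal sketch of the template as a record type, assuming the template has been exported to CSV (used here instead of Excel or XML so the example needs only the Python standard library). Field names such as `level1_class` and the function `load_standards` are hypothetical; the numbered comments map each field back to the list above, and the sample row is made-up illustration data.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataStandard:
    number: str                # 1) data standard number (the key field)
    level1_class: str          # 2) first-level classification
    level2_class: str          # 3) second-level classification
    name: str                  # 4) standard (Chinese) name
    alias: str                 # 5) standard alias
    definition: str            # 6) business definition
    basis: str                 # 7) definition basis
    data_type: str             # 8) data type
    value_range: str           # 9) value range
    length: int                # 10) data length
    precision: int             # 11) data precision
    display_format: str        # 12) data display format
    authoritative_system: str  # 13) authoritative system
    status: str                # 14) data standard status
    filled_date: str           # 15) filling date


def load_standards(text: str) -> Dict[str, DataStandard]:
    """Parse a CSV export of the template into {standard number: DataStandard}."""
    standards: Dict[str, DataStandard] = {}
    for row in csv.DictReader(io.StringIO(text)):
        row["length"] = int(row["length"] or 0)
        row["precision"] = int(row["precision"] or 0)
        std = DataStandard(**row)
        if std.number in standards:
            raise ValueError(f"duplicate data standard number: {std.number}")
        standards[std.number] = std
    return standards


# Hypothetical one-row sample; header names must match the field names above.
SAMPLE_CSV = (
    "number,level1_class,level2_class,name,alias,definition,basis,data_type,"
    "value_range,length,precision,display_format,authoritative_system,status,"
    "filled_date\n"
    "DS0001,Party,Employee,Employee code,EMPCODE,Unique code of an employee,"
    "HR policy,string,,20,0,,HR,published,2019-05-01\n"
)
standards = load_standards(SAMPLE_CSV)
```

Keying the result by standard number mirrors the role of that number as the key field in Step 3.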
Step 3, associate metadata with data standards: classify each metadata item under the data standard it belongs to, and store the association with the data standard number as the key field. The relationship between data standards and metadata is 1:N: one data standard can apply to multiple metadata items, while one metadata item can correspond to only one data standard.
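The 1:N rule of Step 3 can be enforced with two indexes keyed on the data standard number. The sketch below rests on assumptions: metadata items are identified by a hypothetical dotted `schema.table.column` string, and `StandardRegistry` is an invented name. Binding a metadata item to a second standard is rejected, while one standard may cover any number of metadata items.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


class StandardRegistry:
    """Hypothetical association store between metadata items and data standards."""

    def __init__(self) -> None:
        self._by_metadata: Dict[str, str] = {}                     # metadata -> one standard
        self._by_standard: Dict[str, Set[str]] = defaultdict(set)  # standard -> many metadata

    def associate(self, metadata_id: str, standard_no: str) -> None:
        current = self._by_metadata.get(metadata_id)
        if current is not None and current != standard_no:
            raise ValueError(f"{metadata_id} is already bound to {current}")
        self._by_metadata[metadata_id] = standard_no
        self._by_standard[standard_no].add(metadata_id)

    def standard_of(self, metadata_id: str) -> Optional[str]:
        return self._by_metadata.get(metadata_id)

    def metadata_of(self, standard_no: str) -> Set[str]:
        return set(self._by_standard.get(standard_no, set()))

    def unassociated(self, all_metadata: Iterable[str]) -> List[str]:
        # Metadata items that still have no data standard.
        return sorted(m for m in all_metadata if m not in self._by_metadata)


registry = StandardRegistry()
registry.associate("TEST.EMP_TABLE.EMPCODE", "DS0001")
registry.associate("HR.STAFF.EMP_NO", "DS0001")  # same standard, another system
```

`unassociated` is one way to track the completion condition given in the embodiment, where association ends once every metadata item has a data standard.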
Step 4, formulate quality standards according to the data standards: quality standards are compiled according to the requirements that the data standards place on the completeness, consistency, uniqueness, conformity, timeliness, and accuracy of the data. The compilation can be done online or offline; if it is done offline, the data quality standards can be imported into the metadata system in the manner of Step 2 and managed as an independent type of metadata. The relationship between data standards and data quality standards is 1:N: one data standard can correspond to multiple quality standards, while one quality standard can correspond to only one data standard. The content of a quality standard includes, but is not limited to:
1) corresponding data standard number
2) corresponding data name
3) quality standard number
4) data quality dimension
5) data quality dimension code
6) data quality standard description
7) referenced object standard number
8) referenced object name
9) reason description.
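A possible shape for a quality-standard record, assuming the six requirement types of Step 4 serve as the quality dimensions and the enum value serves as the dimension code. `QualityStandard`, `Dimension`, and `group_by_standard` are hypothetical names. Because each record stores exactly one `standard_no`, a quality standard can belong to only one data standard, and grouping by that field yields the 1:N view.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class Dimension(Enum):
    # The six requirement types named in Step 4; values act as dimension codes.
    COMPLETENESS = "completeness"
    CONSISTENCY = "consistency"
    UNIQUENESS = "uniqueness"
    CONFORMITY = "conformity"
    TIMELINESS = "timeliness"
    ACCURACY = "accuracy"


@dataclass(frozen=True)
class QualityStandard:
    standard_no: str            # 1) corresponding data standard number (exactly one)
    data_name: str              # 2) corresponding data name
    number: str                 # 3) quality standard number
    dimension: Dimension        # 4) and 5) quality dimension and its code
    description: str            # 6) quality standard description
    ref_standard_no: str = ""   # 7) referenced object standard number
    ref_name: str = ""          # 8) referenced object name
    reason: str = ""            # 9) reason description


def group_by_standard(items: Iterable[QualityStandard]) -> Dict[str, List[QualityStandard]]:
    """1:N view: data standard number -> its quality standards, in input order."""
    grouped: Dict[str, List[QualityStandard]] = {}
    for qs in items:
        grouped.setdefault(qs.standard_no, []).append(qs)
    return grouped


# Hypothetical example: two quality standards derived from one data standard.
by_standard = group_by_standard([
    QualityStandard("DS0001", "Employee code", "QS0001", Dimension.COMPLETENESS,
                    "Employee code must not be null"),
    QualityStandard("DS0001", "Employee code", "QS0002", Dimension.UNIQUENESS,
                    "Employee code must be unique"),
])
```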
Step 5, write quality rules according to the quality standards: a data quality rule is the technical implementation of a data quality standard. It is generally an executable SQL statement, and it can also be configured in a dedicated data quality system. The relationship between data quality standards and quality rules is 1:N: multiple quality rules can be written for one data quality standard, while one quality rule can derive from only one data quality standard.
A quality rule should contain at least three parts: the detection scope, the detection attribute, and the detection rule.
The detection scope defines and maintains the basic scope elements involved in a data quality check rule. It can be a specific data item, the value returned by an SQL statement, or a combination of other attributes. Its purpose is to define the range over which a standard specification is checked, which makes basic rules easier to define and maintain. A detection scope includes a name, a description, a value, the time it was added, the person who added it, and so on. Common detection scopes include the registration and certificate issue date, the current system date, the name of the legal representative, the ID card number, and so on.
The detection attribute defines the basic data quality judgment rules according to data quality management requirements. Managed together with the detection scope, detection attributes allow detection rules to be defined flexibly. Detection attributes include, but are not limited to, null-value checks, value-domain checks, format conformity checks, duplicate-data checks, missing-record checks, referential integrity checks, result-set comparison, SQL script checks, outlier checks, balance checks, fluctuation checks, timeliness checks, and logic checks.
The detection rule is the logical rule that decides whether data is abnormal. Based on the detection scope and the detection attribute, it defines whether the result set returned by a check counts as correct data or incorrect data.
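The three parts of a quality rule can be modeled as data, with the detection attribute selecting how SQL is generated. This sketch assumes only the null-value check attribute and produces statements in the same shape as the null-check SQL shown in the embodiment (`WHERE 1=1` lets further conditions be appended with `AND`). `DetectionScope`, `QualityRule`, and `build_sql` are hypothetical names; identifiers are interpolated directly, so the scope must come from trusted configuration, never from user input.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DetectionScope:
    schema: str
    table: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class QualityRule:
    quality_standard_no: str   # each rule derives from exactly one quality standard
    scope: DetectionScope      # detection scope: what is checked
    attribute: str             # detection attribute, e.g. "NULL_CHECK"
    # Detection rule assumed for NULL_CHECK: a row is a problem if any scoped column IS NULL.


def build_sql(rule: QualityRule) -> Tuple[str, str]:
    """Return (total-count SQL, problem-count SQL) for a null-value check."""
    if rule.attribute != "NULL_CHECK":
        raise NotImplementedError(f"unsupported detection attribute: {rule.attribute}")
    table = f"{rule.scope.schema}.{rule.scope.table}"
    total_sql = f"SELECT COUNT(*) AS COUNT FROM {table} WHERE 1=1"
    predicate = " OR ".join(f"{table}.{c} IS NULL" for c in rule.scope.columns)
    problem_sql = f"{total_sql} AND ({predicate})"
    return total_sql, problem_sql


rule = QualityRule(
    "QS0001",
    DetectionScope("TEST", "EMP_TABLE", ("EMPCODE", "EMPNAME", "ORGID")),
    "NULL_CHECK",
)
total_sql, problem_sql = build_sql(rule)
```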
Step 6, check metadata according to the quality rules: execute the quality rules and collect the problem data produced during execution. The problem data includes, but is not limited to, the field name, field description, data value, data type, and table name. The corresponding metadata can be found from the field name and table name of the data, and from it the data standard corresponding to the metadata. This information is then organized into a data quality report that includes, but is not limited to: metadata name, data problem rate, data details, problem cause, standard value, target problem rate after rectification, rectification suggestion, corresponding data standard name, and standard basis. The report can be submitted to the person responsible for the business system or to the data governance team, providing a solid basis for enterprise big data governance.
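The lookup from problem data back to metadata and data standards, together with the problem-rate column of the report, can be sketched as below. Assumptions: each problem record is reduced to a `(table name, field name)` pair, totals are the number of rows checked per table, and the metadata-to-standard mapping is keyed by `TABLE.FIELD`. `ReportRow` and `build_report` are hypothetical names; the other report fields (cause, suggestion, standard basis) would be copied from the matched data standard.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class ReportRow:
    metadata: str        # TABLE.FIELD of the offending column
    standard_no: str     # data standard linked to that metadata
    problem_count: int
    total_count: int

    @property
    def problem_rate(self) -> float:
        return self.problem_count / self.total_count if self.total_count else 0.0


def build_report(
    problems: Iterable[Tuple[str, str]],
    totals: Dict[str, int],
    metadata_to_standard: Dict[str, str],
) -> List[ReportRow]:
    """Hypothetical: aggregate problem data per column and attach its data standard."""
    counts = Counter(problems)  # (table name, field name) -> number of problems
    rows = []
    for (table, field_name), n in sorted(counts.items()):
        metadata = f"{table}.{field_name}"
        rows.append(
            ReportRow(
                metadata=metadata,
                standard_no=metadata_to_standard.get(metadata, "<unassociated>"),
                problem_count=n,
                total_count=totals[table],
            )
        )
    return rows


# Invented example: one null EMPCODE and two null ORGID values among 4 rows.
report = build_report(
    problems=[("EMP_TABLE", "EMPCODE"), ("EMP_TABLE", "ORGID"), ("EMP_TABLE", "ORGID")],
    totals={"EMP_TABLE": 4},
    metadata_to_standard={"EMP_TABLE.EMPCODE": "DS0001", "EMP_TABLE.ORGID": "DS0003"},
)
```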
Through the above six steps, the association of metadata, data standards, and data quality breaks down the barrier between business requirements and technical requirements in enterprise data governance. Data quality is formulated from data standards and metadata is checked against data quality, so that data quality control always has a legitimate basis. When problem data is found, the corresponding business basis for the problem can be provided, and rectification suggestions can also be offered based on the requirements of the data standards. Taking business as the goal and technology as the means, the method closes the loop of enterprise big data governance. It is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, and has good application value.
The method of the present invention for connecting data standards and data quality based on metadata in big data governance thus associates metadata, data standards, and data quality, removing the barrier between business and technical requirements in enterprise data governance. Because data quality is derived from data standards and metadata is checked against data quality, every quality check has a legitimate basis; problem data can be traced to its business basis, and rectification suggestions can be given according to the data standards. The method closes the loop of enterprise big data governance and is significant for improving data quality, standardizing data definitions, and ensuring effective management of data assets.
The embodiments of the technical solution of the present invention are described in detail below with reference to Figs. 1 to 4:
The present invention provides a method for connecting data standards and data quality based on metadata in big data governance. Its specific implementation steps are shown in Fig. 1, and Fig. 3 shows the functional architecture of the data asset platform used in this embodiment:
Step 1, metadata collection: in a specific implementation, this step can be completed by the metadata collection module. First, the data sources of each business system are collected: the table names, field names, views, relationships, primary keys, foreign keys, and so on of the business databases are collected into the metadata system and saved as metadata of the database type. Second, the ETL processes that exchange data between databases within the enterprise are collected and saved as metadata of the ETL type, for example PowerCenter, stored procedures, Kettle, DataStage, SQL Server Integration Services, SQL Server Analysis Services, and Perl scripts. Finally, the source and target databases of each ETL process are attached to the corresponding database metadata, forming the lineage map of the metadata.
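The lineage map formed in this step can be represented as a graph whose edges run from each ETL source object to its target. Below is a minimal sketch, assuming objects are named by `schema.table` strings and each ETL job is recorded with its sources and target; `LineageGraph` and the job names are hypothetical. `trace_upstream` walks the graph breadth-first to list every object that feeds a given table.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


class LineageGraph:
    """Hypothetical lineage store: target object -> {(source object, ETL job)}."""

    def __init__(self) -> None:
        self._upstream: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_etl(self, job: str, sources: Iterable[str], target: str) -> None:
        # Attach the ETL job's source and target objects to the graph.
        for src in sources:
            self._upstream[target].add((src, job))

    def trace_upstream(self, obj: str) -> List[Tuple[str, str, str]]:
        """Breadth-first walk returning (source, job, target) for every feeding object."""
        seen = {obj}
        order = []
        queue = deque([obj])
        while queue:
            current = queue.popleft()
            for src, job in sorted(self._upstream.get(current, ())):
                if src not in seen:
                    seen.add(src)
                    order.append((src, job, current))
                    queue.append(src)
        return order


# Invented example: a CRM table loaded into ODS, then joined into a warehouse table.
graph = LineageGraph()
graph.add_etl("kettle_load_customer", ["CRM.CUSTOMER"], "ODS.CUSTOMER")
graph.add_etl("proc_build_orders", ["ODS.CUSTOMER", "ODS.ORDERS"], "DW.CUSTOMER_ORDERS")
lineage = graph.trace_upstream("DW.CUSTOMER_ORDERS")
```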
Step 2, import business data standards: in a specific implementation, the data standards sorted out by the enterprise or a consulting firm are organized into Excel format according to the metadata collection template, and the metadata Excel collector loads them into the metadata repository, where they are managed as an independent type of metadata. The following table shows the data standard template defined by an enterprise during implementation:
Step 3, associate metadata with data standards: in a specific implementation, the metadata management interface provides a function for associating data standards. Based on the metadata name, the system the metadata belongs to, and attributes of the data standards such as standard classification and authoritative system, the system automatically recommends the data standard most likely to correspond to the metadata; it also lets the user search for other data standards to associate. Metadata whose association is complete is displayed differently, to distinguish it from metadata not yet associated; the step ends when all metadata has been associated with a data standard.
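The automatic recommendation can be approximated with string similarity. This sketch is an assumption, not the platform's actual algorithm: it scores each candidate standard by the best `difflib.SequenceMatcher` ratio between the metadata name and the standard's name or alias, and adds a small, arbitrary bonus (0.1) when the metadata's source system matches the standard's authoritative system. `recommend` and the sample standards are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple

# (standard number, name, alias, authoritative system): invented sample data
SAMPLE_STANDARDS = [
    ("DS0001", "employee code", "empcode", "HR"),
    ("DS0002", "employee name", "empname", "HR"),
    ("DS0003", "organization id", "orgid", "HR"),
]


def recommend(
    metadata_name: str,
    metadata_system: str,
    standards: Iterable[Tuple[str, str, str, str]],
    top_n: int = 3,
) -> List[Tuple[float, str]]:
    """Rank candidate data standards for one metadata item (hypothetical scoring)."""
    key = metadata_name.lower()
    scored = []
    for number, name, alias, system in standards:
        similarity = max(
            SequenceMatcher(None, key, name.lower()).ratio(),
            SequenceMatcher(None, key, alias.lower()).ratio(),
        )
        bonus = 0.1 if system == metadata_system else 0.0
        scored.append((round(similarity + bonus, 3), number))
    # Highest score first; ties broken by standard number for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


suggestions = recommend("EMPCODE", "HR", SAMPLE_STANDARDS)
```

The user would still confirm the top candidate or search for another standard, as the step describes.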
Step 4 formulates quality standard according to data standard: in specific implementation, a data standard can be to data Go out a plurality of quality standard derived from integrality, consistency, uniqueness, normalization, timeliness and accuracy requirement, by quality standard Slave table as data standard saves.After working out, then by metadata parsing, quality standard is collected into metadatabase In, it is managed as a kind of independent metadata.Following table is the data quality standard that certain enterprise formulates in implementation process:
Step 5, writing quality rules according to the quality standards: In a specific implementation, quality rules are written in the data quality management module according to the requirements of the quality standards. A quality rule should contain at least three parts: the detection scope, the detection attribute, and the detection rule. For example, the completeness quality standard for personal information states that the employee code, employee name, and department must not be null. The specific database user name, table name, and field names are configured in the detection scope; a null-value check is added to the detection attribute; and the detection rule is configured to treat null values as problem data. The following are some of the SQL statements run when the quality rule is executed:
(1) SQL for the total row count of the null-value check:
SELECT COUNT (*) AS COUNT FROM TEST.EMP_TABLE WHERE 1=1;
(2) SQL for the number of problem rows in the null-value check:
SELECT COUNT (*) AS COUNT FROM TEST.EMP_TABLE WHERE 1=1 AND (TEST.EMP_TABLE.EMPCODE IS NULL OR TEST.EMP_TABLE.EMPNAME IS NULL OR TEST.EMP_TABLE.ORGID IS NULL);
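The two statements above can be generated from the three parts of a rule. The sketch below assumes a rule object holding the detection scope (schema, table, columns), a detection attribute naming the check, and a detection rule of "any configured column is null"; `QualityRule` and `build_sql` are hypothetical. It runs the generated SQL against SQLite, attaching an in-memory database as schema `TEST` so the qualified names work unchanged.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class QualityRule:
    quality_no: str  # the single quality standard this rule derives from
    schema: str      # detection scope: database user / schema
    table: str       # detection scope: table name
    columns: list    # detection scope: field names
    check: str = "null"  # detection attribute (only the null check is sketched)


def build_sql(rule):
    """Return (total-count SQL, problem-count SQL) for a null-value check.
    Identifiers come from trusted metadata configuration, not user input."""
    if rule.check != "null":
        raise NotImplementedError(rule.check)
    target = f"{rule.schema}.{rule.table}"
    total_sql = f"SELECT COUNT(*) AS COUNT FROM {target} WHERE 1=1;"
    # Detection rule: a row is problem data if any configured column is null
    condition = " OR ".join(f"{target}.{col} IS NULL" for col in rule.columns)
    problem_sql = f"SELECT COUNT(*) AS COUNT FROM {target} WHERE 1=1 AND ({condition});"
    return total_sql, problem_sql


rule = QualityRule("DS001-Q1", "TEST", "EMP_TABLE", ["EMPCODE", "EMPNAME", "ORGID"])
total_sql, problem_sql = build_sql(rule)

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS TEST")
conn.execute("CREATE TABLE TEST.EMP_TABLE (EMPCODE TEXT, EMPNAME TEXT, ORGID TEXT)")
conn.executemany(
    "INSERT INTO TEST.EMP_TABLE VALUES (?, ?, ?)",
    [("E001", "Zhang", "D01"), ("E002", None, "D01"), (None, "Li", None)],
)
total = conn.execute(total_sql).fetchone()[0]
problems = conn.execute(problem_sql).fetchone()[0]
print(total, problems)  # 3 2
```

The generated text differs from the listed statements only in whitespace (`COUNT(*)` versus `COUNT (*)`), which SQL treats identically.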
Step 6, checking metadata according to the quality rules: In a specific implementation, the quality rule is added to an execution task and an execution schedule is set, for example running the task at 22:00 every night. After execution, the system records the results of the rule, such as the number of problem rows, the total number of rows, and the execution time, and collects the problem data produced during execution. Problem data includes the field name, field description, data value, data type, and table name. From the field name and table name of the data, the corresponding metadata can be found, and from it the data standard associated with that metadata. After this information is compiled, a data quality report is produced, which can be submitted to the person responsible for the business system or to the data governance team.
The following table shows the data quality report of an enterprise in one implementation:
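As a rough sketch of how Step 6 could assemble such a report, the code below runs a null-value rule, records the total, problem-row count, and execution time, and traces each problem value through a metadata catalog to its data standard. The in-memory `CATALOG`, `run_null_check`, and `format_report` are hypothetical stand-ins for the metadata repository and report module; scheduling (for example, nightly at 22:00) is left to an external scheduler.

```python
import sqlite3
from datetime import datetime

# Hypothetical metadata catalog: (table, field) -> description, type, associated data standard
CATALOG = {
    ("EMP_TABLE", "EMPCODE"): {"description": "Employee code", "type": "TEXT", "standard_no": "DS001"},
    ("EMP_TABLE", "EMPNAME"): {"description": "Employee name", "type": "TEXT", "standard_no": "DS002"},
    ("EMP_TABLE", "ORGID"): {"description": "Department ID", "type": "TEXT", "standard_no": "DS003"},
}


def run_null_check(conn, table, columns, catalog):
    """Execute a null-value rule, record its result, and collect problem data
    traced back to metadata and data standards via (table name, field name)."""
    started = datetime.now()
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    condition = " OR ".join(f"{c} IS NULL" for c in columns)
    cursor = conn.execute(
        f"SELECT rowid, {', '.join(columns)} FROM {table} WHERE {condition} ORDER BY rowid"
    )
    problems = []
    for rowid, *values in cursor:
        for col, value in zip(columns, values):
            if value is None:
                meta = catalog[(table, col)]  # locate metadata by table and field name
                problems.append({
                    "row": rowid, "table": table, "field": col,
                    "description": meta["description"], "value": value,
                    "type": meta["type"], "standard_no": meta["standard_no"],
                })
    return {
        "table": table, "total": total,
        "problem_rows": len({p["row"] for p in problems}),
        "executed_at": started.isoformat(timespec="seconds"),
        "problems": problems,
    }


def format_report(result):
    """Render a plain-text report for the business owner or governance team."""
    lines = [f"Table {result['table']}: {result['problem_rows']} of {result['total']} rows "
             f"failed (run at {result['executed_at']})"]
    for p in result["problems"]:
        lines.append(f"  row {p['row']}: {p['field']} ({p['description']}) is null; "
                     f"see data standard {p['standard_no']}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP_TABLE (EMPCODE TEXT, EMPNAME TEXT, ORGID TEXT)")
conn.executemany(
    "INSERT INTO EMP_TABLE VALUES (?, ?, ?)",
    [("E001", "Zhang", "D01"), ("E002", None, "D01"), (None, "Li", None)],
)
result = run_null_check(conn, "EMP_TABLE", ["EMPCODE", "EMPNAME", "ORGID"], CATALOG)
print(format_report(result))
```

Attaching the data standard number to each problem value is what lets the report cite the business basis for a failure and point to the standard that a fix should satisfy.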
Through the six steps above, associating metadata, data standards, and data quality breaks down the barrier between business requirements and technical requirements in enterprise data governance: data quality requirements are formulated from data standards, and data quality is checked against metadata, so that data quality management rests on a legitimate, well-founded basis. At the same time, when problem data is found, the corresponding business basis for the problem can be provided, and rectification suggestions can be offered according to the requirements of the data standard. This achieves a complete closed loop in enterprise big data governance, with business as the goal and technology as the means. It is of great significance for helping enterprises improve data quality, standardize data definitions, and ensure the effective management of data assets, and it has good application value.
With the method of the invention for realizing association processing of data standards and data quality based on metadata in big data governance, associating metadata, data standards, and data quality breaks down the barrier between business and technical requirements in data governance: data quality is formulated from data standards and checked against metadata, problem data can be traced to its business basis, and rectification suggestions can be derived from the data standard. The method thus closes the loop of enterprise big data governance and helps enterprises improve data quality, standardize data definitions, and manage data assets effectively.
In this specification, the invention has been described with reference to specific embodiments. It is evident, however, that various modifications and changes may be made without departing from the spirit and scope of the invention. The specification and drawings are therefore to be regarded as illustrative rather than restrictive.

Claims (10)

1. A method for realizing association processing of data standards and data quality based on metadata in big data governance, characterized in that the method comprises the following steps:
(1) collecting metadata;
(2) importing business data standards;
(3) classifying the metadata according to the data standards, and storing it with the data standard number as the key field;
(4) formulating data quality standards according to the data standards;
(5) writing quality rules according to the data quality standards;
(6) checking the metadata according to the quality rules.
2. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that step (1) specifically comprises the following steps:
(1.1) obtaining the data source configuration, and scanning the database information in the data source through a metadata adapter;
(1.2) converting the data and writing it into the metadata system.
3. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 2, characterized in that the database information in step (1.1) includes the organization and structure of the database, table names, field names, views, relationships, primary keys, and foreign keys.
4. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that step (2) specifically comprises the following steps:
(2.1) organizing the business data standards into a document template recognizable by the metadata system;
(2.2) importing the data standards into the metadata system by means of metadata acquisition;
(2.3) managing the data standards as independent metadata.
5. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that a data standard in step (3) is applicable to multiple metadata items, while a single metadata item corresponds to only a single data standard.
6. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that step (4) specifically comprises the following step:
(4.1) importing the data quality standards into the metadata system and managing them as independent metadata.
7. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that a data standard in step (4) corresponds to multiple data quality standards, while a single data quality standard corresponds to only a single data standard.
8. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that a data quality standard in step (5) corresponds to multiple quality rules, while a single quality rule derives from only a single data quality standard.
9. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that the quality rule in step (5) includes a detection scope, a detection attribute, and a detection rule.
10. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that step (6) specifically comprises the following steps:
(6.1) executing the quality rules, and collecting the problem data produced during execution;
(6.2) finding the corresponding metadata according to the field name and table name of the data, and obtaining the data standard corresponding to the metadata;
(6.3) compiling the check information into a data quality report.
CN201910446036.5A 2019-05-27 2019-05-27 Method for realizing association processing of data standard and data quality based on metadata in big data management Active CN110119395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910446036.5A CN110119395B (en) 2019-05-27 2019-05-27 Method for realizing association processing of data standard and data quality based on metadata in big data management

Publications (2)

Publication Number Publication Date
CN110119395A true CN110119395A (en) 2019-08-13
CN110119395B CN110119395B (en) 2023-09-15

Family

ID=67523306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910446036.5A Active CN110119395B (en) 2019-05-27 2019-05-27 Method for realizing association processing of data standard and data quality based on metadata in big data management

Country Status (1)

Country Link
CN (1) CN110119395B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170293641A1 (en) * 2016-04-06 2017-10-12 International Business Machines Corporation Data warehouse model validation
CN107748775A (en) * 2017-10-17 2018-03-02 上海计算机软件技术开发中心 Data governance system based on data quality
US20180113898A1 (en) * 2016-10-25 2018-04-26 Mastercard International Incorporated Systems and methods for assessing data quality
US20180137151A1 (en) * 2016-11-11 2018-05-17 International Business Machines Corporation Computing the need for standardization of a set of values
CN108717456A (en) * 2018-05-22 2018-10-30 浪潮软件股份有限公司 Data-source-independent data lifecycle management platform and method
CN109034532A (en) * 2018-06-20 2018-12-18 江苏网域科技有限公司 Big-data-based data management and control system
CN109272155A (en) * 2018-09-11 2019-01-25 郑州向心力通信技术股份有限公司 Big-data-based corporate behavior analysis system
CN109344133A (en) * 2018-08-27 2019-02-15 成都四方伟业软件股份有限公司 Data-governance-driven data sharing and exchange system and its working method
CN109408502A (en) * 2018-11-14 2019-03-01 成都四方伟业软件股份有限公司 Data standard processing method, device, and storage medium
CN109523423A (en) * 2018-11-28 2019-03-26 中国海洋石油集团有限公司 Application system generation method, apparatus, device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
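李晶晶 (Li Jingjing) et al.: "Design and Implementation of a Data Quality Assessment and Management Tool" (数据质量评估管理工具的设计与实现), Information Technology & Standardization (《信息技术与标准化》), pages 61-65 *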

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125075A (en) * 2019-12-17 2020-05-08 国网天津市电力公司电力科学研究院 Data management method and system for non-computable region
CN111078780A (en) * 2019-12-23 2020-04-28 北京中创信测科技股份有限公司 AI optimization data management method
CN111177134A (en) * 2019-12-26 2020-05-19 上海科技发展有限公司 Data quality analysis method, device, terminal and medium suitable for mass data
CN112131264A (en) * 2020-09-15 2020-12-25 杭州城市大数据运营有限公司 Method, device and system for recommending different source difference information
WO2022135973A1 (en) * 2020-12-22 2022-06-30 Collibra Nv Bespoke transformation and quality assessment for term definition
US11669682B2 (en) 2020-12-22 2023-06-06 Collibra Belgium Bv Bespoke transformation and quality assessment for term definition
US11966696B2 (en) 2020-12-22 2024-04-23 Collibra Belgium Bv Bespoke transformation and quality assessment for term definition
CN112800046A (en) * 2021-02-26 2021-05-14 上海帕科信息科技有限公司 Artificial intelligence platform applied to field data management
CN112905329A (en) * 2021-03-24 2021-06-04 武汉众邦银行股份有限公司 Full life cycle management and control method for improving standard falling rate of data
CN113918774A (en) * 2021-10-28 2022-01-11 中国平安财产保险股份有限公司 Data management method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110119395B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN110119395A (en) The method that data standard and quality of data association process are realized based on metadata in big data improvement
Wang et al. Data quality
Milo et al. Next-step suggestions for modern interactive data analysis platforms
Lenz et al. Summarizability in OLAP and statistical data bases
Price et al. A semiotic information quality framework: development and comparative analysis
US7328428B2 (en) System and method for generating data validation rules
US10430413B2 (en) Data information framework
CN101308486A (en) Test question automatic generation system and method
Neal Validity in world city network measurements
CN106528828A (en) Multi-dimensional checking rule-based data quality detection method
CN103262076A (en) Analytical data processing
CN107533554A (en) Document verification system
Ferrara et al. Evaluation of instance matching tools: The experience of OAEI
Sneed et al. Testing big data (Assuring the quality of large databases)
Ballou et al. Sample-based quality estimation of query results in relational database environments
CN113450928A (en) Drug test data control method and system
Bicevskis et al. Data quality evaluation: a comparative analysis of company registers' open data in four European countries.
CN108268462A (en) Data quality checking system for relational integrity
CN107680690B (en) Clinical information system based on metadata
Azeroual et al. Putting FAIR principles in the context of research information: FAIRness for CRIS and CRIS for FAIRness
KR101178998B1 (en) Method and System for Certificating Data
CN112966901B (en) Lineage data quality analysis and verification method for inspection business collaborative flow
Zhou et al. Big data validity evaluation based on MMTD
KR20140054913A (en) Apparatus and method for processing data error for distributed system
Presser et al. A scope classification of data quality requirements for food composition data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant