CN110119395A - Method for associating data standards with data quality based on metadata in big data governance - Google Patents
Method for associating data standards with data quality based on metadata in big data governance
- Publication number
- CN110119395A CN201910446036.5A
- Authority
- CN
- China
- Prior art keywords
- data
- standard
- metadata
- quality
- rule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method for associating data standards with data quality based on metadata in big data governance, comprising: (1) collecting metadata; (2) importing enterprise data standards; (3) classifying the metadata according to the data standards and storing it with the data standard number as the key field; (4) formulating data quality standards according to the data standards; (5) writing quality rules according to the data quality standards; and (6) checking the metadata according to the quality rules. The method breaks down the barrier between business requirements and technical requirements in enterprise data governance and can provide rectification suggestions based on the requirements of the data standards. It takes business as the goal and technology as the means, forms a complete closed loop in enterprise big data governance, and is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, giving it good application value.
Description
Technical field
The present invention relates to the field of computer software, in particular to the field of big data governance, and specifically to a method for associating data standards with data quality based on metadata in big data governance.
Background art
With the rapid development of big data technology, more and more enterprises are paying attention to their own data problems and are starting to apply various means of data control to their core business systems and data planning. For example, they use a metadata system to manage enterprise metadata, use a data quality system to identify problem data and improve data quality, or engage a consulting firm to help sort out data standards. These means can improve data quality and achieve some data governance effect to a certain extent. However, as enterprise informatization accelerates, enterprises face more and more data problems, and managing data from a single perspective can no longer meet their data governance needs. It is therefore necessary to break down the barriers among the three dimensions of metadata, data standards, and data quality: quality rules are formulated from data standards, metadata is checked against the quality rules, and the corresponding data standard is found through the metadata. Data problems then have a documented basis and rules to follow, which improves data quality, standardizes data definitions, ensures effective management of data assets, and builds a virtuous closed-loop data control system.
Existing technologies related to big data governance are as follows:
(1) A data lineage visualization graph system for data governance (application number: 201711383801.0) provides a data lineage visualization graph system for data governance, comprising information nodes and the following modules: a data flow route, which refers to the path along which the data flows; and at least one of an extraction rule node, a cleaning rule node, a conversion rule node, a loading rule node, and a processing rule node. The extraction rule node describes how data is extracted; the cleaning rule node indicates the screening criteria applied to the data during the flow; the conversion rule node indicates the conversion criteria applied to the data during the flow; the loading rule node describes how data is stored; and the processing rule node indicates the archiving or destruction of the data. Through lineage relationships at different levels, that application makes the migration and flow of data clear to understand and provides a basis for assessing data value.
(2) A data standard processing method, device, and storage medium (application number: 201811356788.4) provides a data standard processing method, a device, and a storage medium thereof, and relates to the technical field of big data processing. The data standard processing method comprises: collecting metadata from a business database that stores production source data; extracting N data standards from the metadata, each of the N data standards including at least a name, N being a positive integer; selecting M of the N data standards to form a data standard set, M being a positive integer less than N; and generating a verification result table based on the data standard set. The method forms a data standard set from data standards derived from metadata, which improves the correlation of the data standards.
The technology of the above data lineage visualization graph system collects the data flow route through at least one of an extraction rule node, a cleaning rule node, a conversion rule node, a loading rule node, and a processing rule node. It can establish the lineage of metadata, make the migration and flow of data understandable, and provide a basis for assessing data value. However, it lacks any association with data standards, so it cannot quickly trace metadata back to data standards, let alone discover an enterprise's problem data through metadata. It therefore cannot achieve a virtuous closed loop of enterprise big data governance.
The technology of the above data standard processing method, device, and storage medium collects metadata from a business database that stores production source data; extracts N data standards from the metadata, each including at least a name, N being a positive integer; selects M of the N data standards to form a data standard set, M being a positive integer less than N; and generates a verification result table based on the data standard set. The source of its data standards is metadata, and the metadata comes from the databases of the individual business systems. It is therefore necessary to ensure in advance that every business system database has been built fully in accordance with enterprise standards. Otherwise, once the correctness of the metadata deviates, the data standards extracted from it become meaningless, and the corresponding data quality also lacks authenticity and usability.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the above prior art and to provide a method for associating data standards with data quality based on metadata in big data governance that offers high data quality, high authenticity, and good usability.
To achieve the above purpose, the method of the present invention for associating data standards with data quality based on metadata in big data governance is as follows:
The main feature of the method for associating data standards with data quality based on metadata in big data governance is that the method comprises the following steps:
(1) collecting metadata;
(2) importing enterprise data standards;
(3) classifying the metadata according to the data standards and storing it with the data standard number as the key field;
(4) formulating data quality standards according to the data standards;
(5) writing quality rules according to the data quality standards;
(6) checking the metadata according to the quality rules.
Preferably, step (1) specifically comprises the following steps:
(1.1) obtaining the data source configuration and scanning the database information of the data source through a metadata adapter;
(1.2) converting the data and writing it into the metadata system.
Preferably, the database information in step (1.1) includes the organization and structure of the database, table names, field names, views, relationships, primary keys, and foreign keys.
Preferably, step (2) specifically comprises the following steps:
(2.1) organizing the enterprise data standards into a document template that the metadata system can recognize;
(2.2) importing the data standards into the metadata system by way of metadata collection;
(2.3) managing the data standards as an independent type of metadata.
Preferably, a data standard in step (3) applies to multiple pieces of metadata, and a single piece of metadata corresponds to only one data standard.
Preferably, step (4) specifically comprises the following step:
(4.1) importing the data quality standards into the metadata system and managing them as an independent type of metadata.
Preferably, a data standard in step (4) corresponds to multiple data quality standards, and a single data quality standard corresponds to only one data standard.
Preferably, a data quality standard in step (5) corresponds to multiple quality rules, and a single quality rule derives from only one data quality standard.
Preferably, a quality rule in step (5) includes a detection scope, a detection attribute, and a detection rule.
Preferably, step (6) specifically comprises the following steps:
(6.1) executing the quality rules and collecting the problem data produced during execution;
(6.2) finding the corresponding metadata according to the field name of the data and the name of the table it belongs to, and obtaining the data standard corresponding to the metadata;
(6.3) organizing the check information into a data quality report.
With the method of the present invention for associating data standards with data quality based on metadata in big data governance, the association of metadata, data standards, and data quality breaks down the barrier between business requirements and technical requirements in enterprise data governance. Data quality requirements are formulated from the data standards, and the metadata is checked against them, so that data quality management has a well-founded basis. When problem data is found, the corresponding business basis for the problem can be provided, and rectification suggestions can be given according to the requirements of the data standards. The method thus takes business as the goal and technology as the means, forms a complete closed loop in enterprise big data governance, and is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, giving it good application value.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the present invention for associating data standards with data quality based on metadata in big data governance.
Fig. 2 is a diagram of the relationships among metadata, data standards, quality standards, and quality rules in the method of the present invention.
Fig. 3 is a functional architecture diagram of the modules of the data asset platform used in an embodiment of the method of the present invention.
Fig. 4 is a flow chart of the quality rule check in the method of the present invention.
Detailed description
To describe the technical content of the present invention more clearly, a further description is given below in conjunction with specific embodiments.
The method of the present invention for associating data standards with data quality based on metadata in big data governance comprises the following steps:
(1) collecting metadata;
(1.1) obtaining the data source configuration and scanning the database information of the data source through a metadata adapter;
(1.2) converting the data and writing it into the metadata system;
(2) importing enterprise data standards;
(2.1) organizing the enterprise data standards into a document template that the metadata system can recognize;
(2.2) importing the data standards into the metadata system by way of metadata collection;
(2.3) managing the data standards as an independent type of metadata;
(3) classifying the metadata according to the data standards and storing it with the data standard number as the key field;
(4) formulating data quality standards according to the data standards;
(4.1) importing the data quality standards into the metadata system and managing them as an independent type of metadata;
(5) writing quality rules according to the data quality standards;
(6) checking the metadata according to the quality rules;
(6.1) executing the quality rules and collecting the problem data produced during execution;
(6.2) finding the corresponding metadata according to the field name of the data and the name of the table it belongs to, and obtaining the data standard corresponding to the metadata;
(6.3) organizing the check information into a data quality report.
As a preferred embodiment of the present invention, the database information in step (1.1) includes the organization and structure of the database, table names, field names, views, relationships, primary keys, and foreign keys.
As a preferred embodiment of the present invention, a data standard in step (3) applies to multiple pieces of metadata, and a single piece of metadata corresponds to only one data standard.
As a preferred embodiment of the present invention, a data standard in step (4) corresponds to multiple data quality standards, and a single data quality standard corresponds to only one data standard.
As a preferred embodiment of the present invention, a data quality standard in step (5) corresponds to multiple quality rules, and a single quality rule derives from only one data quality standard.
As a preferred embodiment of the present invention, a quality rule in step (5) includes a detection scope, a detection attribute, and a detection rule.
In a specific embodiment, to address the shortcomings of the background art described above, the present invention proposes a method that associates data standards with metadata, creates quality standards according to the data standards, configures quality rules, and finally checks the metadata according to the quality rules. It breaks down the barrier between business and technology: it takes the enterprise's real needs as the standard, relies on metadata, and uses quality rules as the means, so that data problems have a documented basis and rules to follow. This improves data quality, standardizes data definitions, ensures effective management of data assets, and builds a virtuous closed-loop data control system.
The invention discloses a method for linking data standards and data quality based on metadata in big data governance, comprising: collecting system metadata; importing enterprise data standards and then associating the metadata with the data standards; creating quality standards according to the data standards; configuring quality rules; and finally checking the metadata according to the quality rules. The invention can quickly identify how the quality of metadata varies across enterprise information systems. By associating data standards with data quality, it breaks down the barrier between business requirements and technical requirements in enterprise data governance, so that data problems have a documented basis and rules to follow. It is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, and has good application value.
The purpose of the present invention is to provide a method for linking data standards and data quality based on metadata in big data governance. The method can quickly identify how the quality of metadata varies across enterprise information systems. By associating data standards with data quality, it breaks down the barrier between business requirements and technical requirements in enterprise big data governance, identifies metadata that does not meet the quality standards, and ensures from the source that enterprise data is authentic and valid, thereby achieving effective governance of enterprise data assets. The specific operating steps are as follows:
Step 1, metadata collection: obtain the data source configuration, then scan the database information of the data source through a metadata adapter, for example the schema, table names, field names, views, relationships, primary keys, and foreign keys, where the schema refers to the organization and structure of the database; convert the data; and finally write it into the metadata system. Overall, collection can be divided into a client and a server. The client covers configuration of the adapter, the data source, and the collection task, while the server is responsible for the actual work of collecting, converting, and storing the data. A common metadata model generally includes, but is not limited to, three kinds of elements: packages, classes, and data types. A package is a container that groups the classes and data types of the metadata model by metadata source. A class defines the type of a metadata object, such as a database type or an ETL type; a class has attributes of its own, and classes have relationships with one another, including composition, dependency, and inheritance. A data type is used to define an attribute: for example, the "description" attribute of a database class has the text data type, so the metadata system knows how to present this attribute to the user.
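The scanning part of Step 1 can be illustrated with a minimal sketch. It assumes a SQLite data source purely for convenience; the function name `scan_sqlite_metadata`, the record layout, and the sample tables are hypothetical and are not taken from the patent, which does not specify how its metadata adapter is implemented.

```python
import sqlite3


def scan_sqlite_metadata(conn):
    """Collect table, field, primary-key and foreign-key metadata."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        fk_by_column = {fk[3]: (fk[2], fk[4]) for fk in fks}
        for _cid, name, col_type, _notnull, _default, pk in columns:
            records.append({
                "table": table,
                "field": name,
                "data_type": col_type,
                "primary_key": pk > 0,
                "references": fk_by_column.get(name),
            })
    return records


# Hypothetical sample data source used only to exercise the scanner.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORG (ORGID INTEGER PRIMARY KEY, ORGNAME TEXT);
    CREATE TABLE EMP_TABLE (
        EMPCODE TEXT PRIMARY KEY,
        EMPNAME TEXT,
        ORGID INTEGER REFERENCES ORG (ORGID)
    );
""")
metadata = scan_sqlite_metadata(conn)
for record in metadata:
    print(record)
```

Each record corresponds to one field-level piece of metadata; an adapter for another database product would read that product's own catalog instead of the SQLite pragmas used here.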
Step 2, import enterprise data standards: organize the enterprise data standards into a document template that the metadata system can recognize, such as Excel or XML, and import them into the metadata system by way of metadata collection, managing the data standards as an independent type of metadata. The data standard template must include, but is not limited to:
1) data standard number
2) first-level standard classification
3) second-level standard classification
4) standard Chinese name
5) standard alias
6) business definition
7) basis for the definition
8) data type
9) value range
10) data length
11) data precision
12) data display format
13) authoritative system
14) data standard status
15) date of entry.
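As an illustration of how rows from such a template might be validated on import, the sketch below indexes template rows by data standard number. The English column keys are hypothetical translations of the fifteen template fields, and the function `load_standards` and the sample row are assumptions, not part of the patent.

```python
# Hypothetical column keys, one per template field listed above.
REQUIRED_FIELDS = [
    "standard_number", "level1_category", "level2_category",
    "chinese_name", "alias", "business_definition", "definition_basis",
    "data_type", "value_range", "data_length", "data_precision",
    "display_format", "authoritative_system", "status", "filled_date",
]


def load_standards(rows):
    """Index template rows by data standard number, rejecting bad rows."""
    standards = {}
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        number = row["standard_number"]
        if not number:
            raise ValueError("data standard number must not be empty")
        if number in standards:
            raise ValueError(f"duplicate data standard number: {number}")
        standards[number] = dict(row)
    return standards


# Hypothetical row, as it might look after reading one line of the template.
row = dict.fromkeys(REQUIRED_FIELDS, "")
row.update(standard_number="DS-001", data_type="text", status="active")
standards = load_standards([row])
print(sorted(standards))
```

Keying the result by data standard number matches the role that number plays as the key field in Step 3.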
Step 3, associate metadata with data standards: classify each piece of metadata under the data standard it belongs to, and store it with the data standard number as the key field. The relationship between data standards and metadata is 1:N: one data standard can apply to multiple pieces of metadata, whereas one piece of metadata can correspond to only one data standard.
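The 1:N constraint of Step 3 can be sketched as a small in-memory store. The class `StandardAssociations`, its method names, and the identifiers in the usage lines are hypothetical; the patent only states the cardinality and that the data standard number is the key field.

```python
class StandardAssociations:
    """Keeps the 1:N relation: one standard, many pieces of metadata."""

    def __init__(self):
        self._standard_of = {}   # metadata id -> data standard number
        self._metadata_of = {}   # data standard number -> set of metadata ids

    def associate(self, metadata_id, standard_number):
        current = self._standard_of.get(metadata_id)
        if current is not None and current != standard_number:
            # A piece of metadata may correspond to only one data standard.
            raise ValueError(
                f"{metadata_id} is already associated with {current}"
            )
        self._standard_of[metadata_id] = standard_number
        self._metadata_of.setdefault(standard_number, set()).add(metadata_id)

    def standard_for(self, metadata_id):
        return self._standard_of.get(metadata_id)

    def metadata_for(self, standard_number):
        return sorted(self._metadata_of.get(standard_number, set()))


# Hypothetical identifiers used only for illustration.
links = StandardAssociations()
links.associate("HR.EMP_TABLE.EMPCODE", "DS-001")
links.associate("PAYROLL.SALARY.EMPCODE", "DS-001")
print(links.metadata_for("DS-001"))
```

Rejecting a second, different standard for the same metadata is one way to enforce the constraint; a relational store could achieve the same with a unique key on the metadata identifier.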
Step 4, formulate quality standards according to the data standards: draw up the quality standards according to the requirements that the data standards place on the completeness, consistency, uniqueness, conformity, timeliness, and accuracy of the data. This work can be done online or offline. If it is done offline, the data quality standards can be imported into the metadata system in the manner of Step 2 and managed as an independent type of metadata. The relationship between data standards and data quality standards is 1:N: one data standard can correspond to multiple quality standards, whereas one quality standard can correspond to only one data standard. The content of a quality standard includes, but is not limited to:
1) corresponding data standard number
2) corresponding data name
3) quality standard number
4) data quality dimension
5) data quality dimension code
6) data quality standard description
7) reference object standard number
8) reference object name
9) description of the reason.
Step 5, write quality rules according to the quality standards: a data quality rule is the technical implementation of a data quality standard. It is generally an executable SQL statement, and it can also be set up by configuration in a professional data quality system. The relationship between data quality standards and quality rules is 1:N: multiple quality rules can be written for one data quality standard, whereas one quality rule can derive from only one data quality standard.
A quality rule should include at least three parts: a detection scope, a detection attribute, and a detection rule.
The detection scope defines and maintains the basic scope elements involved in a data quality check rule. A detection scope can be defined as a specific data item, as the value returned by an SQL statement, or as a combination of other attributes. Its purpose is to delimit what the standard is checked against and to simplify the definition and maintenance of basic rules. A detection scope includes a name, a description, a value, the time it was added, the person who added it, and so on. Common detection scopes include the registration and license issue date, the current system date, the name of the legal representative, and the ID card number.
The detection attribute defines the basic data quality judgment rules according to data quality management requirements. Managing detection attributes and combining them with detection scopes allows detection rules to be defined flexibly. Detection attributes include, but are not limited to, quality rules such as null value checks, value domain checks, conformity checks, duplicate data checks, missing record checks, referential integrity checks, result set comparison, SQL script checks, outlier checks, balance checks, fluctuation checks, timeliness checks, and logic checks.
The detection rule is the logic rule that judges whether data is abnormal. Based on the detection scope and the detection attribute, it defines whether the detected result set belongs to the correct side or the incorrect side.
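One possible way to combine the three parts into an executable statement is sketched below. The classes `DetectionScope` and `QualityRule`, the function `build_problem_count_sql`, and the restriction to a null value check are all assumptions made for illustration; the schema, table, and field names reuse those of the SQL example given later in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionScope:
    schema: str
    table: str
    fields: tuple


@dataclass(frozen=True)
class QualityRule:
    scope: DetectionScope
    attribute: str               # detection attribute, e.g. "null_check"
    problem_when_matched: bool   # detection rule: matched rows are the incorrect side


def build_problem_count_sql(rule):
    """Render the problem-count statement for a null value check."""
    if rule.attribute != "null_check":
        raise NotImplementedError(f"unsupported attribute: {rule.attribute}")
    qualified = f"{rule.scope.schema}.{rule.scope.table}"
    condition = " OR ".join(
        f"{qualified}.{field} IS NULL" for field in rule.scope.fields
    )
    if not rule.problem_when_matched:
        # Flip the result set when matched rows are the correct side.
        condition = f"NOT ({condition})"
    return (
        f"SELECT COUNT(*) AS COUNT FROM {qualified} "
        f"WHERE 1=1 AND ({condition})"
    )


rule = QualityRule(
    scope=DetectionScope("TEST", "EMP_TABLE", ("EMPCODE", "EMPNAME", "ORGID")),
    attribute="null_check",
    problem_when_matched=True,
)
sql = build_problem_count_sql(rule)
print(sql)
```

Identifiers are interpolated directly because they come from configuration in this sketch; a real system would need to validate or quote them.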
Step 6, check the metadata according to the quality rules: execute the quality rules and collect the problem data produced during execution. Problem data includes, but is not limited to, the field name, field description, data value, data type, and the name of the table it belongs to. The corresponding metadata can be found from the field name and table name of the data, and the data standard corresponding to that metadata can then be obtained. This information is organized into a data quality report, which includes, but is not limited to: metadata name, data problem rate, data details, cause of the problem, standard value, target problem rate after modification, modification suggestion, corresponding data standard name, and standard basis. The report can be submitted to the person responsible for the business system or to the data governance team, providing a solid basis for enterprise big data governance.
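The trace from problem data to metadata and on to the data standard can be sketched as follows. The function `build_quality_report`, the dictionary layouts, and the sample values are hypothetical, and the report contains only a subset of the fields listed above.

```python
from collections import Counter


def build_quality_report(problem_rows, total_rows, standard_of, standards):
    """Group problem rows by (table, field) and attach the governing standard."""
    counts = Counter((row["table"], row["field"]) for row in problem_rows)
    report = []
    for (table, field), count in sorted(counts.items()):
        # The table name and field name identify the piece of metadata.
        number = standard_of.get((table, field))
        standard = standards.get(number, {})
        report.append({
            "metadata": f"{table}.{field}",
            "problem_count": count,
            "problem_rate": count / total_rows[table],
            "standard_number": number,
            "standard_name": standard.get("name"),
        })
    return report


# Hypothetical inputs used only for illustration.
problem_rows = [
    {"table": "EMP_TABLE", "field": "EMPNAME", "value": None},
    {"table": "EMP_TABLE", "field": "EMPCODE", "value": None},
    {"table": "EMP_TABLE", "field": "EMPCODE", "value": None},
]
total_rows = {"EMP_TABLE": 4}
standard_of = {("EMP_TABLE", "EMPCODE"): "DS-001"}
standards = {"DS-001": {"name": "Employee code"}}

report = build_quality_report(problem_rows, total_rows, standard_of, standards)
for line in report:
    print(line)
```

A field without an associated standard still appears in the report, with an empty standard number, which makes unassociated metadata visible to the reviewer.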
Through the above six steps, the association of metadata, data standards, and data quality breaks down the barrier between business requirements and technical requirements in enterprise data governance. Data quality requirements are formulated from the data standards, and the metadata is checked against them, so that data quality management has a well-founded basis. When problem data is found, the corresponding business basis for the problem can be provided, and rectification suggestions can be given according to the requirements of the data standards. The method thus takes business as the goal and technology as the means, forms a complete closed loop in enterprise big data governance, and is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, giving it good application value.
With the method of the present invention for linking data standards and data quality based on metadata in big data governance, the association of metadata, data standards, and data quality breaks down the barrier between business requirements and technical requirements in enterprise data governance. Data quality requirements are formulated from the data standards, and the metadata is checked against them, so that data quality management has a well-founded basis. When problem data is found, the corresponding business basis for the problem can be provided, and rectification suggestions can be given according to the requirements of the data standards. The method thus takes business as the goal and technology as the means, forms a complete closed loop in enterprise big data governance, and is significant for improving enterprise data quality, standardizing data definitions, and ensuring effective management of data assets, giving it good application value.
The embodiment of the technical solution of the present invention is described in detail below with reference to Figs. 1 to 4:
The present invention provides a method for linking data standards and data quality based on metadata in big data governance. For the specific implementation steps, refer to Fig. 1; Fig. 3 shows the functional architecture of the data asset platform used in this embodiment:
Step 1, metadata collection: in a specific implementation, this step can be carried out with the metadata collection module. First, the data source of each business system is collected: the table names, field names, views, relationships, primary keys, foreign keys, and so on of the business databases are gathered into the metadata system and saved as metadata of the database type. Second, the ETL processes that exchange data between the enterprise's databases are collected and saved as metadata of the ETL type, for example PowerCenter, stored procedures, Kettle, DataStage, SQL Server Integration Services, SQL Server Analysis Services, and Perl scripts. Finally, the source and target databases of each ETL process are linked to the corresponding database metadata, forming the lineage map of the metadata.
Step 2, import enterprise data standards: in a specific implementation, the data standards compiled by the enterprise or by a consulting firm are organized into Excel format according to the metadata collection template, and the metadata Excel collector gathers them into the metadata repository, where they are managed as an independent type of metadata. The following table shows the data standard template defined by an enterprise during implementation:
Step 3, associate metadata with data standards: in a specific implementation, the metadata management interface provides a function for associating metadata with data standards. Based on the metadata name and the system it belongs to, together with attributes of the data standards such as the standard classification and the authoritative system, the system automatically recommends the data standard that most likely corresponds to the metadata. It also lets the user search for other data standards to associate. Metadata whose association is complete is displayed differently so that it can be told apart from metadata not yet associated. The step ends when all metadata has been associated with data standards.
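The patent does not say how the recommendation in Step 3 is computed. The sketch below is therefore only one conceivable approach: it scores candidates by string similarity between the metadata name and the standard name, plus a bonus when the authoritative system matches the source system. The function `recommend_standard`, the bonus weight of 0.2, and the sample standards are all assumptions.

```python
from difflib import SequenceMatcher


def recommend_standard(metadata_name, source_system, standards):
    """Return the standard whose name and system best match the metadata."""

    def score(standard):
        name_score = SequenceMatcher(
            None, metadata_name.lower(), standard["name"].lower()
        ).ratio()
        # Assumed weighting: favor standards owned by the same system.
        same_system = standard.get("authoritative_system") == source_system
        return name_score + (0.2 if same_system else 0.0)

    return max(standards, key=score) if standards else None


# Hypothetical data standards used only for illustration.
standards = [
    {"number": "DS-001", "name": "employee code", "authoritative_system": "HR"},
    {"number": "DS-002", "name": "organization id", "authoritative_system": "HR"},
]
best = recommend_standard("EMPLOYEE_CODE", "HR", standards)
print(best["number"])
```

In practice the recommendation would only be a suggestion; the text above notes that the user can still search for and choose a different data standard.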
Step 4, formulate quality standards according to the data standards: in a specific implementation, one data standard can give rise to multiple quality standards based on its requirements for data completeness, consistency, uniqueness, conformity, timeliness, and accuracy, and the quality standards are saved as a subordinate table of the data standard. Once drawn up, the quality standards are parsed as metadata and gathered into the metadata repository, where they are managed as an independent type of metadata. The following table shows the data quality standards formulated by an enterprise during implementation:
Step 5, write quality rules according to the quality standards: in a specific implementation, quality rules are written in the data quality management module according to the requirements of the quality standards. A quality rule should include at least three parts: a detection scope, a detection attribute, and a detection rule. For example, the completeness quality standard for personnel information is that the employee code, employee name, and department must not be null. The detection scope is configured with the specific database user name, table name, and field names; a null value check rule is added to the detection attribute; and the detection rule is configured so that empty values are treated as problem data. The following are some of the SQL statements run when the quality rule executes:
(1) SQL for the total count of the null value check:
SELECT COUNT(*) AS COUNT FROM TEST.EMP_TABLE WHERE 1=1;
(2) SQL for the problem count of the null value check:
SELECT COUNT(*) AS COUNT FROM TEST.EMP_TABLE WHERE 1=1 AND (TEST.EMP_TABLE.EMPCODE IS NULL OR TEST.EMP_TABLE.EMPNAME IS NULL OR TEST.EMP_TABLE.ORGID IS NULL);
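The two statements above can be tried out with a small sketch that runs them against an in-memory SQLite database and derives a problem rate. The attached schema named TEST, the column types, and the sample rows are assumptions made so that the statements run unchanged; the patent does not name the database product it targets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database so that the TEST. prefix resolves.
conn.execute("ATTACH DATABASE ':memory:' AS TEST")
conn.execute(
    "CREATE TABLE TEST.EMP_TABLE (EMPCODE TEXT, EMPNAME TEXT, ORGID TEXT)"
)
# Hypothetical sample rows; two of the four contain a null value.
conn.executemany(
    "INSERT INTO TEST.EMP_TABLE VALUES (?, ?, ?)",
    [
        ("E001", "Zhang", "D01"),
        ("E002", None, "D01"),
        (None, "Li", None),
        ("E004", "Wang", "D02"),
    ],
)

TOTAL_SQL = "SELECT COUNT(*) AS COUNT FROM TEST.EMP_TABLE WHERE 1=1"
PROBLEM_SQL = (
    "SELECT COUNT(*) AS COUNT FROM TEST.EMP_TABLE WHERE 1=1 AND "
    "(TEST.EMP_TABLE.EMPCODE IS NULL OR TEST.EMP_TABLE.EMPNAME IS NULL "
    "OR TEST.EMP_TABLE.ORGID IS NULL)"
)

total = conn.execute(TOTAL_SQL).fetchone()[0]
problems = conn.execute(PROBLEM_SQL).fetchone()[0]
problem_rate = problems / total if total else 0.0
print(total, problems, problem_rate)
```

The problem rate computed here corresponds to the "data problem rate" item that the data quality report is described as containing.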
Step 6, checking metadata according to the quality rules: in a specific implementation, the quality rule is added to an execution task and an execution period is set, for example running the task at 22:00 every night. After execution, the system records the result of the rule, such as the problem count, total count, and execution time, and collects the problem data produced during execution. The problem data includes the field name, field description, data value, data type, and owning table name. The corresponding metadata can be found from the field name and owning table name of the data, and the data standard corresponding to that metadata can then be obtained. Once this information is collated, a data quality report is formed, which can be submitted to the person responsible for the business system or to the data governance group.
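The lookup from problem data back to metadata and data standard can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `METADATA` registry, the standard numbers (`STD-001` and so on), the report layout, and the sample rows are all hypothetical, SQLite stands in for the checked database, and the nightly scheduling mentioned in the text is left out.

```python
import sqlite3
from datetime import datetime

# Hypothetical metadata registry: (table name, field name) -> metadata entry,
# including the number of the data standard associated in Step 3.
METADATA = {
    ("EMP_TABLE", "EMPCODE"): {"description": "Employee code", "data_type": "TEXT", "standard": "STD-001"},
    ("EMP_TABLE", "EMPNAME"): {"description": "Employee name", "data_type": "TEXT", "standard": "STD-002"},
    ("EMP_TABLE", "ORGID"): {"description": "Department ID", "data_type": "TEXT", "standard": "STD-003"},
}


def check_nulls(conn, table, columns):
    """Run a null-value check and collate the findings into a report dict."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    problems = []
    for column in columns:
        query = f"SELECT rowid FROM {table} WHERE {column} IS NULL ORDER BY rowid"
        for (rowid,) in conn.execute(query):
            # Field name + owning table name -> metadata -> data standard.
            meta = METADATA[(table, column)]
            problems.append({
                "table": table,
                "field": column,
                "row": rowid,
                "value": None,
                "description": meta["description"],
                "data_type": meta["data_type"],
                "data_standard": meta["standard"],
            })
    return {
        "executed_at": datetime.now().isoformat(timespec="seconds"),
        "total": total,
        # A record with several null fields is counted once.
        "problem_count": len({p["row"] for p in problems}),
        "problems": problems,
    }


# Sample data (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP_TABLE (EMPCODE TEXT, EMPNAME TEXT, ORGID TEXT)")
conn.executemany(
    "INSERT INTO EMP_TABLE VALUES (?, ?, ?)",
    [("E001", "Alice", "D01"), ("E002", None, "D01"), (None, "Carol", None)],
)

report = check_nulls(conn, "EMP_TABLE", ["EMPCODE", "EMPNAME", "ORGID"])
for p in report["problems"]:
    print(p["table"], p["field"], p["row"], p["data_standard"])
```

Because each problem entry carries its data standard number, the report can point the responsible person to the business definition that was violated.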
The following table is the data quality report of an enterprise in one implementation:
Through the above six steps, by associating metadata, data standards, and data quality, the enterprise breaks down the barrier between business requirements and technical requirements in data governance. Data quality requirements are formulated from the data standards, and the metadata is checked against them, so that data quality management has a well-founded basis. Meanwhile, when problem data is found in the enterprise, the business basis corresponding to the problem can be provided, and rectification suggestions can be given according to the requirements of the data standards. The method thus truly takes business as the goal and technology as the means, and realizes a complete closed loop in enterprise big data governance. It is of great significance for improving data quality, standardizing data definitions, and ensuring effective management of data assets, and has good application value.
With the method of the present invention for realizing association processing of data standards and data quality based on metadata in big data governance, the association of metadata, data standards, and data quality breaks down the barrier between business requirements and technical requirements in enterprise data governance. Data quality requirements are formulated from the data standards, and the metadata is checked against them, so that data quality management has a well-founded basis. Meanwhile, when problem data is found in the enterprise, the business basis corresponding to the problem can be provided, and rectification suggestions can be given according to the requirements of the data standards. The method truly takes business as the goal and technology as the means, realizes a complete closed loop in enterprise big data governance, is of great significance for improving data quality, standardizing data definitions, and ensuring effective management of data assets, and has good application value.
In this specification, the invention has been described with reference to specific embodiments. It is clear, however, that various modifications and changes can still be made without departing from the spirit and scope of the invention. The specification and drawings are therefore to be regarded as illustrative rather than restrictive.
Claims (10)
1. A method for realizing association processing of data standards and data quality based on metadata in big data governance, characterized in that the method comprises the following steps:
(1) collecting metadata;
(2) importing enterprise data standards;
(3) classifying the metadata according to the data standards, and storing it with the data standard number as the key field;
(4) formulating data quality standards according to the data standards;
(5) writing quality rules according to the data quality standards;
(6) checking the metadata according to the quality rules.
2. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that step (1) specifically comprises the following steps:
(1.1) obtaining the data source configuration, and scanning the database information in the data source through a metadata adapter;
(1.2) converting the data and writing it into the metadata system.
3. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 2, characterized in that the database information in step (1.1) includes the organization and structure of the database, table names, field names, views, relationships, primary keys, and foreign keys.
4. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that step (2) specifically comprises the following steps:
(2.1) arranging the enterprise data standards into a document template that the metadata system can recognize;
(2.2) importing the data standards into the metadata system by way of metadata collection;
(2.3) managing the data standards as independent metadata.
5. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that a data standard in step (3) applies to multiple pieces of metadata, and a single piece of metadata corresponds to only a single data standard.
6. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that step (4) specifically comprises the following step:
(4.1) importing the data quality standards into the metadata system and managing them as independent metadata.
7. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that a data standard in step (4) corresponds to multiple data quality standards, and a single data quality standard corresponds to only a single data standard.
8. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that a data quality standard in step (5) corresponds to multiple quality rules, and a single quality rule comes from only a single data quality standard.
9. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that a quality rule in step (5) includes a detection range, a detection attribute, and a detection rule.
10. The method for realizing association processing of data standards and data quality based on metadata in big data governance according to claim 1, characterized in that step (6) specifically comprises the following steps:
(6.1) executing the quality rules, and collecting the problem data produced during execution;
(6.2) finding the corresponding metadata according to the field name and owning table name of the data, and obtaining the data standard corresponding to the metadata;
(6.3) collating the check information to form a data quality report.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910446036.5A CN110119395B (en) | 2019-05-27 | 2019-05-27 | Method for realizing association processing of data standard and data quality based on metadata in big data management |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110119395A (en) | 2019-08-13 |
CN110119395B (en) | 2023-09-15 |
Family
ID=67523306
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910446036.5A Active CN110119395B (en) | 2019-05-27 | 2019-05-27 | Method for realizing association processing of data standard and data quality based on metadata in big data management |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110119395B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN110119395B (en) | 2023-09-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||