CN112306997A - Data quality management system - Google Patents

Publication number
CN112306997A
Authority
CN
China
Prior art keywords
data
checking
module
data quality
quality
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910665324.XA
Other languages
Chinese (zh)
Inventor
王博
江永渡
万晶
赵志武
张鹤
朱文
吴朝阳
李阳
陈瑞锁
梁力为
周丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Soft Hangzhou Anren Network Communication Co ltd
Original Assignee
China Soft Hangzhou Anren Network Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-23
Filing date
2019-07-23
Publication date
2021-02-02
Application filed by China Soft Hangzhou Anren Network Communication Co ltd
Priority to CN201910665324.XA
Publication of CN112306997A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/24 Querying
    • G06F16/25 Integrating or interfacing systems involving database management systems

Abstract

The invention provides a data quality management system. The system comprises a data quality definition module, a checking task scheduling module, a checking result acquisition module and a problem data analysis module. The data quality definition module provides the necessary input for the checking task scheduling module by defining and managing quality dimensions, checking categories, measurement rules and checking methods. The checking task scheduling module generates the corresponding checking result problem data files by executing the checking methods. The checking result acquisition module collects the checking result problem data files and loads them into storage, summarizes the checking result data during collection, and stores the detailed data and the summarized data in a result detail table and a summary table respectively. The problem data analysis module retrieves and analyzes the problem data and starts the problem handling process. The invention enables unified data quality checking and data quality monitoring, improves management efficiency and implementation efficiency, and standardizes the quality management process.

Description

Data quality management system
Technical Field
The invention relates to the technical field of big data, in particular to a data quality management system.
Background
In the big data era, governments, society and enterprises generate large amounts of data continuously. Governments exercise state authority and are the main bodies responsible for national administration and public services, so they hold substantial big data resources and have an urgent need for big data applications. With the rise of the big data concept across many fields in recent years, countries around the world regard government big data as an important strategic resource for economic and social development, and their governments pay close attention to it.
Openness and sharing are at the core of applying big data to government affairs. Government departments and the related public enterprises and institutions need to open their data resources as far as possible, and to introduce government applications and services through service procurement or resource investment, so that social and market forces help improve government service capacity.
How to manage the quality of government big data is a problem that urgently needs to be solved.
Disclosure of Invention
The data quality management system provided by the invention can realize unified data quality checking and data quality monitoring, improve the management efficiency and the implementation efficiency and standardize the quality management process.
In a first aspect, the present invention provides a data quality management system, the system comprising:
the data quality definition module is used for providing necessary input for the checking task scheduling module through definition and management of quality dimension, checking category, measurement rule and checking method;
the checking task scheduling module is used for generating a corresponding checking result problem data file by executing a checking method;
the checking result acquisition module is used for acquiring and warehousing the checking result problem data files, summarizing the checking result data in the acquisition process, and respectively storing the detailed data and the summarized data in a result detail table and a summary table;
and the problem data analysis module is used for retrieving and analyzing the problem data and starting a problem treatment process.
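As an illustration of how the definitions managed by the data quality definition module could be handed to the checking task scheduling module, the following is a minimal Python sketch. All class, field and method names here (`CheckMethod`, `MeasurementRule`, `QualityDefinitions`, `scheduling_input`) are hypothetical and do not come from the patent, and representing a checking method as a SQL query is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CheckMethod:
    """An executable check; assumed here to be a SQL query selecting problem rows."""
    name: str
    sql: str


@dataclass
class MeasurementRule:
    """A rule filed under one quality dimension and one checking category."""
    rule_id: str
    dimension: str  # e.g. "completeness" (hypothetical value)
    category: str   # e.g. "null check" (hypothetical value)
    methods: list[CheckMethod] = field(default_factory=list)


class QualityDefinitions:
    """In-memory registry standing in for the data quality definition module."""

    def __init__(self) -> None:
        self._rules: dict[str, MeasurementRule] = {}

    def add_rule(self, rule: MeasurementRule) -> None:
        # Reject duplicates so each rule id identifies exactly one rule.
        if rule.rule_id in self._rules:
            raise ValueError(f"duplicate rule id: {rule.rule_id}")
        self._rules[rule.rule_id] = rule

    def scheduling_input(self) -> list[tuple[str, CheckMethod]]:
        """Flatten the definitions into (rule_id, method) pairs for a scheduler."""
        return [(r.rule_id, m) for r in self._rules.values() for m in r.methods]
```

In this sketch the scheduler only ever sees the flattened pairs, so dimensions and categories remain a management concern of the definition side.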
Optionally, the system further comprises:
and the system configuration module is used for providing necessary auxiliary support for the normal operation of the data quality definition module, the checking task scheduling module, the checking result acquisition module and the problem data analysis module.
Optionally, the system configuration module is configured to provide a parameter configuration, a data source configuration, and a result detail page display configuration.
Optionally, the data quality management system provides the following functions or contents: data quality category management, quality measurement rule management, quality check method audit, data quality check scheduling, data quality check execution, data quality check warehousing, problem data presentation, problem data trend analysis, data quality check monitoring, check log management, page configuration management and data quality reporting.
The data quality management system provided by the embodiment of the invention finds quality problems in the data collected by the social co-governance big data platform by formulating and implementing data quality checks. It continuously monitors the data quality fluctuation of each system, performs proportion analysis across data quality rules, and periodically generates a key data quality report for each system, so that the data quality status of each system is known. Unified quality checking and data quality monitoring ensure that management and business personnel can obtain information and handle related work in time. Information silos caused by separate business systems are avoided, and repeated checking and processing of the same data are minimized. Data quality management is built and implemented once, which avoids each system repeatedly building and developing its own data quality function modules. With a unified data quality checking system, the checking categories are clear and the management process is standardized. Through the data quality problem handling process and the related functions, problem discovery and handling form a closed loop, and standardized management improves data quality.
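The monitoring described above can be sketched in Python. This is only one possible reading: it assumes that "proportion analysis" means the share of problem records attributed to each rule, and that "fluctuation" means the change in problem count between consecutive checks. The function names are hypothetical.

```python
from collections import Counter


def rule_proportions(problem_rule_ids: list[str]) -> dict[str, float]:
    """Share of problem records attributed to each rule id."""
    counts = Counter(problem_rule_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rule: n / total for rule, n in counts.items()}


def fluctuation(problem_counts: list[int]) -> list[int]:
    """Change in problem count between each pair of consecutive checks."""
    return [later - earlier
            for earlier, later in zip(problem_counts, problem_counts[1:])]
```

A periodic report for one system could then combine both values, for example the proportions of the latest check together with the fluctuation over the reporting period.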
Drawings
Fig. 1 is a schematic structural diagram of a data quality management system according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating the components and operation of a data quality management system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a data quality management system, as shown in fig. 1, the system includes:
the data quality definition module 11 is a basis of the whole data quality management system and is used for providing necessary input for the checking task scheduling module 12 through definition and management of quality dimensions, checking categories, measurement rules and checking methods;
the checking task scheduling module 12 is the core of the data quality management system, and is configured to generate the corresponding checking result problem data files by executing the checking methods; the checking result problem data reflects the data quality problems that the user is concerned with;
the checking result acquisition module 13 is used for acquiring and warehousing the checking result problem data files, summarizing the checking result data in the acquisition process, and respectively storing the detailed data and the summarized data in a result detail table and a summary table;
The collection program polls for files in job mode. The number of files to be collected, the amount of data they contain and possible interference from other factors differ from run to run, so each job takes a different amount of time, and the previous job has not necessarily finished when a new job starts.
To avoid resource contention between job batches, the data quality management system adopts a single-job execution mode: if the previous job has not finished when a job starts, the new job ends automatically, and the next job does not run until the previous one has finished.
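The single-job execution mode can be sketched in Python with a non-blocking lock. This is a minimal illustration that assumes all jobs run as threads of one process; the name `run_collection_job` is hypothetical, and the patent does not say which mechanism the system actually uses.

```python
import threading

# One lock shared by every collection job in this process.
_job_lock = threading.Lock()


def run_collection_job(collect) -> bool:
    """Run `collect` unless a previous job is still running.

    Returns True if the job ran, False if it ended immediately because
    the previous job had not finished.
    """
    # Non-blocking acquire: do not wait for the previous job, just give up.
    if not _job_lock.acquire(blocking=False):
        return False
    try:
        collect()
        return True
    finally:
        _job_lock.release()
```

If the jobs were started as separate processes, for example by a system scheduler, an in-process lock would not be enough and a file lock or database flag would have to play the same role.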
the problem data analysis module 14 is used for retrieving and analyzing the problem data and starting a problem handling process.
By retrieving and analyzing the problem data, the problem data analysis module starts the problem handling process. It is therefore the window through which the data quality problems of the checking system are exposed, and it embodies the core value of the whole data quality management system.
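The work of the checking result acquisition module 13, which loads a problem data file and summarizes it during collection, can be sketched in Python with the standard `sqlite3` module. The file layout (CSV with `rule_id` and `record_key` columns), the table and column names, and the function names are all assumptions made for illustration; the patent only states that detailed and summarized data go to a result detail table and a summary table.

```python
import csv
import io
import sqlite3
from collections import Counter


def create_tables(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) result detail table and summary table."""
    conn.executescript("""
        CREATE TABLE result_detail (task_id TEXT, rule_id TEXT, record_key TEXT);
        CREATE TABLE result_summary (task_id TEXT, rule_id TEXT, problem_count INTEGER);
    """)


def ingest_problem_file(conn: sqlite3.Connection, task_id: str, text: str) -> int:
    """Load one problem data file and summarize it while reading.

    Returns the number of detail rows stored.
    """
    detail = []
    summary = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        detail.append((task_id, row["rule_id"], row["record_key"]))
        summary[row["rule_id"]] += 1  # summarize during collection
    with conn:  # one transaction, so detail and summary stay consistent
        conn.executemany("INSERT INTO result_detail VALUES (?, ?, ?)", detail)
        conn.executemany(
            "INSERT INTO result_summary VALUES (?, ?, ?)",
            [(task_id, rule, count) for rule, count in summary.items()],
        )
    return len(detail)
```

Keeping the counts alongside the detail rows means that the problem data analysis module could show totals per rule without scanning the detail table each time.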
Further, the data quality management system further includes:
and the system configuration module is used for providing necessary auxiliary support for the normal operation of the data quality definition module, the checking task scheduling module, the checking result acquisition module and the problem data analysis module.
The system configuration module is used for providing parameter configuration, data source configuration and result detail page display configuration.
The data quality management system according to the embodiment of the present invention will be described in detail below.
As shown in Fig. 2, the data quality management system specifically includes a data quality management platform and checking systems. The data quality management platform includes: a template for organizing measurement rules and checking methods, a check script generation component, a check script repository, a check result repository, a common checking component, a scheduling component and a distribution component. The common checking component comprises a plurality of threads; the scheduling component schedules the threads for execution and dispatches them to the respective checking systems, where the checks are executed. The check results of all threads in the common checking component are stored in the check result repository.
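The interaction between the scheduling component, the threads of the common checking component and the check result repository can be sketched in Python with a thread pool. This is an assumption-laden illustration: the function `run_checks`, the representation of a check as a callable, and the use of a dictionary as the repository are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks(checks, systems, max_workers=4):
    """Run every check against every checking system on a thread pool.

    `checks` maps a check name to a callable that takes a system name and
    returns a list of problem records. `systems` is a list of system names.
    Returns a dict keyed by (system, check name), standing in for the
    check result repository.
    """
    repository = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per (system, check) pair and remember its key.
        futures = {
            pool.submit(fn, system): (system, name)
            for system in systems
            for name, fn in checks.items()
        }
        # Collect the result of every thread into the repository.
        for future, key in futures.items():
            repository[key] = future.result()
    return repository
```

Calling `future.result()` re-raises any exception from a check, so in this sketch a failing check stops the run instead of leaving a silent gap in the repository.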
The data quality management system of the embodiment of the invention can provide the following functions or contents: data quality category management, quality measurement rule management, quality check method audit, data quality check scheduling, data quality check execution, data quality check warehousing, problem data presentation, problem data trend analysis, data quality check monitoring, check log management, page configuration management and data quality report.
The data quality management system of the embodiment of the invention finds quality problems in the data collected by the social co-governance big data platform by formulating and implementing data quality checks. It continuously monitors the data quality fluctuation of each system, performs proportion analysis across data quality rules, and periodically generates a key data quality report for each system, so that the data quality status of each system is known. Combined with the data quality problem handling process, the cleansing component provided by the system effectively supports improving the data quality of each system, and provides the following support for the social co-governance big data platform:
1. Improve management efficiency
A centralized, unified data quality management platform, based on unified quality checking and data quality monitoring, ensures that management and business personnel can obtain information and handle related work in time. Information silos caused by separate business systems are avoided, and repeated checking and processing of the same data are minimized.
2. Improve implementation efficiency
Data quality management is built and implemented once, which avoids each system repeatedly building and developing its own data quality function modules.
3. Standardize the quality management process
With a unified data quality checking system, the checking categories are clear and the management process is standardized. Through the data quality problem handling process and the related functions, problem discovery and handling form a closed loop, and standardized management improves data quality.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (4)

1. A data quality management system, characterized in that the system comprises:
the data quality definition module is used for providing necessary input for the checking task scheduling module through definition and management of quality dimension, checking category, measurement rule and checking method;
the checking task scheduling module is used for generating a corresponding checking result problem data file by executing a checking method;
the checking result acquisition module is used for acquiring and warehousing the checking result problem data files, summarizing the checking result data in the acquisition process, and respectively storing the detailed data and the summarized data in a result detail table and a summary table;
and the problem data analysis module is used for retrieving and analyzing the problem data and starting a problem treatment process.
2. The system of claim 1, further comprising:
and the system configuration module is used for providing necessary auxiliary support for the normal operation of the data quality definition module, the checking task scheduling module, the checking result acquisition module and the problem data analysis module.
3. The system of claim 1, wherein the system configuration module is configured to provide a parameter configuration, a data source configuration, and a results detail page display configuration.
4. The system of claim 1, wherein the data quality management system provides the following functions or content: data quality category management, quality measurement rule management, quality check method audit, data quality check scheduling, data quality check execution, data quality check warehousing, problem data presentation, problem data trend analysis, data quality check monitoring, check log management, page configuration management and data quality reporting.
CN201910665324.XA 2019-07-23 2019-07-23 Data quality management system Pending CN112306997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910665324.XA CN112306997A (en) 2019-07-23 2019-07-23 Data quality management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910665324.XA CN112306997A (en) 2019-07-23 2019-07-23 Data quality management system

Publications (1)

Publication Number Publication Date
CN112306997A true CN112306997A (en) 2021-02-02

Family

ID=74329511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910665324.XA Pending CN112306997A (en) 2019-07-23 2019-07-23 Data quality management system

Country Status (1)

Country Link
CN (1) CN112306997A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579553A (en) * 2022-03-07 2022-06-03 中国标准化研究院 Data quality assurance method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102222088A (en) * 2011-05-30 2011-10-19 大连银行股份有限公司 System and method for checking, summarizing and displaying data quality according to multidimensional attribute
CN103473672A (en) * 2013-09-30 2013-12-25 国家电网公司 System, method and platform for auditing metadata quality of enterprise-level data center
US20160180245A1 (en) * 2014-12-19 2016-06-23 Medidata Solutions, Inc. Method and system for linking heterogeneous data sources
CN107958049A (en) * 2017-11-28 2018-04-24 航天科工智慧产业发展有限公司 A kind of quality of data checking and administration system
CN108416042A (en) * 2018-03-14 2018-08-17 贵州电网有限责任公司 Data analysis management system based on the Mapping implementation informationization of index storehouse data source
CN109902084A (en) * 2019-02-27 2019-06-18 浪潮软件集团有限公司 A kind of system and method for full-automatic detection and the analysis quality of data

Similar Documents

Publication Publication Date Title
CN108039959B (en) Data situation perception method, system and related device
US20120116984A1 (en) Automated evaluation of compliance data from heterogeneous it systems
CN104966172A (en) Large data visualization analysis and processing system for enterprise operation data analysis
CN111177134B (en) Data quality analysis method, device, terminal and medium suitable for mass data
AU2014200563A1 (en) Identifying quality requirements of a software product
CN112162980A (en) Data quality control method and system, storage medium and electronic equipment
CN111400288A (en) Data quality inspection method and system
CN113595761A (en) Micro-service component optimization method of power system information and communication integrated scheduling platform
CN116205396A (en) Data panoramic monitoring method and system based on data center
CN113656245A (en) Data inspection method and device, storage medium and processor
US8839208B2 (en) Rating interestingness of profiling data subsets
CN112306997A (en) Data quality management system
CN112395370A (en) Data processing method, device, equipment and storage medium
CN115016902B (en) Industrial flow digital management system and method
WO2023226461A1 (en) Multi-domain data fusion method and device, and storage medium
CN114168830A (en) Public opinion data processing system and method, computer storage medium and electronic equipment
CN111291106A (en) Efficient flow arrangement method and system for ETL system
CN113641567B (en) Database inspection method and device, electronic equipment and storage medium
CN115168297A (en) Bypassing log auditing method and device
CN110503386B (en) Data processing system for marketized project
CN110908870A (en) Resource monitoring method and device for mainframe, storage medium and equipment
CN112256418A (en) Big data task scheduling method
CN114925045B (en) PaaS platform for big data integration and management
CN111459833B (en) Method for realizing multi-terminal multi-platform automatic test and monitoring of mobile terminal of government and enterprise
Zhou Enterprise Financial Management Informatization under Cloud Computing Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210202