CN112306997A - Data quality management system - Google Patents
- Publication number
- CN112306997A
- Authority
- CN
- China
- Prior art keywords
- data
- checking
- module
- data quality
- quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/24—Querying
- G06F16/25—Integrating or interfacing systems involving database management systems
Abstract
The invention provides a data quality management system. The system comprises a data quality definition module, a checking task scheduling module, a checking result acquisition module and a problem data analysis module. The data quality definition module provides the necessary input for the checking task scheduling module through the definition and management of quality dimensions, checking categories, measurement rules and checking methods. The checking task scheduling module generates a corresponding checking result problem data file by executing a checking method. The checking result acquisition module acquires and warehouses the checking result problem data files, summarizes the checking result data during acquisition, and stores the detailed data and the summarized data in a result detail table and a summary table respectively. The problem data analysis module retrieves and analyzes the problem data and starts a problem treatment process. The invention enables unified data quality checking and data quality monitoring, improves management efficiency and implementation efficiency, and standardizes the quality management process.
Description
Technical Field
The invention relates to the technical field of big data, in particular to a data quality management system.
Background
In the big data era, governments, society and enterprises generate large amounts of data all the time. As the representatives of state authority and the main bodies responsible for national administration and public services, governments hold large data resources and have an urgent need for big data applications. With the rise of big data across many fields in recent years, countries around the world have come to regard government big data as an important strategic resource for promoting national economic and social development, and governments pay close attention to it.
Opening and sharing are the core of big data applications in government affairs. Government departments and related public enterprises and institutions need to open up their data resources as far as possible, and to bring in government applications and services through service procurement, resource investment and similar means, so that social and market forces can help improve government service capability.
How to manage the quality of government big data is a problem that urgently needs to be solved.
Disclosure of Invention
The data quality management system provided by the invention can realize unified data quality checking and data quality monitoring, improve the management efficiency and the implementation efficiency and standardize the quality management process.
In a first aspect, the present invention provides a data quality management system, the system comprising:
the data quality definition module is used for providing necessary input for the checking task scheduling module through definition and management of quality dimension, checking category, measurement rule and checking method;
the checking task scheduling module is used for generating a corresponding checking result problem data file by executing a checking method;
the checking result acquisition module is used for acquiring and warehousing the checking result problem data files, summarizing the checking result data in the acquisition process, and respectively storing the detailed data and the summarized data in a result detail table and a summary table;
and the problem data analysis module is used for retrieving and analyzing the problem data and starting a problem treatment process.
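As an illustration of how the definitions could feed a check, the following minimal Python sketch models a measurement rule and a checking method and runs the method over a few rows. All class, field and rule names (`MeasurementRule`, `CheckingMethod`, `run_check`, `R001`, `id_number`) are hypothetical and not taken from the patent; writing the problem records to a file is omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class MeasurementRule:
    """A rule defined in the data quality definition module (hypothetical fields)."""
    rule_id: str
    dimension: str    # quality dimension, e.g. "completeness"
    category: str     # checking category, e.g. "null check"
    description: str


@dataclass(frozen=True)
class CheckingMethod:
    """An executable check bound to one measurement rule."""
    method_id: str
    rule: MeasurementRule
    is_problem: Callable[[Row], bool]  # True when a row violates the rule


def run_check(method: CheckingMethod, rows: List[Row]) -> List[Row]:
    """Return the problem records: offending rows tagged with the rule id."""
    return [
        {"rule_id": method.rule.rule_id, **row}
        for row in rows
        if method.is_problem(row)
    ]


rule = MeasurementRule("R001", "completeness", "null check",
                       "id_number must not be empty")
method = CheckingMethod("M001", rule, lambda row: not row.get("id_number"))

rows = [
    {"name": "A", "id_number": "123"},
    {"name": "B", "id_number": ""},
    {"name": "C"},
]
problems = run_check(method, rows)
```

Here rows B and C are reported as problem data because their `id_number` is empty or missing.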
Optionally, the system further comprises:
and the system configuration module is used for providing necessary auxiliary support for the normal operation of the data quality definition module, the checking task scheduling module, the checking result acquisition module and the problem data analysis module.
Optionally, the system configuration module is configured to provide a parameter configuration, a data source configuration, and a result detail page display configuration.
Optionally, the data quality management system provides the following functions or contents: data quality category management, quality measurement rule management, quality check method audit, data quality check scheduling, data quality check execution, data quality check warehousing, problem data presentation, problem data trend analysis, data quality check monitoring, check log management, page configuration management and data quality reporting.
The data quality management system provided by the embodiment of the invention finds quality problems in the data collected by a social co-governance big data platform by formulating and implementing data quality checks. It continuously monitors the fluctuation of data quality in each system and analyzes the proportion attributable to each data quality rule, periodically generates a key data quality report for each system, and so keeps track of each system's data quality. Unified quality checking and data quality monitoring ensure that management and business personnel can obtain information and handle related work in time. Information silos caused by separate business systems are avoided, and repeated checking and processing of the same data is minimized. Data quality management is built and implemented in a unified way, which avoids repeated construction and development of data quality function modules in each system. A unified data quality checking system makes the checking categories clear and standardizes the management process. Through the data quality problem treatment process and the related functions, problem discovery and treatment form a closed loop and are managed in a standardized way, which improves data quality.
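The monitoring of data quality fluctuation and rule proportions mentioned above could be computed from summary rows roughly as follows. This is a sketch under assumptions: the row layout `(period, rule_id, checked_count, problem_count)` and both function names are hypothetical, and the patent does not define how the proportions are calculated.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical summary row: (period, rule_id, checked_count, problem_count)
SummaryRow = Tuple[str, str, int, int]


def problem_rate_by_period(rows: List[SummaryRow]) -> Dict[str, float]:
    """Share of checked records that were problems, per period (the trend)."""
    checked: Dict[str, int] = defaultdict(int)
    problems: Dict[str, int] = defaultdict(int)
    for period, _rule_id, n_checked, n_problem in rows:
        checked[period] += n_checked
        problems[period] += n_problem
    return {p: problems[p] / checked[p] for p in checked if checked[p] > 0}


def rule_proportion(rows: List[SummaryRow]) -> Dict[str, float]:
    """Share of all problems attributable to each rule."""
    counts: Counter = Counter()
    for _period, rule_id, _n_checked, n_problem in rows:
        counts[rule_id] += n_problem
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()} if total else {}


summary = [
    ("2019-07", "R001", 1000, 50),
    ("2019-07", "R002", 1000, 150),
    ("2019-08", "R001", 1000, 20),
    ("2019-08", "R002", 1000, 80),
]
rates = problem_rate_by_period(summary)
proportions = rule_proportion(summary)
```

With the sample rows, the problem rate falls from one period to the next, which is the kind of fluctuation a periodic report would surface.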
Drawings
Fig. 1 is a schematic structural diagram of a data quality management system according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating the components and operation of a data quality management system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a data quality management system, as shown in fig. 1, the system includes:
the data quality definition module 11 is the basis of the whole data quality management system and is used for providing necessary input for the checking task scheduling module 12 through definition and management of quality dimensions, checking categories, measurement rules and checking methods;
the checking task scheduling module 12 is the core of the data quality management system and is used for generating a corresponding checking result problem data file by executing a checking method; the problem data in the checking result reflects the data quality problems that the user is concerned with;
the checking result acquisition module 13 is used for acquiring and warehousing the checking result problem data files, summarizing the checking result data in the acquisition process, and respectively storing the detailed data and the summarized data in a result detail table and a summary table;
The collection program polls for new files in job mode. Because the number of files to collect and the amount of data they contain vary from run to run, and other factors may interfere, the running time of each job differs, so the previous job may not have finished when the next job is due to start.
To avoid resource contention between job batches, the data quality management system uses a single-job execution mode: if the previous job has not finished when a new job starts, the new job exits automatically, and the next job can run only after the previous job has finished.
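The single-job execution mode can be sketched with a non-blocking lock: a new job that finds the lock held ends immediately instead of waiting. The class name `SingleJobRunner` and the demo job are hypothetical, and the sketch assumes all jobs run inside one process; a multi-process deployment would need a file or database lock instead.

```python
import threading
from typing import Callable


class SingleJobRunner:
    """Runs a collection job only if no earlier run is still in progress."""

    def __init__(self, job: Callable[[], None]) -> None:
        self._job = job
        self._lock = threading.Lock()

    def try_run(self) -> bool:
        """Run the job; return False without running if a previous run is active."""
        if not self._lock.acquire(blocking=False):
            return False  # previous job not finished: this job ends at once
        try:
            self._job()
            return True
        finally:
            self._lock.release()


# Demo: a slow job that blocks until it is released.
started = threading.Event()
release = threading.Event()


def slow_collect() -> None:
    started.set()
    release.wait(timeout=5)


runner = SingleJobRunner(slow_collect)
first = threading.Thread(target=runner.try_run)
first.start()
started.wait(timeout=5)

skipped = runner.try_run()    # overlaps the first job, so it is skipped
release.set()
first.join()
ran_after = runner.try_run()  # the first job has finished, so this one runs
```

Skipping rather than queuing keeps at most one batch active, at the cost of delaying collection until the next polling interval.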
the problem data analysis module 14 is used for retrieving and analyzing the problem data and starting a problem treatment process.
By retrieving and analyzing the problem data, the problem data analysis module starts the problem treatment process. It is therefore the window through which the data quality problems of the checked systems are exposed, and it embodies the core value of the whole data quality management system.
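The acquisition and retrieval steps described for modules 13 and 14 might look like the following sketch, which uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names and record layout are assumptions for illustration; the patent only states that detail data and summarized data are stored in a result detail table and a summary table.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical problem record: (rule_id, record_key, message)
ProblemRecord = Tuple[str, str, str]


def create_tables(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE result_detail "
        "(batch TEXT, rule_id TEXT, record_key TEXT, message TEXT)"
    )
    conn.execute(
        "CREATE TABLE result_summary "
        "(batch TEXT, rule_id TEXT, problem_count INTEGER)"
    )


def ingest(conn: sqlite3.Connection, batch: str,
           records: List[ProblemRecord]) -> None:
    """Store detail rows and, in the same pass, count problems per rule."""
    counts: Dict[str, int] = {}
    for rule_id, record_key, message in records:
        conn.execute(
            "INSERT INTO result_detail VALUES (?, ?, ?, ?)",
            (batch, rule_id, record_key, message),
        )
        counts[rule_id] = counts.get(rule_id, 0) + 1
    conn.executemany(
        "INSERT INTO result_summary VALUES (?, ?, ?)",
        [(batch, rule_id, n) for rule_id, n in counts.items()],
    )
    conn.commit()


def problems_for_rule(conn: sqlite3.Connection,
                      rule_id: str) -> List[Tuple[str, str]]:
    """Retrieve problem data for analysis, filtered by rule."""
    cur = conn.execute(
        "SELECT record_key, message FROM result_detail "
        "WHERE rule_id = ? ORDER BY record_key",
        (rule_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
create_tables(conn)
ingest(conn, "2019-07-23", [
    ("R001", "row-2", "id_number is empty"),
    ("R001", "row-3", "id_number is empty"),
    ("R002", "row-7", "birth_date is in the future"),
])
summary = conn.execute(
    "SELECT rule_id, problem_count FROM result_summary ORDER BY rule_id"
).fetchall()
details = problems_for_rule(conn, "R001")
```

Summarizing while ingesting avoids a second scan of the detail table when the summary is needed for reports.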
Further, the data quality management system further includes:
and the system configuration module is used for providing necessary auxiliary support for the normal operation of the data quality definition module, the checking task scheduling module, the checking result acquisition module and the problem data analysis module.
The system configuration module is used for providing parameter configuration, data source configuration and result detail page display configuration.
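The three kinds of configuration named above could be grouped as in the sketch below. The structure and every field name are hypothetical, since the patent does not specify a configuration format, and the connection string is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataSourceConfig:
    name: str
    url: str  # placeholder connection string, not a real endpoint


@dataclass
class DetailPageConfig:
    rule_id: str
    columns: List[str]  # columns shown on the result detail page


@dataclass
class SystemConfig:
    parameters: Dict[str, str] = field(default_factory=dict)
    data_sources: Dict[str, DataSourceConfig] = field(default_factory=dict)
    detail_pages: Dict[str, DetailPageConfig] = field(default_factory=dict)

    def columns_for(self, rule_id: str, default: List[str]) -> List[str]:
        """Columns to display for a rule, falling back to a default layout."""
        page = self.detail_pages.get(rule_id)
        return page.columns if page else list(default)


config = SystemConfig(
    parameters={"poll_interval_seconds": "300"},
    data_sources={"system_a": DataSourceConfig("system_a", "db://example/system_a")},
    detail_pages={"R001": DetailPageConfig("R001", ["record_key", "id_number"])},
)
configured = config.columns_for("R001", ["record_key"])
fallback = config.columns_for("R002", ["record_key"])
```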
The data quality management system according to the embodiment of the present invention will be described in detail below.
As shown in fig. 2, the data quality management system specifically includes a data quality management platform and a checking system. The data quality management platform includes: a template for organizing measurement rules and checking methods, a checking script generation component, a checking script repository, a checking result repository, a common checking component, a scheduling component and a distribution component. The common checking component comprises a plurality of threads; the scheduling component schedules the threads for execution and sends them to each checking system, where the checks are executed. The checking results of all threads in the common checking component are stored in the checking result repository.
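The thread-based execution described for fig. 2 can be approximated with a thread pool, as in this sketch. It uses the standard-library `concurrent.futures.ThreadPoolExecutor` in place of the patent's scheduling and distribution components, whose actual design is not disclosed; the task names and the in-memory result store are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A check task returns the problem records it found (hypothetical shape).
CheckTask = Callable[[], List[str]]


def run_checks(tasks: Dict[str, CheckTask],
               max_workers: int = 4) -> Dict[str, List[str]]:
    """Run each check in a worker thread and collect all results in one store."""
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(task) for name, task in tasks.items()}
        for name, future in futures.items():
            results[name] = future.result()  # re-raises if the check failed
    return results


tasks: Dict[str, CheckTask] = {
    "system_a/null_check": lambda: ["row-2: id_number is empty"],
    "system_b/range_check": lambda: [],
}
result_store = run_checks(tasks)
```

Keying results by task name keeps the outcome of every thread traceable to the system and check that produced it.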
The data quality management system of the embodiment of the invention can provide the following functions or contents: data quality category management, quality measurement rule management, quality check method audit, data quality check scheduling, data quality check execution, data quality check warehousing, problem data presentation, problem data trend analysis, data quality check monitoring, check log management, page configuration management and data quality report.
The data quality management system of the embodiment of the invention finds quality problems in the data collected by a social co-governance big data platform by formulating and implementing data quality checks. It continuously monitors the fluctuation of data quality in each system and analyzes the proportion attributable to each data quality rule, periodically generates a key data quality report for each system, and so keeps track of each system's data quality. Combined with the cleaning component provided by the system and the data quality problem treatment process, this effectively supports improving the data quality of each system and gives the social co-governance big data platform the following benefits:
1. Improve management efficiency
A centralized, unified data quality management platform, based on unified quality checking and data quality monitoring, ensures that management and business personnel can obtain information and handle related work in time. Information silos caused by separate business systems are avoided, and repeated checking and processing of the same data is minimized.
2. Improve implementation efficiency
Data quality management is built and implemented in a unified way, which avoids repeated construction and development of data quality function modules in each system.
3. Standardize the quality management process
A unified data quality checking system makes the checking categories clear and standardizes the management process. Through the data quality problem treatment process and the related functions, problem discovery and treatment form a closed loop and are managed in a standardized way, which improves data quality.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (4)
1. A data quality management system, characterized in that the system comprises:
the data quality definition module is used for providing necessary input for the checking task scheduling module through definition and management of quality dimension, checking category, measurement rule and checking method;
the checking task scheduling module is used for generating a corresponding checking result problem data file by executing a checking method;
the checking result acquisition module is used for acquiring and warehousing the checking result problem data files, summarizing the checking result data in the acquisition process, and respectively storing the detailed data and the summarized data in a result detail table and a summary table;
and the problem data analysis module is used for retrieving and analyzing the problem data and starting a problem treatment process.
2. The system of claim 1, further comprising:
and the system configuration module is used for providing necessary auxiliary support for the normal operation of the data quality definition module, the checking task scheduling module, the checking result acquisition module and the problem data analysis module.
3. The system of claim 1, wherein the system configuration module is configured to provide a parameter configuration, a data source configuration, and a results detail page display configuration.
4. The system of claim 1, wherein the data quality management system provides the following functions or content: data quality category management, quality measurement rule management, quality check method audit, data quality check scheduling, data quality check execution, data quality check warehousing, problem data presentation, problem data trend analysis, data quality check monitoring, check log management, page configuration management and data quality reporting.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910665324.XA CN112306997A (en) | 2019-07-23 | 2019-07-23 | Data quality management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910665324.XA CN112306997A (en) | 2019-07-23 | 2019-07-23 | Data quality management system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112306997A true CN112306997A (en) | 2021-02-02 |
Family
ID=74329511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910665324.XA Pending CN112306997A (en) | 2019-07-23 | 2019-07-23 | Data quality management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112306997A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114579553A (en) * | 2022-03-07 | 2022-06-03 | 中国标准化研究院 | Data quality assurance method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102222088A (en) * | 2011-05-30 | 2011-10-19 | 大连银行股份有限公司 | System and method for checking, summarizing and displaying data quality according to multidimensional attribute |
CN103473672A (en) * | 2013-09-30 | 2013-12-25 | 国家电网公司 | System, method and platform for auditing metadata quality of enterprise-level data center |
US20160180245A1 (en) * | 2014-12-19 | 2016-06-23 | Medidata Solutions, Inc. | Method and system for linking heterogeneous data sources |
CN107958049A (en) * | 2017-11-28 | 2018-04-24 | 航天科工智慧产业发展有限公司 | A kind of quality of data checking and administration system |
CN108416042A (en) * | 2018-03-14 | 2018-08-17 | 贵州电网有限责任公司 | Data analysis management system based on the Mapping implementation informationization of index storehouse data source |
CN109902084A (en) * | 2019-02-27 | 2019-06-18 | 浪潮软件集团有限公司 | A kind of system and method for full-automatic detection and the analysis quality of data |
-
2019
- 2019-07-23 CN CN201910665324.XA patent/CN112306997A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108039959B (en) | Data situation perception method, system and related device | |
US20120116984A1 (en) | Automated evaluation of compliance data from heterogeneous it systems | |
CN104966172A (en) | Large data visualization analysis and processing system for enterprise operation data analysis | |
CN111177134B (en) | Data quality analysis method, device, terminal and medium suitable for mass data | |
AU2014200563A1 (en) | Identifying quality requirements of a software product | |
CN112162980A (en) | Data quality control method and system, storage medium and electronic equipment | |
CN111400288A (en) | Data quality inspection method and system | |
CN113595761A (en) | Micro-service component optimization method of power system information and communication integrated scheduling platform | |
CN116205396A (en) | Data panoramic monitoring method and system based on data center | |
CN113656245A (en) | Data inspection method and device, storage medium and processor | |
US8839208B2 (en) | Rating interestingness of profiling data subsets | |
CN112306997A (en) | Data quality management system | |
CN112395370A (en) | Data processing method, device, equipment and storage medium | |
CN115016902B (en) | Industrial flow digital management system and method | |
WO2023226461A1 (en) | Multi-domain data fusion method and device, and storage medium | |
CN114168830A (en) | Public opinion data processing system and method, computer storage medium and electronic equipment | |
CN111291106A (en) | Efficient flow arrangement method and system for ETL system | |
CN113641567B (en) | Database inspection method and device, electronic equipment and storage medium | |
CN115168297A (en) | Bypassing log auditing method and device | |
CN110503386B (en) | Data processing system for marketized project | |
CN110908870A (en) | Resource monitoring method and device for mainframe, storage medium and equipment | |
CN112256418A (en) | Big data task scheduling method | |
CN114925045B (en) | PaaS platform for big data integration and management | |
CN111459833B (en) | Method for realizing multi-terminal multi-platform automatic test and monitoring of mobile terminal of government and enterprise | |
Zhou | Enterprise Financial Management Informatization under Cloud Computing Environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210202 |