CN111159161A - ETL rule-based data quality monitoring and early warning system and method - Google Patents

ETL rule-based data quality monitoring and early warning system and method

Info

Publication number
CN111159161A
Authority
CN
China
Prior art keywords
data
etl
source
processing
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911420956.6A
Other languages
Chinese (zh)
Inventor
李松前
李昭
陈浩
高靖
崔岩
卢述奇
陈呈
张宵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingwutong Co ltd
Original Assignee
Qingwutong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingwutong Co ltd
Priority to CN201911420956.6A
Publication of CN111159161A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data quality monitoring and early warning system and method based on ETL rules. The system comprises a source data standardization processing module, a data warehouse metadata management module, an ETL rule processing module and a visual early warning report generation module, wherein the source data standardization processing module is coupled with the data warehouse metadata management module; the data warehouse metadata management module is coupled with the source data standardization processing module and the ETL rule processing module respectively; the ETL rule processing module is coupled with the data warehouse metadata management module and the visual early warning report generation module respectively; and the visual early warning report generation module is coupled with the ETL rule processing module. The invention monitors data quality while the data warehouse and the ETL process are being built, and improves the readability and accuracy of the data.

Description

ETL rule-based data quality monitoring and early warning system and method
Technical Field
The invention relates to the field of computers, in particular to a data quality monitoring and early warning system and method based on an ETL rule.
Background
The invention mainly addresses the monitoring of data quality while a data warehouse and its ETL (extract, transform, load) process are being built, so that the data is easy to read and accurate. The following problems currently exist in the field:
1. because enterprise data comes from different business systems, anomalies in an upstream data source, such as structural changes in crawler data or crawler failures, cause errors in downstream ETL data;
2. metadata is not managed effectively while the data warehouse and the ETL process are built, so the data is hard to read and the metadata cannot be used to its full potential;
3. data quality problems in data tables mainly appear as follows:
1) the data lacks a valid primary key, which results in duplicate data;
2) the data does not conform to the standard data type;
3) the data does not conform to the standard business rules, for example, the valid interval is 1-100 but values greater than 100 appear in the table;
4) the main indexes of a data table are abnormal, such as abnormal daily traffic, for example, about 2000 people are normally in the office each day, but data problems make the figure appear as about 100;
4. enterprises do not effectively monitor the data warehouse and the ETL process or issue early warnings, so data developers cannot quickly learn the current state of the data, data anomalies are not reported in time, downstream consumers reference faulty data, and data analysts and decision makers are misled.
In view of the above, it is an urgent problem in the art to overcome the drawbacks of the prior art.
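The four table-level problems listed under item 3 can each be expressed as a small check. The following Python sketch is illustrative only and is not taken from the patent: the sample rows, the column names `id` and `score`, and the 50 percent tolerance used for the daily metric are all assumptions.

```python
from collections import Counter

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"id": 1, "score": 55},
    {"id": 2, "score": 130},    # outside the 1-100 business range
    {"id": 2, "score": 70},     # duplicate primary key
    {"id": 4, "score": "n/a"},  # wrong data type
]

def duplicate_keys(rows, key):
    """Return the key values that occur more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def wrong_type(rows, field, expected):
    """Return the rows whose field is not an instance of the expected type."""
    return [r for r in rows if not isinstance(r[field], expected)]

def out_of_range(rows, field, low, high):
    """Return numerically typed rows whose field lies outside [low, high]."""
    return [r for r in rows
            if isinstance(r[field], (int, float)) and not low <= r[field] <= high]

def metric_is_abnormal(today, baseline, tolerance=0.5):
    """Flag a daily metric that deviates from its baseline by more than
    the given fraction (the 0.5 default is an assumed value)."""
    return abs(today - baseline) > tolerance * baseline
```

With the figures from the text, `metric_is_abnormal(100, 2000)` flags the drop from about 2000 to about 100 as abnormal, while a value such as 1900 would pass.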
Disclosure of Invention
In view of this, the invention provides a data quality monitoring and early warning system and method based on the ETL rule, so that the data quality is monitored in the process of establishing a data warehouse and ETL, and the readability and the accuracy of the data are improved.
On one hand, the invention discloses a data quality monitoring and early warning system based on ETL rules, which comprises a source data standardization processing module, a data warehouse metadata management module, an ETL rule processing module and a visual early warning report generation module, wherein,
the source data standardization processing module is coupled with the data warehouse metadata management module and is used for standardizing the source data synchronized into the data warehouse to obtain standardized source data and sending the standardized source data to the data warehouse metadata management module;
the data warehouse metadata management module is coupled with the source data standardization processing module and the ETL rule processing module respectively, and is used for performing metadata management on the standardized source data and providing, through metadata, business descriptions of the target tables and target fields in a large number of reports to obtain first data;
the ETL rule processing module is coupled with the data warehouse metadata management module and the visual early warning report generation module respectively, and is used for selecting a source end and a destination end for ETL processing of data to be processed, wherein the first data is the source end; the data to be processed is extracted from the first data according to business rules and data standards, corresponding ETL processing is performed on the data to be processed, and dirty data in the first data is removed to obtain a processing result, which is loaded to the destination end to obtain second data; dirty data refers to data that is outside a given range, is meaningless to the actual business, has an illegal format, or is irregularly encoded or ambiguous;
and the visual early warning report generation module is coupled with the ETL rule processing module, and is used for visualizing the second data and highlighting abnormal database tables.
On the other hand, the invention also provides a data quality monitoring and early warning method based on the ETL rule, which comprises the following steps:
synchronizing source data to a data warehouse and setting an early warning threshold, and when the source data is within the early warning threshold, carrying out standardization processing on the source data to obtain standardized source data;
performing metadata management on the standardized source data, and performing service description on a target table and a target field in the massive report through metadata to obtain first data;
selecting a source end and a destination end for ETL processing of data to be processed, wherein the first data is the source end; extracting the data to be processed from the first data according to business rules and data standards, performing corresponding ETL processing on the data to be processed, removing dirty data in the first data to obtain a processing result, and loading the processing result to the destination end to obtain second data, wherein dirty data refers to data that is outside a given range, is meaningless to the actual business, has an illegal format, or is irregularly encoded or ambiguous;
and generating a visual early warning report, visualizing the second data and highlighting the abnormal database table.
Compared with the prior art, the data quality monitoring and early warning system and method based on the ETL rule at least realize the following beneficial effects:
because the source data undergoes source data standardization, data warehouse metadata management and ETL rule processing in sequence, detection quality, effectiveness and precision are improved; managers can readily judge the quality of the current data assets and data warehouse, and managers and developers can locate data problems in more detail in order to improve data quality; the data becomes easier to understand and read, which benefits data users;
the invention can generate a visual early warning report that highlights abnormal database tables, which makes the data easier to monitor;
according to the invention, data sources of different types are added to one data warehouse for management, and when ETL processing is required, the data sources to process are selected from those already added; for an ETL tool that works with multiple data sources, all of a developer's data source information is conveniently stored in the data warehouse rather than in each business unit's internal documents, and adding a data source only requires registering the new source, so developers need not switch among the different structure types of multiple data sources during data development, which reduces their workload and improves development efficiency.
Of course, it is not necessary for any product in which the present invention is practiced to achieve all of the above-described technical effects simultaneously.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a block diagram of a data quality monitoring and early warning system based on ETL rules according to the present invention;
fig. 2 is a flowchart of a data quality monitoring and early warning method based on ETL rules provided in the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Referring to fig. 1, fig. 1 is a block diagram of a data quality monitoring and early warning system based on ETL rules according to the present invention. In fig. 1, the ETL rule-based data quality monitoring and early warning system includes a source data standardization processing module 101, a data warehouse metadata management module 102, an ETL rule processing module 103, and a visual early warning report generation module 104.
The source data standardization processing module 101 is coupled to the data warehouse metadata management module 102, and is configured to standardize source data synchronized in the data warehouse to obtain standardized source data, and send the standardized source data to the data warehouse metadata management module 102;
early warning judgment is added to the data transmission process according to the business system; for example, when the crawler system is abnormal and the data volume falls below a minimum value set by business personnel, the system decides according to the configured conditions whether to proceed to the next step;
the source data standardization processing module synchronizes the data source according to the type of the data source, wherein the data source comprises related information, and the related information comprises the name of the data source, the type of the data source and the access mode of a data table in the data source.
In this embodiment, when the data source needs to be synchronized into the data warehouse, the data source may be imported into the data warehouse according to the address stored in the data source, and then, the related information of the data source is set. The access mode of the data table in the data source can comprise a port, a user name, a password and the like. After the relevant information is set, the set information and the data source are stored according to a preset rule, for example, the preset rule is a one-to-one correspondence relationship between the relevant information of the data source and the data source.
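The one-to-one correspondence between a data source and its related information can be pictured as a small registry. This is a minimal sketch under stated assumptions, not the patent's implementation: the class names `DataSourceInfo` and `DataSourceRegistry` are hypothetical, and the `host` field is an assumed addition alongside the port, user name and password that the text mentions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSourceInfo:
    """Related information for one data source (hypothetical layout)."""
    name: str          # data source name
    source_type: str   # data source type, e.g. "mysql" or "hive"
    host: str          # assumed: address used to reach the source
    port: int          # access mode: port
    username: str      # access mode: user name
    password: str      # access mode: password

class DataSourceRegistry:
    """Stores exactly one DataSourceInfo per data source name."""

    def __init__(self):
        self._by_name = {}

    def register(self, info):
        # Enforce the one-to-one correspondence described in the text.
        if info.name in self._by_name:
            raise ValueError(f"data source {info.name!r} is already registered")
        self._by_name[info.name] = info

    def get(self, name):
        return self._by_name[name]

    def names_of_type(self, source_type):
        """List registered source names of one type, so that synchronization
        can be dispatched according to the data source type."""
        return sorted(n for n, i in self._by_name.items()
                      if i.source_type == source_type)
```

In a real deployment the password would normally be kept in a secret store rather than in plain text; it appears as a field here only to mirror the access mode listed above.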
The data warehouse metadata management module 102 is coupled with the source data standardization processing module 101 and the ETL rule processing module 103 respectively, and is used for performing metadata management on the standardized source data and providing, through metadata, business descriptions of the target tables and target fields in a large number of reports to obtain first data;
some tables or fields are queried across a large number of reports, and because developers have not maintained the metadata well, data users have difficulty finding what the tables and fields mean for their own business; in the early warning system, the metadata associated with each module is collected and visualized, for example, metadata is used to count the tables and fields in the business system's databases that lack a business description, and the monitoring and early warning visualization prompts developers and business personnel to complete the data indexes so that the data is easier to read and understand.
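Counting the tables and fields that lack a business description can be sketched as follows. The metadata layout (a dictionary mapping each table to a description and a field dictionary) and the function name `description_coverage` are assumptions made for illustration.

```python
def description_coverage(metadata):
    """Summarize which tables and fields have no business description.

    metadata maps table name -> {"description": str,
                                 "fields": {field name: description}}.
    An empty or whitespace-only description counts as missing.
    """
    missing_tables = sorted(
        table for table, meta in metadata.items()
        if not meta["description"].strip()
    )
    missing_fields = sorted(
        (table, field)
        for table, meta in metadata.items()
        for field, description in meta["fields"].items()
        if not description.strip()
    )
    total_fields = sum(len(meta["fields"]) for meta in metadata.values())
    ratio = len(missing_fields) / total_fields if total_fields else 0.0
    return {
        "tables_without_description": missing_tables,
        "fields_without_description": missing_fields,
        "undescribed_field_ratio": ratio,
    }

# Hypothetical metadata for two tables.
sample_metadata = {
    "contract": {"description": "signed contracts",
                 "fields": {"id": "contract id", "room_number": ""}},
    "tmp_x": {"description": "",
              "fields": {"a": "", "b": "amount in yuan"}},
}
summary = description_coverage(sample_metadata)
```

The resulting summary is the kind of statistic a dashboard could plot per database to show where descriptions still need to be written.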
The ETL rule processing module 103 is coupled to the data warehouse metadata management module 102 and the visual early warning report generation module 104, respectively, and is configured to remove dirty data in the first data according to the business rules and the data standards to obtain second data, where dirty data is data that is outside a given range, is meaningless to the actual business, has an illegal format, or is irregularly encoded or ambiguous;
the ETL process is an important part of building a data warehouse, and certain dirty data is first removed during ETL according to business rules and data standards; for example, if a house-receiving contract in the house-receiving detail data has no room number, a condition can be added when the house-receiving details are extracted to the ODS layer so that the record is filtered out; data can also be processed according to business rules, for example, when age is calculated from an identification number and the identification number is not in the correct format, such as when its first 4 characters contain letters, the record is treated as garbage data and handled accordingly, for example by setting the value to 0.
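The two example rules can be sketched in Python as below. The patent does not specify the identification number format; this sketch assumes an 18-character number whose characters 7 to 14 hold the birth date as YYYYMMDD, and the field name `room_number` is likewise hypothetical.

```python
import re
from datetime import date

# Assumed layout: 17 digits followed by a digit or X, birth date at
# positions 7-14. This is an assumption, not stated in the source.
ID_PATTERN = re.compile(r"^[0-9]{17}[0-9Xx]$")

def has_room_number(record):
    """Rule 1: keep a contract record only if it carries a room number."""
    return bool(str(record.get("room_number") or "").strip())

def age_from_id(id_number, today):
    """Rule 2: return the age in years, or 0 when the identification
    number is malformed (treated as garbage data)."""
    if not ID_PATTERN.match(id_number or ""):
        return 0
    try:
        born = date(int(id_number[6:10]), int(id_number[10:12]),
                    int(id_number[12:14]))
    except ValueError:  # e.g. month 13 or day 32
        return 0
    years = today.year - born.year
    if (today.month, today.day) < (born.month, born.day):
        years -= 1  # birthday has not yet occurred this year
    return max(years, 0)

def clean_contracts(records, today):
    """Apply both rules: drop records without a room number and add an
    'age' column that falls back to 0 for bad identification numbers."""
    return [
        {**r, "age": age_from_id(r.get("id_number"), today)}
        for r in records if has_room_number(r)
    ]
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the rule deterministic, which makes it easier to re-run a check for a past date.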
It is understood that data processing can be divided into operational data processing and analytical data processing; operational processing is generally performed in a conventional database (DB), and analytical processing is performed in a data warehouse (DW). However, not all data processing fits this division: some operational processing is not suited to a conventional database, and some analytical processing is not suited to a data warehouse. In such cases a third data storage layer is required, which gave rise to the operational data store (ODS) and turned the two-layer DB-DW data architecture into a three-layer DB-ODS-DW data architecture. The ODS is a data storage system that uses an ETL process to integrate data from different data sources (operational databases, external data sources and the like) into a subject-oriented, integrated, enterprise-wide and consistent data set (mainly current or near-current detail data, plus any summary data that may be required). It serves the enterprise's near-real-time OLAP (online analytical processing) operations and enterprise-wide OLTP (online transaction processing) operations, provides integrated data to the data warehouse, and takes over ETL work from the data warehouse system so as to relieve the pressure on the data warehouse.
The visual early warning report generation module 104 is coupled with the ETL rule processing module 103, and visualizes the second data to highlight abnormal database tables.
Developers and the relevant business personnel formulate data quality management rules and enter them into the system; developers convert the rules into scripts for execution, check records are written to a check log table, and a check result table is derived from the check log table to supply data to the visual report system and to error notification; the data volume check in the rule system mainly ensures data integrity by determining whether the current day's table is empty and whether references by downstream data are affected;
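The flow from rules to a check log table and then to a check result table can be sketched as follows. The in-memory lists and dictionaries stand in for database tables, and every name here (`run_checks`, `summarize`, the rule tuple layout) is a hypothetical choice for illustration.

```python
from datetime import date

def check_not_empty(table_rows):
    """Data volume check: the current day's table must not be empty."""
    return len(table_rows) > 0

def run_checks(tables, rules, run_date):
    """Execute each rule and return one check log entry per rule.

    rules is a list of (rule name, table name, check function) tuples;
    a table missing from `tables` is treated as empty.
    """
    log = []
    for rule_name, table_name, check in rules:
        passed = check(tables.get(table_name, []))
        log.append({"date": run_date, "rule": rule_name,
                    "table": table_name, "passed": passed})
    return log

def summarize(log):
    """Derive the check result table: table name -> failed rule names."""
    result = {}
    for entry in log:
        failed = result.setdefault(entry["table"], [])
        if not entry["passed"]:
            failed.append(entry["rule"])
    return result

# Hypothetical daily run over two tables, one of which is empty.
tables = {"ods_contract": [{"id": 1}], "ods_payment": []}
rules = [
    ("volume", "ods_contract", check_not_empty),
    ("volume", "ods_payment", check_not_empty),
]
check_log = run_checks(tables, rules, date(2020, 1, 1))
check_result = summarize(check_log)
```

Keeping the raw log separate from the summarized result mirrors the text: the log preserves every check record, while the result table is what the visual report and the error notification read.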
in some optional embodiments, the system further includes a visual early warning report sending module, configured to set the responsible personnel in a configuration table and send the abnormal database tables in the visual early warning report to those personnel.
Each kind of business data is classified and displayed, and a warning-line reminder is added for abnormal indexes; for example, when metadata field descriptions are counted and the proportion of fields without a description reaches thirty percent, the data is considered to have very poor readability, a special reminder is issued, and developers complete the data indexes according to the log data; the developer responsible for each business table is then configured as an attribute of the table in the error notification system, and when an error in the table's data needs to be reported, the notification is sent directly to that person so that the problem can be located and handled quickly.
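The thirty percent warning line and the routing of alerts to the responsible developer can be sketched together. The function name, the input layout and the "unassigned" fallback for tables without a configured owner are assumptions, not details from the patent.

```python
WARNING_LINE = 0.30  # thirty percent, as in the example above

def tables_needing_warning(field_stats, owners, warning_line=WARNING_LINE):
    """Return (table, owner, ratio) for every table whose share of
    undescribed fields reaches the warning line.

    field_stats maps table -> (undescribed field count, total field count);
    owners maps table -> responsible developer (hypothetical config table).
    """
    alerts = []
    for table, (undescribed, total) in sorted(field_stats.items()):
        ratio = undescribed / total if total else 0.0
        if ratio >= warning_line:
            # Assumed fallback when no owner is configured for the table.
            alerts.append((table, owners.get(table, "unassigned"), ratio))
    return alerts

# Hypothetical statistics and owner configuration.
field_stats = {"orders": (4, 10), "users": (1, 10), "logs": (5, 5)}
owners = {"orders": "dev_a", "users": "dev_b"}
alerts = tables_needing_warning(field_stats, owners)
```

Each returned tuple carries the owner, so a notifier can send the message directly to the person responsible for that table.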
In some optional embodiments, the ETL rule-based data quality monitoring and early warning system further includes an ETL processing rule setting module, coupled to the ETL rule processing module, configured to receive an ETL processing rule set by a user, and perform corresponding ETL processing on the data to be processed according to the ETL processing rule, where the ETL processing rule includes an ETL processing rule set according to a preset processing standard of Structured Query Language (SQL).
The ETL processing rule set by the user may follow a preset big data processing standard; several such standards may be available for ETL processing, and in this embodiment the processing standard of SQL (Structured Query Language) is selected, that is, the user may write the ETL processing steps according to the SQL standard.
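A user-supplied rule written in SQL might look like the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for the data warehouse engine. The table names `src_contract` and `ods_contract` and the rule text are hypothetical; the patent does not name a specific engine or schema.

```python
import sqlite3

# An in-memory SQLite database stands in for the real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_contract (contract_id INTEGER, room_number TEXT)")
conn.executemany(
    "INSERT INTO src_contract VALUES (?, ?)",
    [(1, "A-101"), (2, None), (3, "")],
)
conn.execute("CREATE TABLE ods_contract (contract_id INTEGER, room_number TEXT)")

# Hypothetical user-defined ETL rule: load only contracts that have a
# room number into the destination table.
USER_RULE_SQL = """
INSERT INTO ods_contract (contract_id, room_number)
SELECT contract_id, room_number
FROM src_contract
WHERE room_number IS NOT NULL AND TRIM(room_number) <> ''
"""

def apply_rule(connection, rule_sql, target_table):
    """Run one SQL rule in a transaction and return the target row count."""
    with connection:  # commits on success, rolls back on error
        connection.execute(rule_sql)
    # target_table comes from trusted configuration, never from user input.
    return connection.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]

loaded_rows = apply_rule(conn, USER_RULE_SQL, "ods_contract")
```

Expressing the rule as plain SQL keeps it readable for anyone who already queries the warehouse, which appears to be the motivation for choosing the SQL standard here.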
In this embodiment, data sources of different types are added to one data warehouse for management, and when ETL processing is required, the data sources to process are selected from those already added; for an ETL tool that works with multiple data sources, all of a developer's data source information is conveniently stored in the data warehouse rather than in each business unit's internal documents, and adding a data source only requires registering the new source.
Furthermore, because the source data undergoes source data standardization, data warehouse metadata management and ETL rule processing in sequence, detection quality, effectiveness and precision are improved; managers can readily judge the quality of the current data assets and data warehouse, and managers and developers can locate data problems in more detail in order to improve data quality; the data becomes easier to understand and read, which benefits data users.
Referring to fig. 2, fig. 2 is a flowchart of a data quality monitoring and early warning method based on ETL rule provided in the present invention. The ETL rule-based data quality monitoring and early warning method in fig. 2 includes the following steps:
s1: synchronizing source data to a data warehouse and setting an early warning threshold, and when the source data is within the early warning threshold, carrying out standardization processing on the source data to obtain standardized source data;
early warning judgment is added to the data transmission process according to the business system;
for example, when the crawler system is abnormal and the data volume falls below a minimum value set by business personnel, the system decides according to the configured conditions whether to proceed to the next step;
synchronizing source data to a data warehouse and carrying out standardized processing on the source data, wherein the data source is synchronized according to the type of the data source, the data source comprises relevant information, and the relevant information comprises the name of the data source, the type of the data source and the access mode of a data table in the data source.
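Step S1 can be sketched as a gate that standardizes the source data only when its volume is within the early warning threshold. The standardization shown here (trimming strings and lower-casing column names) is an assumed placeholder, since the patent does not define the concrete standardization rules, and the function names are hypothetical.

```python
def standardize(row):
    """Assumed standardization: trim string values and normalize
    column names to lower case without surrounding whitespace."""
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }

def should_proceed(row_count, minimum_rows):
    """Early warning judgment: continue only if enough rows arrived."""
    return row_count >= minimum_rows

def synchronize(source_name, rows, minimum_rows, warn):
    """Standardize the rows, or emit a warning and return None when the
    data volume is below the minimum set by business personnel."""
    if not should_proceed(len(rows), minimum_rows):
        warn(f"{source_name}: received {len(rows)} rows, "
             f"expected at least {minimum_rows}")
        return None
    return [standardize(row) for row in rows]
```

Passing `warn` as a callable keeps the gate independent of how warnings are delivered; it could append to a list in a test or post to a messaging system in production.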
S2: performing metadata management on the standardized source data, and performing service description on a target table and a target field in the massive report through metadata to obtain first data;
some tables or fields are queried across a large number of reports, and because developers have not maintained the metadata well, data users have difficulty finding what the tables and fields mean for their own business; in the early warning system, the metadata associated with each module is collected and visualized, for example, metadata is used to count the tables and fields in the business system's databases that lack a business description, and the monitoring and early warning visualization prompts developers and business personnel to complete the data indexes so that the data is easier to read and understand.
S3: selecting a source end and a destination end for ETL processing of data to be processed, wherein the first data is the source end; extracting the data to be processed from the first data according to business rules and data standards, performing corresponding ETL processing on the data to be processed, removing dirty data in the first data to obtain a processing result, and loading the processing result to the destination end to obtain second data, wherein dirty data refers to data that is outside a given range, is meaningless to the actual business, has an illegal format, or is irregularly encoded or ambiguous;
ETL (Extract-Transform-Load) processing is performed on the data; ETL is the process of extracting data from one data source, converting the extracted data into a standard format and loading it into another target data source. Many different types of data sources currently exist, such as relational MySQL, non-relational HBase, the Hive data warehouse, the HDFS file store, and Elasticsearch, a file indexing service with a storage function; data sources of different types may have different interface types.
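Because data sources of different types expose different interfaces, an ETL run is easier to write against a common read/write abstraction. The sketch below is an assumption about how this could be organized, not the patent's design: the `Source` and `Sink` protocols and the list-backed implementations are hypothetical stand-ins for real connectors to MySQL, Hive and similar systems.

```python
from typing import Callable, Iterable, List, Optional, Protocol

class Source(Protocol):
    """Anything that can yield rows as dictionaries."""
    def read(self) -> Iterable[dict]: ...

class Sink(Protocol):
    """Anything that can accept a batch of rows."""
    def write(self, rows: List[dict]) -> None: ...

class ListSource:
    """In-memory source used here in place of a real connector."""
    def __init__(self, rows):
        self._rows = rows
    def read(self):
        return iter(self._rows)

class ListSink:
    """In-memory destination that collects the loaded rows."""
    def __init__(self):
        self.rows = []
    def write(self, rows):
        self.rows.extend(rows)

def drop_without_room(row: dict) -> Optional[dict]:
    """Example transform: return None to discard a dirty row."""
    return row if row.get("room_number") else None

def run_etl(source: Source, sink: Sink,
            transform: Callable[[dict], Optional[dict]]) -> int:
    """Extract from the source, transform each row, load the survivors
    into the sink, and return the number of rows dropped as dirty."""
    extracted = list(source.read())
    transformed = [t for t in map(transform, extracted) if t is not None]
    sink.write(transformed)
    return len(extracted) - len(transformed)
```

A connector for a new data source type then only needs to provide `read` or `write`; `run_etl` itself does not change.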
The ETL rule processing comprises the following steps: when a house-receiving contract in the house-receiving detail data has no room number, the record is filtered out; when age is calculated from an identification number and the identification number is not in the correct format, the record is regarded as garbage data and the value is replaced with 0.
In some optional embodiments, the method further includes receiving an ETL processing rule set by a user, and performing corresponding ETL processing on the data to be processed according to the ETL processing rule, where the ETL processing rule includes an ETL processing rule set according to a preset processing standard of Structured Query Language (SQL).
In this embodiment, data sources of different types are added to one data warehouse for management, and when ETL processing is required, the data sources to process are selected from those already added; for an ETL tool that works with multiple data sources, all of a developer's data source information is conveniently stored in the data warehouse rather than in each business unit's internal documents, and adding a data source only requires registering the new source.
S4: and generating a visual early warning report, visualizing the second data and highlighting the abnormal database table.
Each kind of business data is classified and displayed, and a warning-line reminder is added for abnormal indexes; for example, when metadata field descriptions are counted and the proportion of fields without a description reaches thirty percent, the data is considered to have poor readability, a special reminder is issued, and developers complete the data indexes according to the log data.
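Highlighting abnormal database tables in a report can be illustrated with a plain-text rendering. A real system would presumably use a dashboard; the text markers, column layout and function name below are assumptions chosen to keep the sketch self-contained.

```python
def render_report(results):
    """Render one line per table, listing abnormal tables first.

    results is a list of (table name, row count, failed rule names).
    Tables with at least one failed rule are marked with "!!".
    """
    # Sort abnormal tables (non-empty failure list) ahead of healthy ones,
    # then alphabetically by table name.
    ordered = sorted(results, key=lambda r: (not r[2], r[0]))
    lines = []
    for table, row_count, failed in ordered:
        marker = "!!" if failed else "ok"
        detail = ", ".join(failed) if failed else "-"
        lines.append(f"[{marker}] {table:<14} rows={row_count:<6} failed={detail}")
    return "\n".join(lines)

# Hypothetical check results for three tables.
report = render_report([
    ("ods_user", 1200, []),
    ("ods_payment", 0, ["volume"]),
    ("ods_contract", 950, ["range", "duplicate_key"]),
])
```

Sorting abnormal tables to the top is one simple way to make them stand out; color or icons would serve the same purpose in a graphical report.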
In some optional embodiments, the method further comprises setting personnel management in a configuration table, and sending the abnormal database table in the visual early warning report to the personnel.
By the embodiment, the data quality monitoring and early warning system and method based on the ETL rule at least realize the following beneficial effects:
because the source data undergoes source data standardization, data warehouse metadata management and ETL rule processing in sequence, detection quality, effectiveness and precision are improved; managers can readily judge the quality of the current data assets and data warehouse, and managers and developers can locate data problems in more detail in order to improve data quality; the data becomes easier to understand and read, which benefits data users;
the invention can generate a visual early warning report that highlights abnormal database tables, which makes the data easier to monitor;
according to the invention, data sources of different types are added to one data warehouse for management, and when ETL processing is required, the data sources to process are selected from those already added; for an ETL tool that works with multiple data sources, all of a developer's data source information is conveniently stored in the data warehouse rather than in each business unit's internal documents, and adding a data source only requires registering the new source, so developers need not switch among the different structure types of multiple data sources during data development, which reduces their workload and improves development efficiency.
Although some specific embodiments of the present invention have been described in detail by way of examples, it should be understood by those skilled in the art that the above examples are for illustrative purposes only and are not intended to limit the scope of the present invention. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (10)

1. A data quality monitoring and early warning system based on ETL rules is characterized by comprising a source data standardization processing module, a data warehouse metadata management module, an ETL rule processing module and a visual early warning report generation module, wherein,
the source data standardization processing module is coupled with the data warehouse metadata management module and is used for standardizing the source data synchronized into the data warehouse to obtain standardized source data and sending the standardized source data to the data warehouse metadata management module;
the data warehouse metadata management module is respectively coupled with the source data standardization processing module and the ETL rule processing module, and is used for performing metadata management on the standardized source data and performing service description on a target table and a target field in a massive report through metadata to obtain first data;
the ETL rule processing module is respectively coupled with the data warehouse metadata management module and the visual early warning report generation module, and is used for selecting a source end and a destination end for ETL processing of data to be processed, wherein the first data is the source end; extracting the data to be processed from the first data according to a service rule and a data standard; performing corresponding ETL processing on the data to be processed and removing dirty data in the first data to obtain a processing result of the data to be processed; and loading the processing result to the destination end to obtain second data, wherein the dirty data refers to data which is not in a given range, or has no meaning to actual service, or has an illegal data format, or has irregular coding, or is ambiguous;
and the visual early warning report generation module is coupled with the ETL rule processing module, visualizes the second data and highlights the abnormal database table.
2. The ETL rule-based data quality monitoring and early warning system of claim 1, further comprising an ETL processing rule setting module coupled to the ETL rule processing module and configured to receive an ETL processing rule set by a user, and perform corresponding ETL processing on the data to be processed according to the ETL processing rule, where the ETL processing rule includes an ETL processing rule set according to a preset processing standard of Structured Query Language (SQL).
3. The ETL rule-based data quality monitoring and early warning system according to claim 1, wherein the ETL rule processing module is configured to: filter out the data when no room number exists in the house-receiving contract in the house-receiving detail data; and, when the age is calculated based on the identification number, regard an identification number that is not in the correct format as garbage data and replace it with 0.
4. The ETL rule-based data quality monitoring and early warning system of claim 1, wherein the source data normalization processing module synchronizes data sources according to the types of the data sources, the data sources include related information, and the related information includes data source names, data source types, and access modes of data tables in the data sources.
5. The ETL rule-based data quality monitoring and early warning system of claim 1, further comprising a visual early warning report sending module, configured to set the responsible personnel in a configuration table, and send the abnormal database tables in the visual early warning report to the personnel.
6. A data quality monitoring and early warning method based on ETL rules is characterized by comprising the following steps:
synchronizing source data to a data warehouse and setting an early warning threshold, and when the source data is within the early warning threshold, carrying out standardization processing on the source data to obtain standardized source data;
performing metadata management on the standardized source data, and performing service description on a target table and a target field in the massive report through metadata to obtain first data;
selecting a source end and a destination end for ETL processing of data to be processed, wherein the first data is the source end; extracting the data to be processed from the first data according to a service rule and a data standard; performing corresponding ETL processing on the data to be processed and removing dirty data in the first data to obtain a processing result of the data to be processed; and loading the processing result to the destination end to obtain second data, wherein the dirty data refers to data which is not in a given range, or has no meaning to actual service, or has an illegal data format, or has irregular coding, or is ambiguous;
and generating a visual early warning report, visualizing the second data and highlighting the abnormal database table.
7. The ETL rule-based data quality monitoring and early warning method according to claim 6, further comprising receiving an ETL processing rule set by a user, and performing corresponding ETL processing on the data to be processed according to the ETL processing rule, wherein the ETL processing rule comprises an ETL processing rule set according to a processing standard of a preset Structured Query Language (SQL).
8. The ETL rule-based data quality monitoring and early warning method of claim 6, wherein the ETL processing comprises: filtering out the data when no room number exists in the house-receiving contract in the house-receiving detail data; and, when the age is calculated based on the identification number, regarding an identification number that is not in the correct format as garbage data and replacing it with 0.
9. The ETL rule-based data quality monitoring and early warning method of claim 6, wherein synchronizing the source data to the data warehouse comprises synchronizing the data source according to the type of the data source, wherein the data source comprises related information, and the related information comprises a data source name, a data source type and an access mode of a data table in the data source.
10. The ETL rule-based data quality monitoring and early warning method of claim 6, further comprising setting the responsible personnel in a configuration table and sending the abnormal database tables in the visual early warning report to the personnel.
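The two example ETL rules recited in claims 3 and 8 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation. The field names `room_number` and `id_number` are hypothetical. The "correct format" of the identification number is assumed to be 18 characters with the birth date at positions 7 to 14 (YYYYMMDD). "Replaced with 0" is read here as setting the derived age to 0, which is only one possible reading of the claim wording. No checksum validation is attempted.

```python
import re
from datetime import date
from typing import Dict, List, Optional

# Assumed format: 17 digits followed by a digit or X (not stated in the claims).
ID_PATTERN = re.compile(r"^[0-9]{17}[0-9Xx]$")


def age_from_id(id_number: Optional[str], today: date) -> int:
    """Return the age derived from the ID number, or 0 for garbage input."""
    if not id_number or not ID_PATTERN.match(id_number):
        return 0
    try:
        born = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    except ValueError:
        return 0  # digits present but not a real calendar date
    if born > today:
        return 0  # birth date in the future is treated as garbage too
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def apply_rules(rows: List[Dict[str, str]], today: date) -> List[Dict[str, object]]:
    """Drop rows without a room number and attach a cleaned age to the rest."""
    cleaned: List[Dict[str, object]] = []
    for row in rows:
        if not row.get("room_number"):
            continue  # rule 1: contract has no room number, filter the row out
        age = age_from_id(row.get("id_number"), today)  # rule 2: bad ID gives 0
        cleaned.append({**row, "age": age})
    return cleaned


rows = [
    {"room_number": "A-101", "id_number": "110101199003070000"},
    {"room_number": "", "id_number": "110101199003070000"},
    {"room_number": "B-2", "id_number": "not-an-id"},
]
result = apply_rules(rows, today=date(2019, 12, 31))
```

Passing `today` explicitly keeps the rule deterministic, so the same input always yields the same cleaned output regardless of when the job runs.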
CN201911420956.6A 2019-12-31 2019-12-31 ETL rule-based data quality monitoring and early warning system and method Withdrawn CN111159161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911420956.6A CN111159161A (en) 2019-12-31 2019-12-31 ETL rule-based data quality monitoring and early warning system and method

Publications (1)

Publication Number Publication Date
CN111159161A (en) 2020-05-15

Family

ID=70560548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911420956.6A Withdrawn CN111159161A (en) 2019-12-31 2019-12-31 ETL rule-based data quality monitoring and early warning system and method

Country Status (1)

Country Link
CN (1) CN111159161A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512790A (en) * 2015-08-14 2016-04-20 上海合胜计算机科技股份有限公司 Integrated operation and maintenance management system
CN107025285A (en) * 2017-04-07 2017-08-08 广州隆德信息科技有限公司 A kind of data handling system of comprehensive operation
CN107463709A (en) * 2017-08-21 2017-12-12 北京奇艺世纪科技有限公司 A kind of ETL processing method and processing devices based on multi-data source
CN108241724A (en) * 2017-05-11 2018-07-03 新华三大数据技术有限公司 A kind of metadata management method and device
CN108959564A (en) * 2018-07-04 2018-12-07 玖富金科控股集团有限责任公司 Data warehouse metadata management method, readable storage medium storing program for executing and computer equipment
CN109947746A (en) * 2017-10-26 2019-06-28 亿阳信通股份有限公司 A kind of quality of data management-control method and system based on ETL process
CN110457294A (en) * 2019-06-28 2019-11-15 阿里巴巴集团控股有限公司 A kind of data processing method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522842A (en) * 2020-07-04 2020-08-11 杭州城市大数据运营有限公司 ETL data processing method and device, computer equipment and storage medium
CN112199423A (en) * 2020-09-01 2021-01-08 河钢数字技术股份有限公司 ETL data quality judgment and feedback method
CN112395350A (en) * 2020-11-17 2021-02-23 中国工商银行股份有限公司 Method and device for visualizing monitoring data of multiple data sources
CN112925769A (en) * 2021-03-08 2021-06-08 浪潮云信息技术股份公司 Digital civil administration internal data gathering and sharing method
CN113836160A (en) * 2021-09-28 2021-12-24 上海市大数据股份有限公司 Data flow state monitoring and warning system based on master-slave synchronization
CN113836160B (en) * 2021-09-28 2024-01-23 上海市大数据股份有限公司 Data stream state monitoring alarm system based on master-slave synchronization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200515