CN111177139A - Data quality verification monitoring and early warning method and system based on data quality system - Google Patents

Data quality verification monitoring and early warning method and system based on data quality system

Info

Publication number
CN111177139A
Authority
CN
China
Prior art keywords
data
early warning
data quality
rule
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911409805.0A
Other languages
Chinese (zh)
Inventor
李松前
李昭
陈浩
高靖
崔岩
卢述奇
陈呈
张宵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingwutong Co ltd
Original Assignee
Qingwutong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingwutong Co ltd
Priority to CN201911409805.0A
Publication of CN111177139A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data quality verification monitoring and early warning method and system based on a data quality system, relating to the technical field of computers. The method comprises the following steps: synchronizing source data to a data warehouse and setting an early warning threshold; when the source data is within the early warning threshold, standardizing the source data to obtain standardized source data; performing metadata management on the standardized source data to obtain first data; processing the first data according to data quality system rules to obtain second data; and generating a visual early warning report that visualizes the second data and highlights abnormal database tables. The invention can monitor data quality while the data warehouse is being built and improve the readability and accuracy of the data.

Description

Data quality verification monitoring and early warning method and system based on data quality system
Technical Field
The invention relates to the field of computers, in particular to a data quality verification monitoring and early warning method and system based on a data quality system.
Background
The arrival of the big data era has brought enterprises large volumes of data assets, and enterprises need to find the genuinely useful data among them in order to analyze and mine it.
The prior art mainly has the following problems:
1. enterprise data comes from different business systems, so abnormalities in an upstream data source (for example, a structural change in crawled data or a crawler failure) cause errors in downstream data;
2. metadata is not managed effectively while the data warehouse is being built, so the data is hard to read and the metadata cannot be used to its full potential;
3. the data in the data tables has quality problems, mainly the following:
1) the data lacks a valid primary key, which results in duplicate records;
2) the data does not conform to the standard data types;
3) the data does not conform to the standard business rules;
4) the main indicators of a data table are abnormal;
4. enterprises do not effectively monitor the data warehouse or issue early warnings, so data developers cannot quickly learn the current state of the data and data abnormalities are not reported in time; the related data referenced downstream then becomes faulty and misleads data analysts and decision makers.
In view of the above, overcoming these drawbacks of the prior art is an urgent problem in the art.
Disclosure of Invention
In view of this, the invention provides a data quality verification monitoring and early warning method and system based on a data quality system, which monitor data quality while a data warehouse is being built and improve the readability and accuracy of the data.
In one aspect, the invention provides a data quality verification monitoring and early warning method based on a data quality system, comprising the following steps:
synchronizing source data to a data warehouse and setting an early warning threshold, and, when the source data is within the early warning threshold, standardizing the source data to obtain standardized source data;
performing metadata management on the standardized source data, and using the metadata to provide business descriptions for the target tables and target fields in the large volume of reports, to obtain first data;
processing the first data according to data quality system rules to obtain second data, wherein the processing comprises:
acquiring the data type and/or attributes of the first data;
configuring a detection rule combination according to the data type and/or attributes of the first data, wherein the detection rule combination comprises at least one detection rule;
performing quality detection on the first data according to the detection rule combination to obtain the second data, and sending the second data to a destination end;
and generating a visual early warning report that visualizes the second data and highlights abnormal database tables.
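The four claimed steps can be read as a simple pipeline. The following Python sketch is only an illustration under stated assumptions: the function `run_pipeline`, the `PipelineResult` type, the row-count threshold, and the reduction of "standardization" to key normalization are all hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineResult:
    proceeded: bool
    second_data: list = field(default_factory=list)
    report: str = ""


def run_pipeline(rows, min_rows, descriptions, rules):
    """Hypothetical end-to-end sketch of the four claimed steps."""
    # Step 1: early warning gate on the synchronized volume, then standardization
    # (reduced here to normalizing column names, which is an assumption).
    if len(rows) < min_rows:
        return PipelineResult(False, report=f"WARNING: {len(rows)} rows < minimum {min_rows}")
    standardized = [{str(k).strip().lower(): v for k, v in row.items()} for row in rows]

    # Step 2: metadata management, reduced to noting fields without a business description.
    first_data = [
        {"row": row, "undescribed": [k for k in row if k not in descriptions]}
        for row in standardized
    ]

    # Step 3: apply the detection rule combination (at least one rule).
    second_data = []
    for item in first_data:
        failed = [name for name, rule in rules.items() if not rule(item["row"])]
        second_data.append({**item, "failed_rules": failed})

    # Step 4: summarize the outcome for the early warning report.
    abnormal = sum(1 for item in second_data if item["failed_rules"])
    return PipelineResult(True, second_data, f"{abnormal} of {len(second_data)} rows failed checks")
```

A caller would supply the rules as named predicates, for example `{"age_nonzero": lambda r: r.get("age") != 0}`; a batch smaller than `min_rows` stops at step 1 and only produces a warning.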
In another aspect, the invention provides a data quality verification monitoring and early warning system based on a data quality system, comprising a source data standardization processing module, a data warehouse metadata management module, a data quality system rule verification module and a visual early warning report generation module, wherein:
the source data standardization processing module is coupled with the data warehouse metadata management module, and is configured to standardize the source data synchronized into the data warehouse to obtain standardized source data and to send the standardized source data to the data warehouse metadata management module;
the data warehouse metadata management module is coupled with the source data standardization processing module and the data quality system rule verification module respectively, and is configured to perform metadata management on the standardized source data and to use the metadata to provide business descriptions for the target tables and target fields in the large volume of reports, to obtain first data;
the data quality system rule verification module is coupled with the data warehouse metadata management module and the visual early warning report generation module respectively, and is configured to process the first data according to data quality system rules to obtain second data, wherein the processing comprises:
acquiring the data type and/or attributes of the first data;
configuring a detection rule combination according to the data type and/or attributes of the first data, wherein the detection rule combination comprises at least one detection rule;
performing quality detection on the first data according to the detection rule combination to obtain the second data, and sending the second data to a destination end;
and the visual early warning report generation module is coupled with the data quality system rule verification module, and is configured to visualize the second data and highlight abnormal database tables.
Compared with the prior art, the data quality verification monitoring and early warning method and system based on a data quality system provided by the invention achieve at least the following beneficial effects:
the invention monitors data quality while the data warehouse is being built, and can improve the readability and accuracy of the data;
because the source data passes through source data standardization, data warehouse metadata management and data quality system rule verification in sequence, detection quality, effectiveness and precision are improved; managers can readily judge the quality of the current data assets and data warehouse, and managers and developers are given more precise localization of data problems to guide data quality improvement; the data also becomes easier to read and understand, which benefits data users;
the invention can generate a visual early warning report that highlights abnormal database tables, which makes the data easier to monitor.
Of course, a product implementing the invention need not achieve all of the above technical effects at the same time.
Other features and advantages of the invention will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart of a data quality verification monitoring and early warning method based on a data quality system according to the present invention;
FIG. 2 is a block diagram of a data quality verification monitoring and early warning system based on a data quality system according to the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Referring to FIG. 1, FIG. 1 is a flow chart of a data quality verification monitoring and early warning method based on a data quality system according to the present invention.
The method shown in FIG. 1 comprises the following steps:
S1: synchronizing source data to a data warehouse and setting an early warning threshold, and, when the source data is within the early warning threshold, standardizing the source data to obtain standardized source data;
an early warning judgment is added to the data transmission process according to the business system involved;
for example, when the crawler system is abnormal, the system uses the data volume standard set by business personnel to decide whether to proceed to the next step, such as checking whether the data volume falls below a specified minimum value.
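A minimal sketch of such an early warning judgment is shown below. It assumes the data volume standard is a per-source minimum row count; the names `MIN_ROWS` and `early_warning_gate`, and the numeric thresholds, are hypothetical.

```python
# Hypothetical per-source minimum row counts, set by business personnel.
MIN_ROWS = {"crawler": 1000, "orders": 1}


def early_warning_gate(source: str, row_count: int, min_rows=MIN_ROWS) -> dict:
    """Decide whether a synchronized batch may proceed to standardization."""
    minimum = min_rows.get(source, 0)  # sources without a standard always pass (assumption)
    if row_count < minimum:
        return {
            "source": source,
            "proceed": False,
            "message": f"{source}: {row_count} rows is below the minimum of {minimum}",
        }
    return {"source": source, "proceed": True, "message": ""}
```

Blocking the batch rather than only logging a warning is a design choice made for this sketch; the text leaves the exact reaction to the configured conditions.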
S2: performing metadata management on the standardized source data, and using the metadata to provide business descriptions for the target tables and target fields in the large volume of reports, to obtain first data;
when certain tables or fields are looked up across a large volume of reports, data users have difficulty finding the meaning of the tables and fields that correspond to their own business, because developers have not maintained the metadata well; the early warning system therefore compiles and visualizes metadata statistics for each module, for example using the metadata to count the tables and fields in a business system's databases that have no business description, so that monitoring and early warning visualization prompts development and business personnel to complete the data indicators and the data becomes easier to read and understand.
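The statistic described above, counting tables and fields without a business description, could be computed roughly as follows. The catalog layout (a dictionary of table name to description and field descriptions) and the function name `description_coverage` are assumptions made for illustration.

```python
def description_coverage(catalog):
    """Count tables and fields that lack a business description.

    catalog maps table name -> {"description": str, "fields": {field: description}}.
    This layout is hypothetical; a real warehouse would read it from its metadata store.
    """
    tables_missing = [t for t, meta in catalog.items() if not meta.get("description")]
    fields_missing = [
        (t, f)
        for t, meta in catalog.items()
        for f, desc in meta.get("fields", {}).items()
        if not desc
    ]
    total_fields = sum(len(meta.get("fields", {})) for meta in catalog.values())
    ratio = len(fields_missing) / total_fields if total_fields else 0.0
    return {
        "tables_missing": tables_missing,
        "fields_missing": fields_missing,
        "field_missing_ratio": ratio,
    }
```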
Metadata is data that describes other data, or structured data that provides information about a resource. It describes objects such as information resources or data, and serves to: identify resources; evaluate resources; track changes to a resource during use; manage large volumes of networked data simply and efficiently; and discover, search, organize and manage information resources effectively. Once metadata has been established, it can be shared. Its structure and completeness depend on the value of the information resources and the environment in which they are used; metadata is often developed and used in a changing, distributed environment, and no single format can fully meet the differing needs of different groups. Metadata is also an encoding system for describing digital information resources, especially networked ones, which makes it fundamentally different from conventional data encoding systems; its most important feature and function is to establish a machine-understandable framework for digital information resources. In e-government, for example, the metadata system forms the logical framework and basic model, and thereby determines the functional characteristics, operating mode and overall performance of the system; the operation of e-government is implemented on the basis of metadata. The main functions of metadata are description, integration, control and agent functions. Because metadata is itself data, it can be stored in and retrieved from a database in the same way as other data.
If the organization that provides data elements also provides the metadata describing them, the data elements can be used accurately and efficiently. Users can consult the metadata before using the data in order to find the information they want.
In the field of data warehousing, metadata is divided by purpose into technical metadata and business metadata. First, metadata provides user-oriented information: metadata that records the business description of a data item, for example, helps users use the data. Second, metadata supports the system's management and maintenance of data: metadata about how a data item is stored, for example, lets the system access the data in the most efficient way. Specifically, in a data warehouse system the metadata mechanism mainly supports five types of system management functions: describing which data is in the data warehouse; defining the data that enters the data warehouse and the data generated from it; recording the schedule of data extraction work triggered by business events; recording and checking the requirements for, and the state of, system data consistency; and measuring data quality.
S3: processing the first data according to data quality system rules to obtain second data, wherein the processing comprises the following steps:
acquiring the data type and/or attributes of the first data;
configuring a detection rule combination according to the data type and/or attributes of the first data, wherein the detection rule combination comprises at least one detection rule;
performing quality detection on the first data according to the detection rule combination to obtain the second data, and sending the second data to a destination end;
the detection rule combination is used to detect the integrity, consistency, accuracy and timeliness of the first data, wherein,
integrity checks whether the records and information of the first data are complete and whether any of the first data is missing;
consistency checks whether the records of the first data conform to the standard and whether they agree with preceding and following data sets and with other data sets;
accuracy checks whether the first data is accurate and whether it contains abnormal or erroneous information;
timeliness is the time interval from the creation of the first data to the time it can be viewed.
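As a rough illustration of configuring a rule combination from the data type, the sketch below maps a declared type to a list of rules, each tagged with the quality dimension it tests. The type names, the one-day timeliness limit, and the specific predicates are assumptions; consistency checks against other data sets are omitted for brevity.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=1)  # hypothetical timeliness limit

# Each rule is (dimension, predicate(value, now)); grouping rules by type is an assumption.
RULES_BY_TYPE = {
    "int": [
        ("integrity", lambda v, now: v is not None),
        ("accuracy", lambda v, now: v is None or v >= 0),
    ],
    "datetime": [
        ("integrity", lambda v, now: v is not None),
        ("timeliness", lambda v, now: v is None or now - v <= MAX_AGE),
    ],
}


def configure_rules(schema):
    """Build the rule combination for each field from its declared data type."""
    return {name: RULES_BY_TYPE.get(dtype, []) for name, dtype in schema.items()}


def detect(row, schema, now):
    """Return (field, dimension) pairs for every rule the row fails."""
    failures = []
    for name, rules in configure_rules(schema).items():
        for dimension, predicate in rules:
            if not predicate(row.get(name), now):
                failures.append((name, dimension))
    return failures
```

Passing `now` explicitly keeps the timeliness rule deterministic, which makes the check easy to test and to re-run for a past date.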
The data warehouse provides important data support for enterprise decision making, and data quality directly affects the development of the company's business; to ensure the accuracy of the data warehouse, data quality is monitored through a primary key check, a code standard check, a business rule check and other checks;
developers and the relevant business personnel formulate the data quality management rules and record them in the system; developers convert the rules into scripts and execute them, and the check records are written to a check log table; a check result table is then produced by processing the check log table against the data quality system rules, and it supplies the data for the visual report system and for error notification; the data volume check in the rule system mainly ensures the integrity of the data by judging whether the current day's table is empty and whether references by downstream data are affected.
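The flow from rule to check log table to check result table might look like the following sketch, which uses an in-memory SQLite database. The table names, column names and the min/max form of a rule are hypothetical; the patent does not specify a schema.

```python
import sqlite3


def make_quality_db():
    """Create an in-memory database with hypothetical rule, log and result tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE quality_rule (rule_id TEXT PRIMARY KEY, min_value REAL, max_value REAL);
        CREATE TABLE check_log (rule_id TEXT, check_date TEXT, metric_value REAL);
        CREATE TABLE check_result (rule_id TEXT, check_date TEXT, metric_value REAL, status TEXT);
    """)
    return conn


def record_check(conn, rule_id, check_date, metric_value):
    """A rule script writes the value it measured to the check log table."""
    conn.execute("INSERT INTO check_log VALUES (?, ?, ?)", (rule_id, check_date, metric_value))


def build_results(conn, check_date):
    """Compare logged values with each rule's bounds and fill the result table."""
    conn.execute("""
        INSERT INTO check_result
        SELECT l.rule_id, l.check_date, l.metric_value,
               CASE WHEN (r.min_value IS NOT NULL AND l.metric_value < r.min_value)
                      OR (r.max_value IS NOT NULL AND l.metric_value > r.max_value)
                    THEN 'abnormal' ELSE 'normal' END
        FROM check_log AS l JOIN quality_rule AS r ON r.rule_id = l.rule_id
        WHERE l.check_date = ?
    """, (check_date,))
    return conn.execute(
        "SELECT rule_id, status FROM check_result WHERE check_date = ? ORDER BY rule_id",
        (check_date,),
    ).fetchall()
```

Keeping the raw measurements in the log table and deriving the verdicts separately means a rule's bounds can be changed and the results rebuilt without re-running the scripts.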
In some optional embodiments, the detection rule combination comprises a primary key check, a code standard check and a business rule check. The check rules include:
primary key check: ensures the uniqueness of the primary key; the daily number of records for each value of the primary key field is counted and written to the check log table; under the uniqueness principle of the primary key, a record count greater than 1 is judged to be an error and written to the check result table;
code standard check: when source data is brought into the data warehouse system, the meaning and statistical definition of certain fields are agreed upon, so a later modification or change in the source data system makes the downstream data inaccurate; for example, if a field is added at the source to indicate whether the data is of type A or type B and the downstream extraction process does not pick it up, the data source columns are checked and the counts of the relevant valid columns are compared;
business rule check: ensures the accuracy and usability of the data by establishing rules and checking whether each record satisfies them; for example, a record whose age value is 0 is considered abnormal; the outcomes are written to the check log table and the check result table;
it will be appreciated that other checks are also included: specific data probes are set up for different business scenarios, the business side records the specific rules in the quality system, and developers generate and execute the corresponding scripts according to those rules; for example, the sales scoring table is checked for the proportion of people whose sales result is empty relative to the total number of people, and when the proportion exceeds 60% the business side judges the data to be abnormal; if the upstream data is abnormal, then after the script has run the early warning system notifies the relevant personnel, according to the check result, to inspect and correct it.
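The primary key check, the business rule check and the empty-ratio check above can be sketched in a few lines of Python. The function names and the in-memory row format (a list of dictionaries) are assumptions; the 60% limit and the "age is 0" example come from the text.

```python
from collections import Counter


def primary_key_check(rows, key):
    """Count records per primary key value; any count above 1 is an error."""
    counts = Counter(row[key] for row in rows)
    log_entries = dict(counts)                      # would go to the check log table
    errors = sorted(k for k, n in counts.items() if n > 1)  # would go to the result table
    return log_entries, errors


def business_rule_check(rows, rule):
    """Return the records that violate a business rule given as a predicate."""
    return [row for row in rows if not rule(row)]


def empty_ratio_check(rows, field_name, max_ratio=0.6):
    """Flag the table when the share of empty values in a field exceeds max_ratio."""
    if not rows:
        return 0.0, False
    empty = sum(1 for row in rows if row.get(field_name) is None)
    ratio = empty / len(rows)
    return ratio, ratio > max_ratio
```

For example, `business_rule_check(rows, lambda r: r["age"] != 0)` returns the records whose age is 0, matching the example rule in the text.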
The data volume check mainly ensures the integrity of the data by judging whether the current day's table is empty and whether references by downstream data are affected; for example, to check the number of house-receiving staff, the number of such staff in service that day is counted from the staff table and stored in the check log table; based on the relevant baseline data from the human resources department, the number of house-receiving staff is taken to be about 2000 and no fewer than 1500, so under this rule the day's count is read from the check log table, and a count of 100 is judged to be abnormal or erroneous and is written to the check result table.
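The worked example above (about 2000 expected, at least 1500 required, 100 observed) can be expressed as a small check. The function name `data_volume_check` and the shape of the returned entries are hypothetical.

```python
def data_volume_check(table, observed, minimum, expected=None):
    """Classify a day's row count as empty, abnormal or normal.

    Returns a (log_entry, result_entry) pair standing in for rows of the
    check log table and check result table; both layouts are assumptions.
    """
    if observed == 0:
        status = "empty"
    elif observed < minimum:
        status = "abnormal"
    else:
        status = "normal"
    log_entry = {"table": table, "observed": observed}
    result_entry = {
        "table": table,
        "observed": observed,
        "minimum": minimum,
        "expected": expected,
        "status": status,
    }
    return log_entry, result_entry
```

With the figures from the text, `data_volume_check("staff", 100, 1500, expected=2000)` yields a result entry whose status is `"abnormal"`.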
S4: generating a visual early warning report that visualizes the second data and highlights abnormal database tables.
The business data is displayed by category, and abnormal indicators are marked with a special warning-line reminder; for example, when the field-description indicator of the metadata is tallied and the share of fields without a description reaches thirty percent, the data is considered poorly readable and a special reminder is raised, whereupon developers complete the data indicators according to the log data.
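A plain-text stand-in for the categorized report with a warning line is sketched below. The thirty percent line comes from the text; the function name `render_report`, the tuple format and the `<-- WARNING` marker are assumptions, and a real system would presumably render charts or a dashboard instead.

```python
WARNING_LINE = 0.30  # share of undescribed fields at which the text calls for a reminder


def render_report(metrics, warning_line=WARNING_LINE):
    """Group tables by business category and mark those at or above the warning line.

    metrics is a list of (category, table, undescribed_ratio) tuples.
    """
    lines = []
    for category in sorted({m[0] for m in metrics}):
        lines.append(f"[{category}]")
        for _, table, ratio in sorted(m for m in metrics if m[0] == category):
            flag = "  <-- WARNING" if ratio >= warning_line else ""
            lines.append(f"  {table}: {ratio:.0%} undescribed{flag}")
    return "\n".join(lines)
```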
In some optional embodiments, each new detection result overwrites the previous one; alternatively, the previous detection result may be retained, which is not specifically limited here.
In some optional embodiments, the data quality verification monitoring and early warning method based on the data quality system further comprises setting an automatic execution frequency for the data quality system rule verification, where the automatic execution frequency includes daily, weekly and/or monthly execution.
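One way to interpret the configurable execution frequency is a per-rule schedule that is evaluated once a day. In the sketch below, running weekly rules on Mondays and monthly rules on the first day of the month are assumptions, as are the names `is_due` and `rules_to_run`.

```python
from datetime import date


def is_due(frequency: str, today: date) -> bool:
    """Decide whether a rule with the given frequency should run today."""
    if frequency == "daily":
        return True
    if frequency == "weekly":
        return today.weekday() == 0  # Monday; the chosen weekday is an assumption
    if frequency == "monthly":
        return today.day == 1  # first day of the month; also an assumption
    raise ValueError(f"unknown frequency: {frequency}")


def rules_to_run(schedule, today):
    """schedule maps rule name -> list of frequencies ("and/or" in the text)."""
    return [
        rule for rule, frequencies in schedule.items()
        if any(is_due(f, today) for f in frequencies)
    ]
```

In practice the daily evaluation itself would be triggered by an external scheduler, which is outside the scope of this sketch.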
In some optional embodiments, the method further comprises configuring the responsible personnel in a configuration table, and sending the abnormal database tables shown in the visual early warning report to those personnel.
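Routing abnormal tables to the people named in a configuration table could be sketched as follows. The mapping from table name to recipients, the fallback recipient `"data-team"` and the function name `route_alerts` are all hypothetical; the actual delivery channel (mail, chat and so on) is not specified in the text and is left out.

```python
def route_alerts(abnormal_tables, owners, fallback="data-team"):
    """Group abnormal tables by the person who should be notified.

    owners maps table name -> list of responsible people, standing in for the
    configuration table; tables without an entry go to a fallback recipient.
    """
    outbox = {}
    for table in abnormal_tables:
        for person in owners.get(table, [fallback]):
            outbox.setdefault(person, []).append(table)
    return outbox
```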
Referring to FIG. 2, FIG. 2 is a block diagram of a data quality verification monitoring and early warning system based on a data quality system according to the present invention. As shown in FIG. 2, the system includes a source data standardization processing module 201, a data warehouse metadata management module 202, a data quality system rule verification module 203, and a visual early warning report generation module 204.
The source data standardization processing module 201 is coupled with the data warehouse metadata management module 202, and is configured to standardize the source data synchronized into the data warehouse to obtain standardized source data, and to send the standardized source data to the data warehouse metadata management module 202;
an early warning judgment is added to the data transmission process according to the business system involved; for example, when the crawler system is abnormal, the system uses the data volume standard set by business personnel to decide whether to proceed to the next step, such as checking whether the data volume falls below a specified minimum value;
the data warehouse metadata management module 202 is coupled with the source data standardization processing module 201 and the data quality system rule verification module 203 respectively, and is configured to perform metadata management on the standardized source data and to use the metadata to provide business descriptions for the target tables and target fields in the large volume of reports, to obtain first data;
when certain tables or fields are looked up across a large volume of reports, data users have difficulty finding the meaning of the tables and fields that correspond to their own business, because developers have not maintained the metadata well; the early warning system therefore compiles and visualizes metadata statistics for each module, for example using the metadata to count the tables and fields in a business system's databases that have no business description, so that monitoring and early warning visualization prompts development and business personnel to complete the data indicators and the data becomes easier to read and understand.
The data quality system rule verification module 203 is coupled with the data warehouse metadata management module 202 and the visual early warning report generation module 204 respectively, and is configured to process the first data according to data quality system rules to obtain second data, wherein the processing comprises:
acquiring the data type and/or attributes of the first data;
configuring a detection rule combination according to the data type and/or attributes of the first data, wherein the detection rule combination comprises at least one detection rule;
performing quality detection on the first data according to the detection rule combination to obtain the second data, and sending the second data to a destination end;
the detection rule combination comprises a primary key check, a code standard check and a business rule check. The data warehouse provides important data support for enterprise decision making, and data quality directly affects the development of the company's business; to ensure the accuracy of the data warehouse, data quality is monitored through a primary key check, a code standard check, a business rule check and other checks.
In some optional embodiments, the check rules include:
primary key check: ensures the uniqueness of the primary key; the daily number of records for each value of the primary key field is counted and written to the check log table; under the uniqueness principle of the primary key, a record count greater than 1 is judged to be an error and written to the check result table;
code standard check: when source data is brought into the data warehouse system, the meaning and statistical definition of certain fields are agreed upon, so a later modification or change in the source data system makes the downstream data inaccurate; for example, if a field is added at the source to indicate whether the data is of type A or type B and the downstream extraction process does not pick it up, the data source columns are checked and the counts of the relevant valid columns are compared;
business rule check: ensures the accuracy and usability of the data by establishing rules and checking whether each record satisfies them; for example, a record whose age value is 0 is considered abnormal; the outcomes are written to the check log table and the check result table;
it will be appreciated that other checks are also included: specific data probes are set up for different business scenarios, the business side records the specific rules in the quality system, and developers generate and execute the corresponding scripts according to those rules; for example, the sales scoring table is checked for the proportion of people whose sales result is empty relative to the total number of people, and when the proportion exceeds 60% the business side judges the data to be abnormal; if the upstream data is abnormal, then after the script has run the early warning system notifies the relevant personnel, according to the check result, to inspect and correct it.
The detection rule combination is used for detecting the integrity, consistency, accuracy and timeliness of the first data.
Integrity checks whether the records and information of the first data are complete and whether anything is missing;
consistency checks whether the records of the first data conform to the specification and whether they are consistent with the sequence and with other data sets;
accuracy checks whether the information recorded in the first data is accurate and whether abnormal or erroneous information exists;
timeliness is the time interval from the generation of the first data to the point at which it can be viewed.
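As a rough illustration of two of these dimensions, integrity and timeliness could be measured as follows. This is a sketch under stated assumptions: the function names, the definition of integrity as the share of rows with all required fields filled, and the sample timestamps are hypothetical and do not come from the patent.

```python
from datetime import datetime


def completeness(rows, required_fields):
    """Fraction of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0  # an empty table is treated as fully incomplete
    complete = sum(
        1
        for row in rows
        if all(row.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(rows)


def timeliness(generated_at, viewable_at):
    """Interval between data generation and the moment it can be viewed."""
    return viewable_at - generated_at


# Hypothetical records and timestamps.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": ""},
]
score = completeness(rows, ["id", "name"])
delay = timeliness(datetime(2019, 12, 31, 0, 0), datetime(2019, 12, 31, 6, 30))
```

Consistency and accuracy depend on business-specific specifications, so they are left out of this sketch.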
The visual early warning report generation module 204 is coupled with the data quality system rule verification module 203; it visualizes the second data and highlights the abnormal database tables.
Developers and relevant business personnel formulate management rules for data quality and enter them into the system; developers convert the rules into scripts and execute them; the check records are written into a check log table, and a check result table is formed from the check log table according to the rules of the data quality system, providing data support for the visual report system and for error notification. The data volume check in the rule system mainly ensures data integrity: it determines whether the current day's table is empty and whether this affects references by downstream data.
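The flow from check log table to check result table can be sketched as below. This is a minimal illustration, not the patent's schema: the `LogEntry` fields, the rule names, and the `ABNORMAL_IF` mapping are assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LogEntry:
    """One row of a hypothetical check log table."""
    table: str
    rule: str
    check_date: date
    value: float


def data_volume_check(table, row_count, check_date):
    """Log the current day's row count for a table."""
    return LogEntry(table, "data_volume", check_date, float(row_count))


# For each rule, a predicate that is True when the logged value is abnormal.
ABNORMAL_IF = {
    "data_volume": lambda v: v == 0,           # empty table for the day
    "primary_key_max_count": lambda v: v > 1,  # duplicated primary key
}


def build_result_table(log):
    """Derive check result rows from the check log by applying each rule."""
    return [
        {
            "table": entry.table,
            "rule": entry.rule,
            "check_date": entry.check_date,
            "value": entry.value,
            "abnormal": ABNORMAL_IF[entry.rule](entry.value),
        }
        for entry in log
    ]


today = date(2019, 12, 31)
log = [
    data_volume_check("orders", 1200, today),
    data_volume_check("users", 0, today),
    LogEntry("orders", "primary_key_max_count", today, 1.0),
]
results = build_result_table(log)
abnormal_tables = [r["table"] for r in results if r["abnormal"]]
```

Keeping the raw measurements in the log and deriving the verdicts separately means a threshold can be changed later without re-running the checks.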
In some optional embodiments, the data quality verification monitoring and warning system based on the data quality system further includes a data quality system rule verification automatic execution frequency setting module, coupled to the data quality system rule verification module, for setting an automatic execution frequency of the data quality system rule verification, where the automatic execution frequency includes daily execution, weekly execution, and/or monthly execution.
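A frequency setting of this kind could be evaluated as in the following sketch. The patent names only the three frequencies; running weekly rules on Mondays and monthly rules on the first day of the month is an assumption made here, as is the function name `is_due`.

```python
from datetime import date


def is_due(frequency, today):
    """Decide whether a rule with the given frequency should run on `today`."""
    if frequency == "daily":
        return True
    if frequency == "weekly":
        return today.weekday() == 0  # assumption: weekly rules run on Monday
    if frequency == "monthly":
        return today.day == 1        # assumption: monthly rules run on the 1st
    raise ValueError(f"unknown frequency: {frequency!r}")


# Hypothetical rule configuration: rule name -> execution frequency.
rule_frequency = {
    "primary_key_check": "daily",
    "code_standard_check": "weekly",
    "null_ratio_check": "monthly",
}

run_date = date(2019, 12, 30)  # a Monday
due_rules = [name for name, freq in rule_frequency.items() if is_due(freq, run_date)]
```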
In some optional embodiments, the system further includes a visual early warning report sending module, configured to set the responsible personnel in a configuration table and to send the abnormal database tables in the visual early warning report to those personnel.
Each kind of business data is classified and displayed, and a special warning-line reminder is added to abnormal indicators; for example, for the indicator that counts metadata field descriptions, when the fields without a description reach thirty percent, the readability of the data is considered very poor and a special reminder is issued, and developers improve these data indicators according to the log data. The developer responsible for each business table is then configured as an attribute of the table in the error notification system, and when an error in a table's data needs to be reported, the notification is sent directly to that person, so that the person can quickly investigate and handle the problem.
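The description-coverage indicator and the owner-based notification could look roughly like this. It is a sketch under stated assumptions: the function names, the configuration layout with an `owner` attribute per table, and the example addresses are hypothetical, and the actual sending of messages is left out.

```python
def undescribed_ratio(field_descriptions):
    """Share of fields whose metadata description is missing or blank."""
    if not field_descriptions:
        return 0.0
    missing = sum(
        1 for text in field_descriptions.values() if not (text or "").strip()
    )
    return missing / len(field_descriptions)


def route_alerts(abnormal_tables, table_config):
    """Group abnormal tables by the responsible developer configured per table."""
    alerts = {}
    for table in abnormal_tables:
        owner = table_config[table]["owner"]
        alerts.setdefault(owner, []).append(table)
    return alerts


# Hypothetical metadata for one table: field name -> description.
descriptions = {"id": "order id", "amt": "", "ts": None, "uid": "user id"}
ratio = undescribed_ratio(descriptions)
needs_reminder = ratio >= 0.30  # the thirty percent warning line from the text

# Hypothetical error notification configuration: owner stored as a table attribute.
table_config = {
    "orders": {"owner": "dev_a@example.com"},
    "users": {"owner": "dev_b@example.com"},
    "sales_score": {"owner": "dev_a@example.com"},
}
alerts = route_alerts(["orders", "sales_score"], table_config)
```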
As the above embodiments show, the data quality verification monitoring and early warning method and system based on the data quality system achieve at least the following beneficial effects:
the invention monitors data quality while the data warehouse is being built, which can improve the readability and accuracy of the data;
because the source data undergoes source data standardization, data warehouse metadata management, and data quality system rule verification in sequence, the detection quality, detection effect, and detection precision are improved; managers can readily judge the quality of the current data assets and data warehouse, and managers and developers are given more detailed localization of data problems to guide the improvement of data quality; the data also becomes easier to understand and read, which benefits data users;
the invention can generate a visual early warning report that highlights the abnormal database tables, which facilitates monitoring of the data.
Although some specific embodiments of the present invention have been described in detail by way of examples, it should be understood by those skilled in the art that the above examples are for illustrative purposes only and are not intended to limit the scope of the present invention. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (10)

1. A data quality verification monitoring and early warning method based on a data quality system is characterized by comprising the following steps:
synchronizing source data to a data warehouse and setting an early warning threshold, and when the source data is within the early warning threshold, carrying out standardization processing on the source data to obtain standardized source data;
performing metadata management on the standardized source data, and performing service description on a target table and a target field in the massive report through metadata to obtain first data;
processing the first data according to a data quality system rule to obtain second data, wherein the processing comprises the following steps:
acquiring the data type and/or attribute of the first data;
configuring a detection rule combination according to the data type and/or the attribute of the first data, wherein the configuration detection rule combination at least comprises one detection rule;
performing quality detection on the first data according to the rule combination to obtain second data, and sending the second data to a destination end;
and generating a visual early warning report, visualizing the second data and highlighting the abnormal database table.
2. The data quality verification monitoring and early warning method based on the data quality system as claimed in claim 1, wherein the detection rule combination is used for detecting the integrity, consistency, accuracy and timeliness of the first data, wherein,
the integrity is to check whether the records and information of the first data are complete and whether anything is missing;
the consistency is to check whether the records of the first data conform to the specification and whether they are consistent with the sequence and with other data sets;
the accuracy is to check whether the information recorded in the first data is accurate and whether abnormal or erroneous information exists;
the timeliness is the time interval from the generation of the first data to the time it can be viewed.
3. The data quality verification monitoring and early warning method based on the data quality system as claimed in claim 1, wherein the detection rule combination comprises: a primary key check, a code standard check and a business rule check.
4. The data quality verification monitoring and early warning method based on the data quality system as claimed in claim 1, further comprising setting an automatic execution frequency of the data quality system rule verification, wherein the automatic execution frequency comprises daily execution, weekly execution, and/or monthly execution.
5. The data quality verification monitoring and early warning method based on the data quality system according to claim 1, further comprising setting personnel management in a configuration table, and sending an abnormal database table in the visual early warning report to the personnel.
6. A data quality verification monitoring and early warning system based on a data quality system is characterized by comprising a source data standardization processing module, a data warehouse metadata management module, a data quality system rule verification module and a visual early warning report generation module, wherein,
the source data standardization processing module is coupled with the data warehouse metadata management module and is used for standardizing the source data synchronized into the data warehouse to obtain standardized source data and sending the standardized source data to the data warehouse metadata management module;
the data warehouse metadata management module is respectively coupled with the source data standardization processing module and the data quality system rule verification module and is used for performing metadata management on the standardized source data and performing service description on a target table and a target field in the massive report through metadata to obtain first data;
the data quality system rule verification module is respectively coupled to the data warehouse metadata management module and the visual early warning report generation module, and is configured to process the first data according to a data quality system rule to obtain second data, where the processing by the data quality system rule verification module includes:
acquiring the data type and/or attribute of the first data;
configuring a detection rule combination according to the data type and/or the attribute of the first data, wherein the configuration detection rule combination at least comprises one detection rule;
performing quality detection on the first data according to the rule combination to obtain second data, and sending the second data to a destination end;
and the visual early warning report generation module is coupled with the data quality system rule verification module, and is used for visualizing the second data and highlighting the abnormal database table.
7. The data quality verification monitoring and early warning system based on the data quality system as claimed in claim 6, wherein the detection rule combination is used for detecting the integrity, consistency, accuracy and timeliness of the first data, wherein,
the integrity is to check whether the records and information of the first data are complete and whether anything is missing;
the consistency is to check whether the records of the first data conform to the specification and whether they are consistent with the sequence and with other data sets;
the accuracy is to check whether the information recorded in the first data is accurate and whether abnormal or erroneous information exists;
the timeliness is the time interval from the generation of the first data to the time it can be viewed.
8. The data quality verification monitoring and early warning system based on the data quality system as claimed in claim 6, wherein the detection rule combination comprises: a primary key check, a code standard check and a business rule check.
9. The data quality verification monitoring and warning system based on the data quality system as claimed in claim 8, further comprising a data quality system rule verification automatic execution frequency setting module coupled to the data quality system rule verification module for setting an automatic execution frequency of the data quality system rule verification, wherein the automatic execution frequency includes daily execution, weekly execution, and/or monthly execution.
10. The data quality verification monitoring and early warning system based on the data quality system as claimed in claim 6, further comprising a visual early warning report sending module for setting the responsible personnel in a configuration table and sending the abnormal database tables in the visual early warning report to the personnel.
CN201911409805.0A 2019-12-31 2019-12-31 Data quality verification monitoring and early warning method and system based on data quality system Withdrawn CN111177139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911409805.0A CN111177139A (en) 2019-12-31 2019-12-31 Data quality verification monitoring and early warning method and system based on data quality system

Publications (1)

Publication Number Publication Date
CN111177139A true CN111177139A (en) 2020-05-19

Family

ID=70654213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911409805.0A Withdrawn CN111177139A (en) 2019-12-31 2019-12-31 Data quality verification monitoring and early warning method and system based on data quality system

Country Status (1)

Country Link
CN (1) CN111177139A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200519