CN111143334A - Data quality closed-loop control method - Google Patents

Data quality closed-loop control method

Info

Publication number
CN111143334A
CN111143334A
Authority
CN
China
Prior art keywords
data
data quality
quality
rule
checking
Prior art date
Legal status
Pending
Application number
CN201911104166.7A
Other languages
Chinese (zh)
Inventor
巩怀志
陈祥
范寿明
苏建雄
周晓君
李孟儒
贾西贝
Current Assignee
Shenzhen Huaao Data Technology Co Ltd
Original Assignee
Shenzhen Huaao Data Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Huaao Data Technology Co Ltd
Priority to CN201911104166.7A
Publication of CN111143334A

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F16/20: of structured data, e.g. relational data
    • G06F16/215: Improving data quality; data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/2365: Ensuring data consistency and integrity

Abstract

The invention discloses a data quality closed-loop management method, which comprises: formulating a data quality monitoring and checking scheme and monitoring and checking the data quality; building a data quality rule base; performing data quality control at regular intervals according to the rule base to obtain data quality problems; managing those problems; and evaluating the data quality. Driven by this closed loop, new governance requirements are continuously generated and quality problems are continuously resolved, so that data quality improves continuously.

Description

Data quality closed-loop control method
Technical Field
The invention relates to the technical field of data quality control, and in particular to a data quality closed-loop control method.
Background
Government departments, internet enterprises, and large group enterprises have accumulated vast data resources. China has become one of the countries generating and accumulating the most data, with the richest variety of data types, and data has become a first-order resource at both the national and municipal strategic levels. In the course of informatization, however, enterprises and governments face a common problem: data is wasted. Data quality problems such as duplication, inaccuracy, unreliability, loss, irrelevance, and slow or even interrupted updates are increasingly exposed as informatization develops, and low quality has become a core problem of government and enterprise data. These quality problems span technical, informational, process, and management issues; the data fails to meet requirements such as consistency and normalization, and no effective quality management method has been available.
Disclosure of Invention
In view of the above problems, the present invention provides a data quality closed-loop control method that establishes closed-loop control of data quality so as to continuously improve it, addressing quality problems (technical, informational, process, and management problems) that prevent data from meeting consistency, normalization, and similar requirements.
In order to achieve the above object, an embodiment of the present invention provides a data quality closed-loop control method, including:
Step 1: formulating a data quality monitoring and checking scheme, and monitoring and checking the data quality;
Step 2: formulating a data quality rule base;
Step 3: performing data quality control at regular intervals according to the data quality rule base to obtain data quality problems;
Step 4: managing the data quality problems;
Step 5: evaluating the data quality and returning to step 1.
Further, before step 1, data quality is defined; specifically, the key data items for data quality checking, the checking rules, the data quality measurement indexes, the data quality control and monitoring modes, and the data quality evaluation model are defined.
Further, the key data items for data quality checking cover null-value checking, duplicate checking, format checking, reference checking, value-range checking, consistency checking, logic checking, and relation checking;
defining a data quality checking rule comprises defining its rule name, associated table, rule type, problem level, rule weight, rule state, rule description, and creation time;
the data quality measurement indexes include the integrity, consistency, repeatability, correctness, compliance, relevance, and timeliness of the data;
a data quality control model and a data quality monitoring mode are defined, where the control model governs the data inspection object, inspection frequency, inspection time, and inspection mode, and the monitoring mode is either automatic or manual;
and a data quality evaluation model is defined to quantitatively diagnose and evaluate the data quality.
Further, the data quality monitoring and checking scheme formulated in step 1 covers data quality control in the business process and data quality control in the information system;
data quality control in the business process comprises control in the data generation link, the data integration link, and the data use link;
data quality control in the information system comprises controlling data quality problems arising from personnel, processes, the business system front end, the business system database, and the extraction and loading processes.
Further, in step 2, formulating the data quality rule base includes:
collecting data quality requirements: gathering and sorting quality problems, data use quality problems, data process quality problems, and overall data quality problems; identifying the overall data quality requirements; integrating them; and confirming the overall data quality target;
data quality checking and data combing: confirming the data quality checking object, the data range, and the index composition of the check; combing the data range to find its core objects; preliminarily deriving, from the object standard definitions and business scenarios, information such as the data quality checking indexes, checking rules, checking modes, checking periods, checking targets, scoring standards, and data quality responsible persons; forming a document; and confirming and revising the content against it;
formulating data quality checking rules according to the data quality measurement indexes;
managing the data quality checking rules, including maintaining a public rule library and reusing public rules through SQL rules, regular-expression rules, value-domain rules, and algorithm packages;
rule configuration management, in which a built-in rule engine performs quality detection according to the data detection indexes;
and managing changes to the data quality checking rules.
Further, in step 3, performing data quality control at regular intervals according to the data quality rule base includes data quality rule analysis, setting the data quality checking frequency, formulating the data quality monitoring range, and generating the data quality monitoring report, the data quality evaluation report, and the comprehensive data quality report.
Further, in step 4, data quality problem management includes data quality circulation management, data quality problem feedback, root-cause analysis of data quality problems, and data quality problem rectification and correction.
Further, the data quality evaluation in step 5 covers the core indexes of data quality evaluation, the data quality evaluation mode, and the data quality evaluation management process.
An embodiment of the invention provides a data quality closed-loop management method: formulate a data quality monitoring and checking scheme and monitor and check the data quality; build a data quality rule base; perform data quality control at regular intervals according to the rule base to obtain data quality problems; manage those problems; and evaluate the data quality. Driven by this closed loop, new governance requirements are continuously generated and quality problems are continuously resolved, so that data quality improves continuously.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 shows a schematic diagram of a data quality closed-loop control method;
FIG. 2 shows a data quality evaluation management flowchart.
Detailed Description
To make the technical solutions of the present invention better understood, they are described below clearly and completely with reference to the drawings in the embodiments.
In the description and claims of the present invention, and in some of the flows in the drawings above, operations appear in a specific order, but they may be executed out of that order or in parallel. Labels such as "first" and "second" merely distinguish different messages, devices, modules, and so on; they imply no sequential order, nor that "first" and "second" are of different types.
The technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the invention.
With reference to FIG. 1, a data quality closed-loop control method includes:
Step 1: formulating a data quality monitoring and checking scheme, and monitoring and checking the data quality. This covers data quality control in the business process and data quality control in the information system;
data quality control in the business process comprises control in the data generation link, the data integration link, and the data use link;
data quality control in the information system comprises controlling data quality problems arising from personnel, processes, the business system front end, the business system database, and the extraction and loading processes.
Step 2: formulating a data quality rule base, which comprises:
collecting data quality requirements: gathering and sorting quality problems, data use quality problems, data process quality problems, and overall data quality problems; identifying the overall data quality requirements; integrating them; and confirming the overall data quality target;
data quality checking and data combing: confirming the data quality checking object, the data range, and the index composition of the check; combing the data range to find its core objects; preliminarily deriving, from the object standard definitions and business scenarios, information such as the checking indexes, checking rules, checking modes, checking periods, checking targets, scoring standards, and data quality responsible persons; forming a document; and confirming and revising the content against it;
formulating data quality checking rules according to the data quality measurement indexes;
managing the data quality checking rules, including maintaining a public rule library and reusing public rules through SQL rules, regular-expression rules, value-domain rules, and algorithm packages;
rule configuration management, in which a built-in rule engine performs quality detection according to the data detection indexes;
and managing changes to the data quality checking rules.
Step 3: performing data quality control at regular intervals according to the data quality rule base to obtain data quality problems. This comprises data quality rule analysis, setting the checking frequency, formulating the monitoring range, and generating the data quality monitoring report, the data quality evaluation report, and the comprehensive data quality report.
Step 4: managing data quality problems, comprising data quality circulation management, problem feedback, root-cause analysis, and problem rectification and correction.
Step 5: evaluating the data quality and returning to step 1. The evaluation covers the core indexes of data quality evaluation, the evaluation mode, and the evaluation management process.
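The five steps above can be sketched as a loop in Python. The patent discloses no code, so the step bodies here are illustrative placeholders: a boolean "passed" flag stands in for a real rule check, and correcting a problem simply flips that flag.

```python
# Minimal sketch of the five-step closed loop. The step bodies are
# illustrative placeholders, not part of the patent's disclosure.

def run_quality_loop(rule_base, rounds=3):
    """Run steps 1-5 repeatedly; step 5 feeds the result back to step 1."""
    history = []
    for _ in range(rounds):
        rules = list(rule_base)                       # steps 1-2: scheme + rule base
        problems = [r for r in rules if not r["passed"]]           # step 3
        score = 100.0 * (len(rules) - len(problems)) / len(rules)  # step 5 metric
        for p in problems:                            # step 4: manage/correct
            p["passed"] = True
        history.append(score)                         # step 5: evaluate, loop back
    return history

rules = [{"name": "null-check", "passed": False},
         {"name": "format-check", "passed": True}]
print(run_quality_loop(rules))  # [50.0, 100.0, 100.0]
```

The rising score per round mirrors the claim that the closed loop continuously improves quality: each pass detects problems, corrects them, and re-evaluates.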
Before step 1, data quality is defined; specifically, the key data items for data quality checking, the checking rules, the data quality measurement indexes, the data quality control and monitoring modes, and the data quality evaluation model are defined.
The key data items for data quality checking cover null-value checking, duplicate checking, format checking, reference checking, value-range checking, consistency checking, logic checking, and relation checking;
and (4) checking a null value: refers to checking whether a column of data has an empty data item.
And (4) repeated checking: refers to checking whether two identical values exist for the same entity attribute.
And (3) format verification: which means checking whether the data format meets the standard.
And (5) reference checking: refers to checking whether a certain data value is present in the data.
Checking a value range: which refers to checking whether the data value meets the value range specified by the standard.
And (3) consistency checking: it means to check whether the values of the same entity in the two tables are consistent.
Logic verification: which refers to checking whether the data value meets the business logic requirements and common sense logic requirements.
And (3) relation checking: and checking whether the main foreign key association relationship of the data exists.
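As an illustrative sketch (the patent prescribes no implementation), three of these checks might look like the following in Python; the sample column and regex are assumptions:

```python
import re

def null_check(column):
    """Null-value check: return indexes of empty items in a column."""
    return [i for i, v in enumerate(column) if v is None or v == ""]

def duplicate_check(column):
    """Duplicate check: return values that occur more than once."""
    seen, dups = set(), set()
    for v in column:
        (dups if v in seen else seen).add(v)
    return sorted(dups)

def format_check(column, pattern):
    """Format check: return items that do not match the standard's regex."""
    rx = re.compile(pattern)
    return [v for v in column if v is not None and not rx.fullmatch(v)]

ids = ["A001", "A002", "A002", "", "B-03"]
print(null_check(ids))                  # [3]
print(duplicate_check(ids))             # ['A002']
print(format_check(ids, r"[A-Z]\d{3}")) # ['', 'B-03']
```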
Data quality checking rules include, but are not limited to, SQL rules, functional dependencies, dictionary rules, regular-expression rules, value-domain rules, inclusion dependencies, algorithm packages, meta-rules, and structure rules.
Defining a data quality checking rule includes defining its rule name, associated table, rule type, problem level, rule weight, rule state, rule description, and creation time.
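The rule fields listed above can be sketched as a record; the field names mirror the text, while the types and defaults are assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class QualityRule:
    """One data quality checking rule; field names follow the text above."""
    rule_name: str
    association_table: str
    rule_type: str            # assumed values, e.g. "SQL", "regex", "value-domain"
    problem_level: int        # problem grade; 1 = most severe (assumption)
    rule_weight: float
    rule_state: str = "enabled"
    rule_description: str = ""
    creation_time: datetime = field(default_factory=datetime.now)

rule = QualityRule("id-not-null", "citizen_info", "SQL", 1, 0.3,
                   rule_description="ID column must not contain nulls")
print(rule.rule_name, rule.rule_state)  # id-not-null enabled
```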
The data quality measurement indexes include the integrity, consistency, repeatability, correctness, standardization, relevance, and timeliness of the data.
Related quality detection rules are made according to the meaning of each index. For example, integrity detection covers four aspects: entity missing, attribute missing, record missing, and field-value missing; data that does not conform to the rule is considered to have an integrity problem and is classified as data with data quality problems.
Integrity: whether referential integrity exists and is consistent among data in the data warehouse; integrity detection covers entity missing, attribute missing, record missing, and field-value missing.
Repeatability: measures which data are duplicates or which attributes of the data are duplicated.
Consistency: the (semantic) correctness of table data; the purpose is to detect inconsistencies or conflicts in the data.
Standardization: whether the data is stored in a uniform format.
Correctness: whether the data is correctly reflected in a verifiable data source.
Relevance: which associated data is missing or not indexed.
Timeliness: whether the data is valid at the required time.
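Two of these indexes can be sketched as simple ratios; the formulas and field names are illustrative assumptions rather than the patent's definitions:

```python
def completeness(records, required_fields):
    """Integrity/completeness: fraction of required field values present."""
    total = len(records) * len(required_fields)
    missing = sum(1 for rec in records for f in required_fields
                  if rec.get(f) in (None, ""))
    return (total - missing) / total if total else 1.0

def timeliness(records, ts_field, max_age_days, now_day):
    """Timeliness: fraction of records no older than the allowed age.
    Days are plain integers here to keep the sketch self-contained."""
    ok = sum(1 for rec in records if now_day - rec[ts_field] <= max_age_days)
    return ok / len(records) if records else 1.0

rows = [{"name": "Li", "id": "A1", "day": 10},
        {"name": "",   "id": "A2", "day": 3}]
print(completeness(rows, ["name", "id"]))      # 0.75
print(timeliness(rows, "day", 5, now_day=12))  # 0.5
```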
A data quality control model and a data quality monitoring mode are defined. The control model governs the data inspection object, inspection frequency, inspection time, and inspection mode; the monitoring mode is either automatic or manual.
The data quality control and monitoring modes are defined on the basis of the data quality definition model: monitoring is completed automatically or manually according to the defined checking range and time. Whatever violates the definition of data quality during quality control is treated as a data quality problem, reflected directly in the key characteristics and indexes of data quality. The control content of the data quality control model covers the data inspection object, inspection frequency, inspection time, inspection mode, and so on.
(1) Data inspection object: the users, professional data tables, and database entities to be inspected, set according to the acquisition plan.
(2) Data inspection frequency: the execution frequency of the inspection stored procedure, set according to the acquisition plan of the data table and its actual update frequency.
(3) Data inspection time: the moment at which inspection starts, set by weighing the peak times of daily production and application against the times between data generation and its collection and storage.
(4) Data inspection mode: how the inspection process is executed. It can be controlled automatically by a background process, running once every 2-hour inspection interval; or it can be triggered manually at any time (preferably when database traffic is low).
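The automatic inspection mode (one run per 2-hour interval) can be sketched as a plain loop. The scheduler below is a toy with a shortened sleep so it terminates quickly; a real deployment would use a background process or job scheduler, and `check_once` is an assumed callback:

```python
import time

INSPECTION_INTERVAL_HOURS = 2  # interval stated in the text

def run_automatic_inspection(check_once, intervals, interval_seconds=None):
    """Execute `check_once` at each interval and collect its results."""
    if interval_seconds is None:
        interval_seconds = INSPECTION_INTERVAL_HOURS * 3600
    results = []
    for _ in range(intervals):
        results.append(check_once())
        # Sleep is capped so this demo finishes quickly; a real background
        # process would sleep the full interval.
        time.sleep(min(interval_seconds, 0.01))
    return results

print(run_automatic_inspection(lambda: "ok", intervals=3))  # ['ok', 'ok', 'ok']
```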
A data quality evaluation model is defined to quantitatively diagnose and evaluate data quality. The evaluation model is driven by the data quality control model according to the data quality definition model: key data quality indexes are evaluated against the quality inspection result table fed back, thereby realizing quantitative diagnosis and evaluation.
The functional core of the data quality analysis and evaluation model is as follows. Using the acquisition plan in the basic model and the constraint rules in the quality definition model, the control model invokes a background stored procedure that performs the inspection and analysis in the entity library, producing a query result. An analysis program then analyzes, calculates, classifies, and summarizes this result into figures reflecting the completion of the acquisition plan and quantitative data quality indexes, and stores them in an analysis result table. The foreground calls this table to generate a detailed data quality analysis and evaluation report reflecting the quantitative indexes of the data quality problems, including the warehousing timeliness, reporting integrity, acquisition consistency, and warehousing accuracy of the evaluated entity library.
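A hedged sketch of the quantitative evaluation step: rolling the quality-inspection result table up into per-index scores. The row schema and the weighted-average formula are assumptions, not the patent's model:

```python
def evaluate(result_rows):
    """Roll a quality-inspection result table up into per-index scores.

    Each row: {"index", "weight", "passed", "total"}. The weighted average
    below is an illustrative assumption, not the patent's formula.
    """
    scores, weights = {}, {}
    for row in result_rows:
        pct = 100.0 * row["passed"] / row["total"]
        idx = row["index"]
        scores[idx] = scores.get(idx, 0.0) + row["weight"] * pct
        weights[idx] = weights.get(idx, 0.0) + row["weight"]
    return {idx: round(scores[idx] / weights[idx], 1) for idx in scores}

table = [
    {"index": "timeliness", "weight": 1.0, "passed": 90, "total": 100},
    {"index": "integrity",  "weight": 2.0, "passed": 80, "total": 100},
    {"index": "integrity",  "weight": 1.0, "passed": 95, "total": 100},
]
print(evaluate(table))  # {'timeliness': 90.0, 'integrity': 85.0}
```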
In step 1, data quality process monitoring refers to the data quality verification or control measures set, according to the data quality rules, within the business process and the business system, so as to safeguard the quality of key data items as they are created, processed, circulated, and stored.
Data quality process monitoring consists mainly of two parts: data quality control in the business process and data quality control in the information system.
Data quality control in the business process:
The data flow is generally divided into three stages:
Stage 1: the data generation link. Most raw data inside government is generated by the business source systems; a very small amount of value-added data is generated by analytic systems.
Stage 2: the data integration link. In the basic data platform systems (ODS/data warehouse and data marts), data from different source systems is integrated according to a data model.
Stage 3: the data use link. This consists of various analytical applications and also includes information access means such as ad hoc business queries, data analysis, and data mining.
Data quality problems arise mainly in the data generation link; in the data integration link they arise essentially during data processing, since data is not modified in the data use link.
The discovery of data quality problems has roughly the opposite character. Although the business source system is the main link where data is generated, it can only discover quality problems related to its own business process, and only within that system. The data integration link is the main gathering point of government-internal data, so it is also the link where the most data quality problems are exposed. The data use link is another link where problems are frequently exposed: because the use of the data determines how quality problems are defined, many quality problems are discovered for the first time in use.
Based on the key links and problem characteristics of data quality management, and on industry best practice in implementing it, data quality control in the business process should emphasize the following functional points in the different conversion links:
Data generation link:
Correction: data quality issues must be corrected at the source; this is a fundamental principle of data quality.
Prevention: prevention matters even more than correction, since it stops new data quality problems from arising.
Definition: the definition of a data quality problem depends chiefly on the purpose of use, so definitions should mainly be initiated from the data use link, although they are usually expressed in terms of the source system's data structures.
Data integration link:
Checking: the basic data platform system is the main gathering point of all data, so checking data quality there is most effective.
Reporting: data quality inspection results should be presented as reports and delivered, through some mechanism (workflow or manual process), to the parties responsible for the data quality problems, such as the business source system project group, the business department, or the data warehouse or application project group.
Tracking: because data from the business source systems is loaded into the basic data platform system every day, that system should be used to track the resolution of data quality problems, as evidence of the effect of their treatment.
Data use link:
Definition: as described above, the data use link defines the quality standards the data should satisfy according to its usage targets; these standards become input to future service level agreements (SLAs) between upstream and downstream systems.
Evaluation: as the end users of the data, this link should evaluate the results of data quality control during use, as one basis for setting the next stage's data quality management targets.
Data quality control in information systems:
the most common root causes of data quality problems are: personnel, processes, business system front ends, business system databases, extraction and loading processes, which all may cause data quality problems, the first three items (personnel, processes, business system front ends) are focused on prevention, and the last three items (business system databases, extraction and loading processes) are usually solved by means of repair.
There are both beneficial and disadvantageous aspects to the prevention/repair of each type of data quality problem, such as: due to quality problems caused by personnel, the method has the advantages that prevention can be carried out at the source, and the method has the disadvantages that personnel are often careless to manage, are easy to forget, and have different differences and concentration points among different personnel, which inevitably cause certain data quality problems.
Concerning the amount of data: data quantity required to be repaired for data quality problems is roughly regular, for example, the data quantity required to be repaired for quality problems generated by personnel, processes and front-end applications is not large, and the data quality problems caused by background links such as database extraction and loading are large in data quantity. For the data quality problem which occurs already, the problem can be solved only by a repair measure, but in the long run, the prevention measure is emphasized, and the generation of control errors at the source is more important.
And (3) trend monitoring: a known data quality problem being fixed does not mean that this particular problem is always solved. Errors may still be reproduced without effective precautions. Therefore, important data quality issues should be continuously monitored.
In step 2, the data quality rule base: data quality checking rules are formulated according to the information resource catalog, the data element standards, and the definitions of data quality problems; the corresponding data quality checking scripts are developed; and data quality checks are performed in the business systems, the data center, and the data applications.
Collecting data quality requirements: the data quality requirements of all provincial departments are collected. By gathering and sorting quality problems, data use quality problems, data process quality problems, overall data quality problems, and the like, the overall data quality requirements are identified; the data center integrates these requirements and confirms the overall data quality target.
Data quality checking and data combing: the data quality checking object, the checking data range, and the index composition are confirmed; the checking data range is combed to find its core objects; from the object standard definitions and business scenarios, information such as the checking indexes, checking rules, checking modes, checking periods, checking targets, scoring standards, and data quality responsible persons is preliminarily derived and formed into a document; and the stakeholders concerned with data quality are invited to confirm and revise the content against that document.
Data quality checking rule formulation: under the guidance of the provincial information administration department and the provincial business departments, corresponding data quality checking rules are formulated from the data element standards in combination with the government information resource catalog. The checking rules mainly cover integrity, normalization, consistency, accuracy, uniqueness, and timeliness.
The data quality checking rules are the basis for data quality checks; every department with a stake in data quality must participate in formulating them, and the rules are formulated and executed jointly.
A data quality checking rule includes, but is not limited to: data source, data object, column, checking index, checking rule, checking mode, checking period, checking target, scoring standard, responsible department, responsible person, and remarks.
The data quality checking indexes comprise completeness, consistency, uniqueness, validity, accuracy, authenticity, and timeliness.
Checking rules: each department has, on average, no fewer than 65 checking rules.
And (3) data quality checking rule management:
the public rule base is used for solving the problem of rule multiplexing, and in a large amount of data, a great number of repeated entity attribute fields exist, and the fields do not need to be repeatedly written. The common rule multiplexing is realized by the rules mainly through SQL rules, regular rules, value domain rules and algorithm packets (standard packets).
Rule configuration management: the data quality detection indexes include normalization, completeness, repeatability, consistency, correctness, relevance, timeliness and similar indexes. Data quality rules are configured on the basis of these criteria, and various built-in rule engines help achieve the relevant quality detection, as shown in Table 1.
TABLE 1 (provided as an image in the original patent document)
Data quality checking rule change (rule change management): the data quality checking rules are under the centralized management of the data management department, and each business department is responsible for amending and editing the data quality and evaluation rules of the information systems (or data domains) under its charge. The quality rules are strictly executed from their effective date and in principle are not changed. Rule adjustments apply in the following three situations:
1. Overall requirements: the data quality project advances and the existing rules no longer meet the work requirements;
2. Business adjustment: the rules become inapplicable owing to changes in the business management system, processes or rules;
3. System adjustment: the rules become inapplicable owing to upgrading or reconstruction of the business system.
The data quality check rule change process is as follows:
1. change application
When any of the above three situations arises, the department concerned should submit a change application to the data quality responsible person 5 days before the start of the month.
2. Change business audit
The data quality responsible person of the management department reviews the change requirements, confirms the business rule adjustment content, and submits it to the data center.
3. Change technology auditing
The data quality responsible person of the data center reviews the business rule adjustment content and defines the technical rules.
4. Change distribution
The data center and the person in charge of the business department jointly review the business rules and technical rules, and publish the changed content 10 days before the start of each month.
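The four-step change process above (application, business audit, technical audit, publication) can be sketched as a simple state sequence; the state names and the ChangeRequest structure are illustrative assumptions, not from the patent:

```python
# Hypothetical linear workflow for a rule-change request, mirroring the
# four steps described above.
WORKFLOW = ["applied", "business_audited", "technical_audited", "published"]

class ChangeRequest:
    def __init__(self, rule_id: str, reason: str):
        self.rule_id = rule_id
        self.reason = reason          # overall / business / system adjustment
        self.state = "applied"        # step 1: change application

    def advance(self) -> str:
        """Move the request to the next audit/publication step; a published
        request stays published."""
        idx = WORKFLOW.index(self.state)
        if idx + 1 < len(WORKFLOW):
            self.state = WORKFLOW[idx + 1]
        return self.state
```

Each `advance` call corresponds to one audit step; a real system would attach the 5-day application and 10-day publication deadlines to these transitions.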
In step 3, the data quality rules are analyzed as follows:
1) daily data verification
Daily data quality checks must use at least one of the following three methods: the record count checking method, the key indicator total verification method, or the value range judgment method.
2) Timing data spot check
The periodic spot check must employ all the methods defined in the data quality assessment method.
3) Comprehensive data inspection
The full-scale inspection must employ all the methods defined in the data quality assessment method.
Data quality check frequency:
1) daily data verification
There are many ETL loading tasks each day; if executing full data verification every time would take too long, the verification frequency is determined according to the credibility level of each subject's data.
The corresponding relationship between the credibility grade and the verification frequency is as follows:
Level 1: data verification must be performed on every load
Level 2: data verification is performed every third load
Level 3: data verification is performed every sixth load
For subject data requiring special guarantees, the checking frequency can be adjusted and an experience auditing step added.
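The credibility-level-to-frequency mapping above can be sketched as follows; the function name and the 1-based load counter are illustrative assumptions:

```python
# Level 1 is checked on every load, level 2 every third load,
# level 3 every sixth load, as described above.
CHECK_INTERVAL = {1: 1, 2: 3, 3: 6}

def needs_check(credibility_level: int, load_count: int) -> bool:
    """Return True when the given load (counted from 1) must run
    data verification for a subject at this credibility level."""
    return load_count % CHECK_INTERVAL[credibility_level] == 0
```

A scheduler could call this before each ETL load to decide whether to attach the verification task.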
2) Timing data spot check
The data quality management team must organize a periodic spot check of data once a quarter.
3) Comprehensive data inspection
The data quality management team must organize a full examination of the data warehouse once a year.
Establishing the data quality monitoring range:
1) daily data verification
The data quality management personnel check the execution condition of the loaded data every day.
2) Timing data spot check
The scope of the periodic spot check must include all subject data at credibility level 1, the data of two subjects at credibility level 2, and the data of one subject at credibility level 3.
3) Comprehensive data inspection
The scope of the full inspection includes data of all subjects of the data center platform.
Data quality monitoring report:
(1) The data quality inspection report displays the data quality problems found by the data quality inspection tasks, including the detailed problem data corresponding to each rule.
(2) Data quality management personnel should promptly check and verify the data errors found in the data quality monitoring report, fill in a data problem processing list according to the verification results, and describe the current situation, cause, correction and preventive measures for each data quality problem.
(3) After approval by the leader of the data quality management group and by the administrative department of the data center, the data correction task is executed.
The data quality evaluation report discovers the root causes of data quality problems and analyzes their influence, using the data quality scoring model and the relation between data quality and metadata.
The comprehensive report summarizes each department's monthly data into a related report showing the department's monthly resource situation, quality situation, scoring, ranking and the like.
In step 4, data quality problem management: data quality circulation management is based on the realization of all the above services; a complete resource library is established, the data quality guarantee mechanism and system are continuously improved, and the correctness and authority of the resource library data are continuously enhanced through a sustained, forward data quality circulation system.
The architecture flow of the data quality cycle management service system is as follows: dirty data are converted into data meeting the data quality requirements using techniques such as mathematical statistics, data mining or predefined cleaning rules. After the first round of combing and cleaning, problem data are either repaired automatically by the system or returned to the business unit for manual repair, then sent back for a second round of cleaning. Through continuous cycles of inspection, cleaning and repair, the data quality is steadily improved until the highest-quality data are obtained.
The circulation system is realized through two loops: a problem data work order circulation and an application data feedback circulation.
Problem data work order circulation: based on a problem-data work order flow system for improving data quality, problem data are separated from the various data in a centralized manner and formed into work orders according to their data sources; the work orders are returned to each business agency for repair, and the repaired data re-enter the city basic information resource base through the front-end processor of the data exchange platform for a second circulation, and so on, progressively and continuously improving data quality.
Application data feedback circulation: data mining and analysis are performed on the basic population resource data. Data entering each application system through the various service interfaces form new data, and data values change during the use of each application service. The new and changed data pass again through the business systems of each commission or bureau and the front-end processor of the data exchange platform into the resource library, where they are combed and cleaned again. New problem data that the system cannot repair automatically also form problem work orders; the Wuhan city information center issues the task work orders, each commission or bureau processes and repairs the data, and the results are returned to the Wuhan city information center for auditing and converged into the population resource library, so that high-quality data serve the applications again. By analogy, circulating the application work orders continuously improves data quality and ensures the perfection of the data and application services.
The two circulation systems support and supplement each other, operate across each other and complement each other flexibly, forming a complete data quality circulation service and quality guarantee system for the population base resources.
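The inspect-clean-repair cycle described above can be sketched as a small loop; `is_valid` and `auto_repair` stand in for the real checking rules and repair logic, and all names here are illustrative assumptions:

```python
# Minimal sketch of the quality circulation: valid records pass, records
# the auto-repair can fix are re-checked next round, and the rest become
# work orders routed back to the business units for manual repair.
def quality_cycle(records, is_valid, auto_repair, max_rounds=3):
    """Return (clean_records, unresolved_work_orders) after up to
    max_rounds of inspection, cleaning and repair."""
    work_orders, clean, pending = [], [], list(records)
    for _ in range(max_rounds):
        next_pending = []
        for rec in pending:
            if is_valid(rec):
                clean.append(rec)
            else:
                repaired = auto_repair(rec)
                if repaired is not None:
                    next_pending.append(repaired)   # re-check next round
                else:
                    work_orders.append(rec)         # manual repair needed
        pending = next_pending
        if not pending:
            break
    work_orders.extend(pending)
    return clean, work_orders
```

In the patent's terms, `work_orders` corresponds to the problem data work orders returned to each agency, and the loop body to one round of combing and cleaning.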
Data quality problem feedback: the data management department detects the various information data and reports problem data; the relevant data responsibility departments report their modifications of the problem data to the data management department for specific guidance, or consult it on modification strategies; the issued data quality monitoring reports are tracked and managed, and the implementation by each data responsibility department is supervised; each department rectifies the problem data according to the opinions given and feeds back its implementation level by level; finally, the data management center sorts and evaluates the summarized implementation feedback of all departments.
Cause analysis of data quality problems: the data in the data quality problem list are analyzed, including the resolution deadline and the causes of each data quality problem, to help business personnel correct the problem data.
According to the source and specific cause of each problem, data quality problems can be divided into four problem domains: information, technology, process and management. An information-class problem is a data quality problem arising from deviations in the description, understanding and measurement of the data themselves. The main causes of this part of the data quality problems are: metadata description and understanding errors, unguaranteed data measurement properties, inappropriate change frequency, and the like.
The metadata involved in description and understanding errors mainly include:
Business metadata — mainly including business descriptions, business rules, business terms, business indicator calibers, etc.
Technical metadata — mainly including interface specifications, execution order, dependency relationships, ETL conversion, data modeling, tools, etc.
Data metrics and frequency of change provide a means to measure the quality of data. Data metrics mainly include integrity, repeatability, consistency, correctness and compliance. The change frequency mainly comprises a change period of the service system data and a refresh period of the entity data.
Technical problem domain: a technical problem is a data quality problem caused by an abnormality in a technical link of the data processing, and its direct cause is a defect in the technical implementation. The links that generate data quality problems mainly include data creation, data acquisition, data transmission, data loading, data use and data maintenance:
1. Data creation quality problems mainly include business system call tickets entering the database late, improper default values for created data, and improper validation rules for data entry, leading to inconsistent indicator statistics, invalid data, duplicate records, and the like.
2. Data acquisition quality problems mainly include incorrect acquisition points, incorrect data acquisition times, and distortion of interface data during acquisition — for example, an incorrect or insufficiently precise transcoding process, resulting in inconsistent indicator statistics, invalid data, and the like.
3. Data transmission quality problems mainly include a low timeliness rate of interface data, missed transmission of interface data, and an unreliable network transmission process, such as packet loss, a wrong file transmission mode, transmission technology problems, and incomplete data caused by improper protocol use.
4. The data loading quality problem mainly comprises data cleaning algorithm, data conversion algorithm and data loading algorithm errors.
5. The data use quality problems mainly comprise wrong use of the display tool, unreasonable display mode and unreasonable display period.
6. Data maintenance quality problems mainly include data backup/recovery errors, limited data storage capacity, lack of verification mechanisms in the maintenance process, and artificial background adjustment of data.
Process problem domain: a process problem is a data quality problem caused by improper setting of the system operation flows and manual operation flows; it mainly arises in the creation, transmission, loading, use, maintenance and audit flows of the operation and analysis system data:
1. Creation flow quality problems mainly mean that operators lack an auditing process when recording data;
2. Transmission flow quality problems mainly refer to unsmooth communication flows;
3. Loading flow quality problems mainly refer to a missing or improper cleaning flow, scheduling flow logic errors, data loading flow logic errors and data conversion flow logic errors;
4. Use flow quality problems mainly mean that the data use flow lacks process management;
5. Maintenance flow quality problems mainly refer to a missing change maintenance flow, a wrong data maintenance flow, a missing data testing flow and insufficiently strict process monitoring of manual background data adjustment;
6. Audit flow quality problems mainly refer to the lack of a data error feedback flow.
Management problem domain: a management problem is a data quality problem caused by personnel quality and the management mechanism, such as management failures caused by improper measures in personnel management, training, rewards and the like.
The quality problems generated by personnel management mainly refer to:
(1) No special mechanism for managing data quality has been established, and no designated person takes responsibility after a data quality problem occurs;
(2) there is no explicit data quality target;
(3) data quality issues are not given sufficient priority;
(4) the organization lacks management methods for data quality, etc.
Quality problems resulting from personnel training mainly mean the lack of long-term training programs for the personnel associated with data quality.
Data quality problem correction: workflow technology is used for work order circulation and problem handling of data quality problems, including tracking the data quality problems and tracking and managing the correction process;
Data quality problem statistical analysis: statistical analysis of the data quality problem correction process lets users comprehensively understand the full picture of the work orders they have handled; the statistical analysis is displayed using visualization technology.
Data quality problem rectification: rectification can proceed through two schemes — a data quality "slow" cycle and a data quality "fast" cycle — which together form the closed-loop management of data quality and continuously improve it. The two schemes are as follows:
In the data quality "slow" cycle, once a data conflict is detected, detailed problem data and a problem summary report are automatically generated to form a data conflict work order and trigger the data circulation mechanism. The work order is distributed to the data providing unit, the data using unit and the authoritative responsibility units of the several source data; these units confirm or correct the disputed information through online work order responses. If the confirmed or corrected data still conflict with the existing data of the authoritative repository, a second data conflict work order is generated for the newly conflicting data and distributed; these steps are repeated until the conflict is resolved, and finally the conflict-handling process and results are filed and documented. Data quality rectification within the "slow" cycle can be done in a number of ways, such as business process optimization, source system transformation, the data management mechanism and data quality control.
The data quality "fast" cycle is mainly realized by establishing a data cleaning and supplementary recording scheme through technical means, automatically repairing, completing, converting and merging the problem data.
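A minimal sketch of the "fast" cycle's automatic cleaning steps on a single record follows; the specific repair, completion and conversion rules here (whitespace stripping, a default `status`, ISO date normalization) are illustrative assumptions, not the patent's actual rules:

```python
# Hypothetical one-record "fast" cleaning chain: repair, complement, convert.
def fast_clean(record: dict) -> dict:
    rec = dict(record)
    # repair: strip stray whitespace in string fields
    rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
    # complement: fill a missing field with a default value
    rec.setdefault("status", "unknown")
    # convert: normalize a date separator toward ISO style
    if isinstance(rec.get("date"), str):
        rec["date"] = rec["date"].replace("/", "-")
    return rec
```

Records that such automatic steps cannot fix would fall back to the "slow" cycle's work orders.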
Data quality evaluation: rectification result evaluation means evaluating the rectified data quality according to the rectification scope of the data quality problems, so as to assess the rectification effect. To grasp the rectification effect, a corresponding evaluation scheme must be formulated in combination with the target and scope of the data quality rectification, and an evaluation report generated.
If the data quality evaluation result after the correction cannot meet the service requirement, further analyzing the problem root cause, and making and implementing a new data quality correction scheme.
Data quality evaluation is introduced from three aspects: the core indexes of data quality evaluation, the data quality evaluation mode, and the data quality evaluation management process.
Core indexes of data quality evaluation:
data quality scoring
Index definition: data quality score = (total problem data / total stored data) × 100. Indicator unit: points.
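The formula as printed in the patent is terse, so the sketch below assumes the literal reading score = (problem records / stored records) × 100, where a lower score means fewer problems; the function name and zero-total handling are assumptions:

```python
# Hedged sketch of the scoring index: ratio of problem records to stored
# records, scaled to points.
def quality_score(problem_count: int, total_count: int) -> float:
    """Return the data quality score in points; lower means fewer problems."""
    if total_count == 0:
        return 0.0
    return problem_count * 100.0 / total_count
```

If the intended reading is instead a "goodness" score, the same ratio would simply be subtracted from 100.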
Data quality evaluation mode:
According to the data quality evaluation indexes, the subject data in each data warehouse and their historical behavior are divided into three levels, as shown in Table 2:
TABLE 2 (provided as an image in the original patent document)
By evaluating and grading the frequency of data quality problems, attention can be freed from the mass of data and the limited resources concentrated on the subject data that need focus. The credibility grading of data quality is an effective way to improve data quality.
Data quality evaluation management process:
the data quality assessment process is a series of steps that apply a data quality assessment tool to the target data or data set and ultimately obtain the quality status of the assessment target.
The general process of scientific data quality evaluation, as shown in figure 2, includes: data quality requirement analysis; determination of the evaluation objects and scope; selection of data quality dimensions and evaluation indexes; determination of the quality measures and their evaluation methods; evaluation using the methods; analysis and evaluation of the results; and the quality results and reports. The data quality evaluation process is iterative: the order of the steps only expresses the approximate sequence of the activity stages, and some steps may need to be repeated depending on the actual execution.
Data quality requirement analysis: data requirements are the needs for data that people perceive in the course of practical problem-solving activities. Data resources differ from physical products in being personalized in use, diversified and unstable, so a targeted evaluation index system can only be established by understanding users' requirements for the specific data resources.
Determining the evaluation object and scope: the evaluation object and its scope are determined; the evaluation object can be a data item or a data set;
Selecting the data quality dimensions and evaluation indexes: a data quality dimension is a specific quality aspect of an object in quality activities, such as correctness or accuracy, and is the main content for controlling and evaluating data quality. First, determine which factors influence the quality dimensions, such as personnel quality, equipment and facilities, and explain these quality-influencing factors separately in the evaluation report if necessary. For factors that influence multiple quality dimensions, refine them further as needed under the specific circumstances, or determine the influencing factors in the quality behavior for the further-refined target links. In addition, select measurable and available quality dimensions as the criterion items of the evaluation indexes; in different data types and different data production stages, the same quality dimension has different specific meanings and contents, so the quality dimensions are determined according to the actual needs and life-cycle stage.
At this stage, care is taken to avoid conflicts among indexes, and to consider the hierarchy and weight of newly added evaluation indexes and their conflicts with other indexes at the same level. The selection of third-level evaluation indexes can be quantified according to the type of evaluation object and the evaluation requirements, and a metric evaluation method can be applied if necessary. Correlated surrogate indicators can be used appropriately for quality dimensions that cannot be quantified under current technical conditions.
Determining the quality measures and their evaluation methods: after the object scope of the data quality evaluation is determined, the measures and their implementation methods are determined according to the characteristics of each evaluation object. Different evaluation objects generally have different measures, which require different implementation methods to support, so the measures and their implementation methods are determined by the characteristics of the quality objects. The former is carried out by methods such as weighted scoring; the latter according to the quality specifications of each stage of information production and the first-level defect criterion.
Evaluating using the methods: the quality evaluation activity is carried out according to the quality objects, scope, measures and implementation methods determined in the previous four steps. The quality of an evaluation object should be reflected by the evaluation of multiple quality dimensions and third-level evaluation indexes: a single data quality measure cannot sufficiently and objectively evaluate the quality of information defined by a certain data quality scope, nor provide a comprehensive reference for all possible applications of the data set. Combining multiple quality dimensions and third-level evaluation indexes provides richer information, yielding a comprehensive measurement of the information limited by a certain data quality scope.
Throughout the data quality evaluation process, the correctness and objectivity of the adopted methods are ensured, interference factors are avoided to the greatest extent, automatic processing by computer and network technology is used as much as possible, and a comprehensive, objective reflection of the real data quality situation is pursued. In particular, for quantifiable quality dimensions, scientific quantitative measurement indexes and methods are determined, and the quality measurement should ensure the correctness and completeness of the data boundary scope, system parameters and so on involved.
Analyzing and evaluating the results: after evaluation, the evaluation results are analyzed:
the evaluation targets and results are compared and analyzed to determine whether the evaluation indexes have been reached;
the effectiveness of the evaluation scheme is analyzed to confirm whether it is appropriate, and so on. Thereafter, the quality evaluation of the object is determined from the evaluation results, and a quality level can be assigned from them if necessary. The quality level of the evaluation object is determined on the basis of a corresponding quality grading scheme, which is set according to the corresponding quality specifications or user requirements; the grading scheme is also an important basis for judging the maturity of data quality.
Quality results and reports: the quality evaluation results and the evaluation report are a collection of all scientific data quality evaluation items and their evaluation results.
All of the above should be included in the complete data quality evaluation results and report. In addition, the operations of the evaluation process performed as above should be completely recorded in the data quality evaluation report, including the determination of the existing quality level.
The specific methods for evaluating data quality are as follows:
For the quality inspection of specific data, the record count checking method, the key indicator total verification method, the historical data comparison method, the value range judgment method, the experience auditing method and the matching judgment method are adopted. With these methods, the data accuracy at a single data point can be checked and data quality problems found in time.
(1) Record count checking method
The data condition is generally verified by comparing the number of records. It is mainly checked whether the number of records of the data table is a certain value or within a certain range.
The application range is as follows:
for data which is loaded in a data table in increments according to dates, when the number of records which are incremented in each loading period is a constant value or a range which can be determined, the number of records must be checked.
(2) Key indicator total verification method
For key indicators, compare whether the data totals are consistent — mainly checking summary logic with the same business meaning and statistics from different dimensions.
The application range is as follows:
When the same field in the same table is counted from different dimensions and a summary relationship exists, a total verification is carried out.
When a field of the table has the same business meaning as a field in another table, the statistics from different dimensions are in a summary relationship, and the data of the two tables are not derived from the same data source, a total verification must be performed.
(3) Historical data comparison method
Data quality is verified by observing the pattern of data change through historical data. The judgment is usually made on comparable growth rates: during evaluation, data whose growth (or decline) rate is unusually large are examined with emphasis, according to the development characteristics of each indicator. The historical data comparison method includes year-on-year and month-on-month comparison.
The application range is as follows:
When neither the record count checking method nor the key indicator total verification method can be applied, and the fact table has fewer than 10 million records, the historical data comparison method is required.
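A sketch of the historical comparison: flag periods whose period-over-period growth rate leaves a plausible band. The 50% threshold and function names are illustrative assumptions, not values from the patent:

```python
# Flag series positions whose growth vs. the previous period is implausibly
# large in magnitude (month-on-month style comparison).
def flag_abnormal_growth(series, max_abs_growth=0.5):
    """Return the 1-based-in-series indices whose growth rate relative to
    the previous period exceeds max_abs_growth in absolute value."""
    flagged = []
    for i in range(1, len(series)):
        prev, cur = series[i - 1], series[i]
        if prev == 0:
            continue  # growth rate undefined; would need manual review
        if abs(cur - prev) / abs(prev) > max_abs_growth:
            flagged.append(i)
    return flagged
```

Year-on-year comparison would use the value twelve periods back instead of the immediately preceding one.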
(4) Value range judging method
Determine a reasonable variation interval for the indicator data within a certain period, and audit data outside the interval with emphasis. The reasonable variation range of the data is determined directly from business experience.
The application range is as follows:
When a field in the fact table can be given a value range, and data outside this range are necessarily erroneous, the value range judgment method must be applied.
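The value range judgment reduces to filtering records whose field falls outside the business-defined interval; the record structure and names below are illustrative assumptions:

```python
# Sketch of the value range judgment method: records whose field value lies
# outside [low, high] are reported as necessarily erroneous.
def out_of_range(records, field, low, high):
    """Return the records whose field value is outside the interval."""
    return [r for r in records if not (low <= r[field] <= high)]
```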
(5) Experience auditing method
For situations where the logical relationship between indicators in a report cannot be confirmed and quantified solely by computer-program auditing, or where an audit sets numerical limits that are broad and hard to judge, manual experience auditing needs to be added.
The application range is as follows:
Where none of the above methods are applicable, the experience auditing method may be used.
(6) Matching judgment method
The data are compared and verified against related data provided or issued by the relevant departments.
The application range is as follows:
The matching judgment method may be used when the data share the same caliber as the related data provided or issued by the relevant departments.
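A minimal sketch of the matching judgment: local values are cross-checked against a reference dataset published by the authoritative department; the key/value structure is an illustrative assumption:

```python
# Report the keys whose local value disagrees with the reference data
# issued by the relevant department.
def mismatches(local: dict, reference: dict) -> dict:
    """Return {key: (local_value, reference_value)} for every key in the
    reference data whose local value disagrees (missing counts as None)."""
    return {
        k: (local.get(k), v)
        for k, v in reference.items()
        if local.get(k) != v
    }
```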
The invention provides a data quality closed-loop management method comprising: formulating a data quality monitoring and checking scheme and monitoring and checking the data quality; formulating a data quality rule base; performing data quality control periodically according to the data quality rule base to obtain the data quality problems; managing the data quality problems; and evaluating the data quality. Driven by this closed-loop management, new treatment requirements for data quality are continuously generated and the quality problems continuously solved, so that the data quality is continuously improved.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
While the data quality closed-loop control method provided by the present invention has been described in detail, those skilled in the art will appreciate that the invention is not limited to the above embodiments, and that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Claims (9)

1. A method of closed-loop control of data quality, comprising:
Step 1: formulating a data quality monitoring and checking scheme, and monitoring and checking the data quality;
Step 2: formulating a data quality rule base;
Step 3: performing data quality control at regular intervals according to the data quality rule base to obtain data quality problems;
Step 4: managing the data quality problems;
Step 5: evaluating the data quality, and returning to Step 1.
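For illustration only, the five steps of claim 1 can be sketched as a loop in Python. The function name, the record/rule representation, and the problem-handling policy (simply discarding failing records) are all hypothetical, not part of the claimed method.

```python
def run_quality_cycle(records, rules, max_cycles=3):
    """Sketch of the claim-1 loop: check records against the rule base
    (Steps 2-3), manage problems (Step 4), evaluate (Step 5), repeat."""
    history = []
    for _ in range(max_cycles):
        # Steps 2-3: check every record against every rule in the rule base.
        passing = [r for r in records if all(c(r) for c in rules.values())]
        n_bad = len(records) - len(passing)
        # Step 5: evaluate quality as the fraction of records that passed.
        history.append(len(passing) / len(records) if records else 1.0)
        # Step 4: manage the problems -- here, simply discard failing records.
        records = passing
        if n_bad == 0:
            break  # no problems left; otherwise loop back to Step 1
    return records, history
```

Each pass of the loop both measures quality and removes detected problems, so the score in `history` rises until it stabilizes, mirroring the continuous improvement the abstract describes.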
2. The data quality closed-loop control method according to claim 1, further comprising, before Step 1, defining data quality, which specifically includes defining key data items for data quality checking, defining checking rules, defining data quality measurement indicators, defining a data quality control model and monitoring mode, and defining a data quality evaluation model.
3. The data quality closed-loop control method according to claim 2, wherein defining the key data items for data quality checking comprises null-value checking, duplication checking, format checking, reference checking, value-range checking, consistency checking, logical checking and relation checking;
defining a data quality checking rule comprises defining a rule name, an associated table, a rule type, a problem level, a rule weight, a rule state, a rule description and a creation time;
defining the data quality measurement indicators comprises data integrity, consistency, duplication, correctness, compliance, relevance and timeliness;
defining a data quality control model and a data quality monitoring mode, wherein the data quality control model governs the data checking object, data checking frequency, data checking time and data checking mode, and the data quality monitoring mode is either an automatic or a manual monitoring mode;
and defining a data quality evaluation model for quantitatively diagnosing and evaluating the data quality.
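As a non-authoritative illustration of four of the check types enumerated in this claim (null-value, duplication, format and value-range checking), each function below returns the indices of offending rows; the function names and the dict-per-row representation are invented for the example.

```python
import re

def null_check(rows, field):
    """Indices of rows where the field is missing or empty (null-value check)."""
    return [i for i, r in enumerate(rows) if r.get(field) in (None, "")]

def duplicate_check(rows, field):
    """Indices of rows whose field value repeats an earlier row (duplication check)."""
    seen, dups = set(), []
    for i, r in enumerate(rows):
        v = r.get(field)
        if v in seen:
            dups.append(i)
        seen.add(v)
    return dups

def format_check(rows, field, pattern):
    """Indices of rows whose field does not match the regular-expression format."""
    rx = re.compile(pattern)
    return [i for i, r in enumerate(rows)
            if r.get(field) is not None and not rx.fullmatch(str(r[field]))]

def range_check(rows, field, lo, hi):
    """Indices of rows whose numeric field lies outside [lo, hi] (value-range check)."""
    return [i for i, r in enumerate(rows)
            if r.get(field) is not None and not lo <= r[field] <= hi]
```

Reference, consistency, logical and relation checks follow the same pattern but compare against other tables or fields rather than a single column.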
4. The data quality closed-loop control method according to claim 3, wherein the data quality monitoring and checking scheme formulated in Step 1 comprises data quality control in the business process and data quality control in the information system;
the data quality control in the business process comprises data quality control in the data generation link, the data integration link and the data use link;
the data quality control in the information system comprises controlling data quality problems arising from personnel, processes, the business-system front end, the business-system database, the extraction process and the loading process.
5. The data quality closed-loop control method according to claim 4, wherein in Step 2, formulating the data quality rule base comprises:
collecting data quality requirements: collecting and sorting data quality problems, data use quality problems, data process quality problems and overall data quality problems, identifying the overall data quality requirements, integrating them, and confirming the overall data quality target;
combing the data for quality checking: confirming the data quality checking objects, the data range and the composition of the checking indicators; combing the data range to find its core objects; preliminarily obtaining, according to the object standard definitions and business scenarios, information such as the data quality checking indicators, checking rules, checking modes, checking periods, checking targets, grading standards and the persons accountable for data quality; forming a document; and confirming and revising its content;
formulating data quality checking rules according to the data quality measurement indicators;
managing the data quality checking rules, which comprises managing a public rule base and reusing public rules through SQL rules, regular-expression rules, value-domain rules and algorithm packages;
rule configuration management, wherein a built-in rule engine performs quality detection according to the data checking indicators;
and changing the data quality checking rules.
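The public rule base of claim 5 can be pictured as rule records carrying the claim-3 fields (name, rule type, problem level, weight) plus a small dispatcher over the three rule kinds named above. The SQLite-backed SQL rule, the field names and the dispatch logic are assumptions made for this sketch, not the patented engine.

```python
import re
import sqlite3

# Hypothetical public rule base: one record per rule, with fields
# modeled on claim 3 (name, rule type, problem level, weight).
RULES = [
    {"name": "age_in_domain", "rule_type": "value_domain",
     "field": "age", "domain": range(0, 121), "level": "high", "weight": 3},
    {"name": "phone_format", "rule_type": "regex",
     "field": "phone", "pattern": r"\d{11}", "level": "medium", "weight": 2},
    {"name": "no_orphan_orders", "rule_type": "sql", "level": "high", "weight": 3,
     "sql": "SELECT COUNT(*) FROM orders o "
            "LEFT JOIN users u ON o.user_id = u.id WHERE u.id IS NULL"},
]

def run_rule(rule, rows=None, conn=None):
    """Dispatch on rule type and return the number of violations detected."""
    if rule["rule_type"] == "value_domain":
        return sum(1 for r in rows if r.get(rule["field"]) not in rule["domain"])
    if rule["rule_type"] == "regex":
        rx = re.compile(rule["pattern"])
        return sum(1 for r in rows
                   if not rx.fullmatch(str(r.get(rule["field"], ""))))
    if rule["rule_type"] == "sql":
        return conn.execute(rule["sql"]).fetchone()[0]
    raise ValueError(f"unknown rule type: {rule['rule_type']}")
```

Keeping rules as data records rather than code is what makes the public rule base reusable: the same dispatcher serves every table, and a configuration change adds a rule without redeploying the engine.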
6. The data quality closed-loop control method according to claim 5, wherein in Step 3, performing data quality control at regular intervals according to the data quality rule base comprises data quality rule analysis, setting the data quality checking frequency, formulating the data quality monitoring range, and generating data quality monitoring reports, data quality evaluation reports and comprehensive data quality reports.
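A minimal sketch of the periodic control and monitoring-report generation mentioned in this claim: one scheduled pass applies the rule base and emits a per-rule report. In practice the pass would be triggered on the configured checking frequency (e.g. by a scheduler such as cron); all names below are hypothetical.

```python
import datetime

def monitoring_report(rows, rules, checked_at=None):
    """Build a per-rule monitoring report: violations found and a pass rate."""
    checked_at = checked_at or datetime.datetime.now().isoformat(timespec="seconds")
    report = {"checked_at": checked_at, "results": {}}
    for name, check in rules.items():
        failed = sum(1 for r in rows if not check(r))  # apply one rule
        report["results"][name] = {
            "failed": failed,
            "pass_rate": round(1 - failed / len(rows), 4) if rows else 1.0,
        }
    return report
```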
7. The data quality closed-loop control method according to claim 6, wherein in Step 4, managing the data quality problems comprises data quality problem lifecycle management, data quality problem feedback, data quality problem cause analysis, and data quality problem correction.
8. The data quality closed-loop control method according to claim 7, wherein the data quality evaluation in Step 5 comprises evaluation in terms of the core indicators of data quality evaluation, the data quality evaluation mode, and the data quality evaluation management process.
9. The data quality closed-loop control method according to claim 8, wherein the data quality evaluation management process comprises data quality requirement analysis, determining the evaluation objects and scope, selecting the data quality dimensions and evaluation indicators, determining the quality metrics and their evaluation methods, evaluating by applying the methods, analyzing and assessing the results, and producing the quality results and reports.
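One way to read the quantitative evaluation model of claims 3 and 9 is a weighted pass-rate score over the measurement dimensions. The dimension names follow claim 3, while the weights, the 0-100 scale and the function name are invented for illustration.

```python
def quality_score(dimension_results, weights):
    """Weighted quality score in [0, 100].

    dimension_results maps each quality dimension (integrity, consistency,
    timeliness, ...) to a (passed, total) pair of check counts; weights maps
    each dimension to its relative importance.
    """
    total_weight = sum(weights[d] for d in dimension_results)
    score = 0.0
    for dim, (passed, total) in dimension_results.items():
        rate = passed / total if total else 1.0  # empty dimension counts as perfect
        score += weights[dim] * rate
    return round(100 * score / total_weight, 2)
```

A score computed this way gives the evaluation report a single comparable number per data set while the per-dimension pass rates explain where the problems lie.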
CN201911104166.7A 2019-11-13 2019-11-13 Data quality closed-loop control method Pending CN111143334A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911104166.7A CN111143334A (en) 2019-11-13 2019-11-13 Data quality closed-loop control method


Publications (1)

Publication Number Publication Date
CN111143334A true CN111143334A (en) 2020-05-12

Family

ID=70517052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911104166.7A Pending CN111143334A (en) 2019-11-13 2019-11-13 Data quality closed-loop control method

Country Status (1)

Country Link
CN (1) CN111143334A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708149A (en) * 2012-04-01 2012-10-03 河海大学 Data quality management method and system
CN106682179A (en) * 2016-12-29 2017-05-17 深圳市华傲数据技术有限公司 Data quality testing method and data quality testing device
CN106709026A (en) * 2016-12-28 2017-05-24 深圳市华傲数据技术有限公司 Data processing method and data processing system
CN107545349A * 2016-06-28 2018-01-05 国网天津市电力公司 A data quality analysis and evaluation model for electric power big data
CN110147966A (en) * 2019-05-28 2019-08-20 国网经济技术研究院有限公司 Enterprise operation data quality management method
CN110399363A * 2019-06-25 2019-11-01 云南电网有限责任公司玉溪供电局 A full-lifecycle data quality management method and system for problem data


Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949642A (en) * 2020-08-13 2020-11-17 中国工商银行股份有限公司 Data quality control method and device
CN111949642B (en) * 2020-08-13 2024-07-09 中国工商银行股份有限公司 Method and device for data quality control
CN112000656A (en) * 2020-09-01 2020-11-27 北京天源迪科信息技术有限公司 Intelligent data cleaning method and device based on metadata
CN112396343A (en) * 2020-11-30 2021-02-23 北京中电普华信息技术有限公司 Data quality checking method and device
CN112506903A (en) * 2020-12-02 2021-03-16 苏州龙石信息科技有限公司 Data quality representation method using sample line
CN112506903B (en) * 2020-12-02 2024-02-23 苏州龙石信息科技有限公司 Data quality representation method using specimen line
CN112667622A (en) * 2021-01-07 2021-04-16 吉林银行股份有限公司 Method and system for checking quality of service data
CN112766676A (en) * 2021-01-08 2021-05-07 深圳市酷开网络科技股份有限公司 Closed-loop data quality control method and device, terminal equipment and storage medium
CN112685401A (en) * 2021-01-22 2021-04-20 浪潮云信息技术股份公司 Data quality detection system and method
WO2022187224A1 (en) * 2021-03-01 2022-09-09 Ab Initio Technology Llc Generation and execution of processing workflows for correcting data quality issues in data sets
CN113239126A (en) * 2021-05-11 2021-08-10 中国银行保险信息技术管理有限公司 Business activity information standardization scheme based on BOR method
CN113516375A * 2021-06-21 2021-10-19 苏州长城开发科技有限公司 Maturity evaluation system and method for a lights-out factory
CN113641399A (en) * 2021-08-10 2021-11-12 上海浦东发展银行股份有限公司 Configuration data processing system, method, electronic device, and storage medium
CN113641399B (en) * 2021-08-10 2024-04-09 上海浦东发展银行股份有限公司 Configuration data processing system, method, electronic device and storage medium
CN113762735A (en) * 2021-08-18 2021-12-07 江苏电力信息技术有限公司 Data quality management system and method based on rule base
CN115296905B (en) * 2022-08-04 2024-06-04 新疆品宣生物科技有限责任公司 Data acquisition and analysis method and system based on mobile terminal
CN115296905A (en) * 2022-08-04 2022-11-04 新疆品宣生物科技有限责任公司 Data acquisition and analysis method and system based on mobile terminal
US20240264986A1 (en) * 2023-01-18 2024-08-08 Google Llc Automated, In-Context Data Quality Annotations for Data Analytics Visualization
CN118113689A (en) * 2023-12-26 2024-05-31 北京宇信科技集团股份有限公司 Data quality analysis method and system
CN117951128A (en) * 2024-01-31 2024-04-30 江苏思行达信息技术股份有限公司 Data quality inspection method based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN111143334A (en) Data quality closed-loop control method
CN110457294B (en) Data processing method and device
Casati et al. A generic solution for warehousing business process data
CN112396404A (en) Data center system
CN104899143B (en) The software peer review system implementation device of data mining is provided
CN110728422A (en) Building information model, method, device and settlement system for construction project
CN115374329B (en) Method and system for managing enterprise business metadata and technical metadata
CN111078766A (en) Data warehouse model construction system and method based on multidimensional theory
Batini et al. A Framework And A Methodology For Data Quality Assessment And Monitoring.
CN114880405A (en) Data lake-based data processing method and system
Izquierdo-Cortazar et al. Towards automated quality models for software development communities: The QualOSS and FLOSSMetrics case
CN116701358B (en) Data processing method and system
CN115952224A (en) Heterogeneous report integration method, equipment and medium
CN113297146A (en) Processing model and method for local supervision submission data
Pham Thi et al. Discovering dynamic integrity rules with a rules-based tool for data quality analyzing
Pau et al. Data warehouse model for audit trail analysis in workflows
CN112330182A (en) Quantitative analysis method and device for economic operation condition
Piprani Using orm-based models as a foundation for a data quality firewall in an advanced generation data warehouse
Aunola Data quality in data warehouses
CN118379152B (en) Financial verification method, equipment and medium of ERP system
CN115423379B (en) Confidence evaluation method, system, terminal and storage medium based on traceability information
CN117829121B (en) Data processing method, device, electronic equipment and medium
Munawar Extract Transform Loading (ETL) Based Data Quality for Data Warehouse Development
Dinkelmann et al. Enhancing Survey Quality: Continuous Data Processing Systems
Azeroual et al. Without Data Quality, There Is No Data Migration. Big Data Cogn. Comput. 2021, 5, 24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200512