CN112348395A - Data quality visual detection system and method - Google Patents

Publication number
CN112348395A
Authority
CN
China
Prior art keywords
detection
information
module
metadata
detection rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011313027.8A
Other languages
Chinese (zh)
Inventor
卢凯杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Baiying Technology Co Ltd
Original Assignee
Zhejiang Baiying Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Baiying Technology Co Ltd filed Critical Zhejiang Baiying Technology Co Ltd
Priority to CN202011313027.8A priority Critical patent/CN112348395A/en
Publication of CN112348395A publication Critical patent/CN112348395A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Library & Information Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a data quality detection system and method. The detection system comprises a detection rule configuration module, a metadata base, a detection engine, a big data module, an alarm module and a workflow module. The detection method comprises parsing task information, extracting the detection rule corresponding to the task information, querying the required detection rule metadata information, querying real data information, outputting a detection result through comparison, and determining whether to send an alarm according to the detection result. The invention enables visual configuration of data quality detection rules, replaces the script-based detection approach and greatly improves data quality detection efficiency.

Description

Data quality visual detection system and method
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data quality visual detection system and method.
Background
Data quality refers to how well an enterprise's existing data measures up against industry standards; quality problems are detected by evaluating the integrity, accuracy, timeliness and consistency of that data. The internet is a service industry built on data, and the quality of the data directly affects an enterprise's viability and competitiveness. Poor data quality tends to harm an enterprise in the following ways:
1. it interferes with operational analysis and affects decision-making;
2. it degrades the quality of algorithm models, so that services are not intelligent enough;
3. it consumes manpower, as analysts, algorithm engineers and data scientists are kept busy dealing with data quality problems.
Therefore, how to improve the quality of enterprise data has become a pressing problem for enterprises. At present, data quality detection in the industry is generally done by manually creating scripts (for example Python or shell) and writing the detection logic into them. This approach has two drawbacks. First, scripts must be written, which increases development difficulty and complexity. Second, the moment at which to run the detection is hard to control: if detection runs too early, the data may not yet have been produced; if it runs too late, a timely alarm is impossible and timeliness as an evaluation criterion is lost. No effective solution to these problems has yet been found in the prior art.
Disclosure of Invention
In view of this, the present invention provides a data quality visual detection system and method.
To achieve this purpose, the invention provides the following technical solution:
The invention relates to a data quality detection system comprising a detection rule configuration module, a metadata base, a detection engine, a big data module, an alarm module and a workflow module. The detection rule configuration module is connected with the metadata base and stores detection rule meta-information of data in the metadata base; the detection engine is connected with the workflow module, and the workflow module triggers the detection engine after the workflow module finishes running; the detection engine is connected with the metadata base and acquires detection rule information from the metadata base according to task information; the detection engine is connected with the big data module and acquires data information from the big data module to match against the detection rule information; and the detection engine is connected with the alarm module and raises an alarm according to the detection result.
Preferably, the detection rule configuration module comprises a WEB front end and a background server; the background server comprises a control layer, a business layer and a database layer, wherein the control layer receives HTTP requests from the WEB front end and returns responses to the WEB front end, the business layer executes the requests, and the database layer queries, stores, updates and deletes detection rule metadata.
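The three-layer split of the background server can be illustrated with a minimal Python sketch. It is not the implementation described in the application: the class names, the `rules` table schema, the JSON field names and the use of an in-memory SQLite database as the metadata base are all assumptions made for illustration.

```python
import json
import sqlite3


class DatabaseLayer:
    """Queries, stores, updates and deletes detection rule metadata."""

    def __init__(self):
        # In-memory SQLite stands in for the metadata base (an assumption).
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE rules (id INTEGER PRIMARY KEY, table_name TEXT, "
            "rule_type TEXT, expected TEXT)"
        )

    def store(self, table_name, rule_type, expected):
        cur = self.conn.execute(
            "INSERT INTO rules (table_name, rule_type, expected) VALUES (?, ?, ?)",
            (table_name, rule_type, expected),
        )
        self.conn.commit()
        return cur.lastrowid

    def query(self, rule_id):
        return self.conn.execute(
            "SELECT id, table_name, rule_type, expected FROM rules WHERE id = ?",
            (rule_id,),
        ).fetchone()

    def update(self, rule_id, expected):
        self.conn.execute(
            "UPDATE rules SET expected = ? WHERE id = ?", (expected, rule_id)
        )
        self.conn.commit()

    def delete(self, rule_id):
        self.conn.execute("DELETE FROM rules WHERE id = ?", (rule_id,))
        self.conn.commit()


class BusinessLayer:
    """Executes a request: validates it, then calls the database layer."""

    SUPPORTED = {"row_count", "unique", "not_null"}  # hypothetical rule types

    def __init__(self, db):
        self.db = db

    def create_rule(self, payload):
        if payload.get("rule_type") not in self.SUPPORTED:
            raise ValueError("unsupported rule_type")
        return self.db.store(
            payload["table_name"], payload["rule_type"], str(payload["expected"])
        )


class ControlLayer:
    """Receives the front end's HTTP request body and returns a response."""

    def __init__(self, business):
        self.business = business

    def post_rule(self, body):
        try:
            rule_id = self.business.create_rule(json.loads(body))
            return 200, json.dumps({"id": rule_id})
        except (ValueError, KeyError) as exc:
            return 400, json.dumps({"error": str(exc)})


db = DatabaseLayer()
control = ControlLayer(BusinessLayer(db))
status, response = control.post_rule(
    json.dumps({"table_name": "orders", "rule_type": "row_count", "expected": 100})
)
```

Keeping HTTP handling, validation and storage in separate classes means the storage backend can be swapped without touching the request handling, which is the usual motivation for this kind of layering.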
A data quality detection method comprises the following steps:
(1) receiving task information sent by the workflow module, parsing the task information, and extracting the detection rule corresponding to the task information;
(2) querying the required detection rule metadata information in the metadata base;
(3) querying real data information from the big data module according to the detection rule metadata information;
(4) outputting a detection result by comparing the detection rule information with the real data information;
(5) determining, according to the detection result, whether to send alarm information to the alarm module to raise an alarm.
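Steps (1) to (5) can be sketched as a single function. This is a hedged illustration only: the `Rule` fields, the task-to-rule mapping and the dictionaries standing in for the metadata base and the big data module are hypothetical, and the only comparison shown is a minimum row count.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule metadata: table to check, metric, and expected minimum.
    table: str
    metric: str
    expected_min: int


# Stand-ins for the metadata base and the big data module (assumed shapes).
TASK_TO_RULE_IDS = {"load_orders": ["r1"]}
METADATA_BASE = {"r1": Rule(table="orders", metric="row_count", expected_min=100)}
BIG_DATA = {"orders": {"row_count": 42}}


def run_detection(task_info, send_alarm):
    """Walk steps (1) to (5) for one finished workflow task."""
    # (1) Parse the task information and find the rules bound to this task.
    rule_ids = TASK_TO_RULE_IDS.get(task_info["task_name"], [])
    results = []
    for rule_id in rule_ids:
        # (2) Query the rule metadata from the metadata base.
        rule = METADATA_BASE[rule_id]
        # (3) Query the real data that the rule metadata points at.
        actual = BIG_DATA[rule.table][rule.metric]
        # (4) Compare the rule with the real data to get a detection result.
        passed = actual >= rule.expected_min
        results.append((rule_id, actual, passed))
        # (5) Send an alarm only when the check fails.
        if not passed:
            send_alarm(
                f"{rule.table}.{rule.metric}={actual}, expected >= {rule.expected_min}"
            )
    return results


alarms = []
results = run_detection({"task_name": "load_orders"}, alarms.append)
```

Here the `orders` table has 42 rows against an expected minimum of 100, so the single rule fails and one alarm message is collected in `alarms`.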
Preferably, the detection rule metadata information is manually configured and stored in the metadata database through the detection rule configuration module.
Preferably, the manual configuration comprises the following steps:
(2.1) the WEB front end sends an HTTP request;
(2.2) the HTTP request of the WEB front end is received, and a response is output to the WEB front end;
(2.3) the HTTP request of the WEB front end is executed;
(2.4) the requested detection rule metadata information is sent to the metadata base.
Compared with the prior art, the technical solution provided by the invention has the following beneficial effects:
First, rule metadata information is configured for the data to be detected; second, data is acquired according to the configured rule meta-information and matched against the rules; finally, whether to raise an alarm is decided according to the rule matching result. The invention enables visual configuration of data quality detection rules, replaces the script-based detection approach and greatly improves data quality detection efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is a block diagram of a detection rule configuration module according to the present invention;
Description of the reference numerals in the drawings:
1 - detection rule configuration module; 2 - metadata database; 3 - detection engine; 4 - big data module; 5 - alarm module; 6 - workflow module; 11 - WEB front end; 12 - background server; 121 - control layer; 122 - business layer; 123 - database layer.
Detailed Description
For further understanding of the present invention, the present invention will be described in detail with reference to examples, which are provided for illustration of the present invention but are not intended to limit the scope of the present invention.
Example 1
Referring to fig. 1, this embodiment relates to a data quality detection system comprising a detection rule configuration module 1, a metadata database 2, a detection engine 3, a big data module 4, an alarm module 5 and a workflow module 6. The detection rule configuration module 1 is connected with the metadata database 2 and stores detection rule meta-information of data in the metadata database 2; the detection engine 3 is connected with the workflow module 6, and the workflow module 6 triggers the detection engine 3 after the workflow module 6 finishes running; the detection engine 3 is connected with the metadata database 2 and acquires detection rule information from the metadata database 2 according to the task information; the detection engine 3 is connected with the big data module 4 and acquires data information from the big data module 4 to match against the detection rule information; and the detection engine 3 is connected with the alarm module 5 and raises an alarm according to the detection result.
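The connection between the workflow module and the detection engine can be pictured as a completion callback. The sketch below is an assumption about one way to wire it: the application only states that the workflow module triggers the detection engine after it finishes running, so the `subscribe` method, the task dictionary and the shared log are hypothetical.

```python
from typing import Callable, Dict, List


class WorkflowModule:
    """Runs tasks and notifies listeners once a task has finished."""

    def __init__(self) -> None:
        self._on_finished: List[Callable[[Dict], None]] = []

    def subscribe(self, callback: Callable[[Dict], None]) -> None:
        self._on_finished.append(callback)

    def run(self, task_name: str, job: Callable[[], None]) -> None:
        job()  # the data-producing work itself
        # Listeners are called only after the job has returned, so the data
        # to be checked has already been written when detection starts.
        for callback in self._on_finished:
            callback({"task_name": task_name, "status": "finished"})


class DetectionEngine:
    """Records that detection was requested (detection logic omitted)."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def on_task_finished(self, task_info: Dict) -> None:
        self.log.append(f"detect {task_info['task_name']}")


log: List[str] = []
workflow = WorkflowModule()
workflow.subscribe(DetectionEngine(log).on_task_finished)
workflow.run("load_orders", lambda: log.append("data written"))
```

Triggering on completion rather than at a fixed clock time addresses the timing problem raised in the Background: detection can start neither before the data exists nor long after it was produced.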
Referring to fig. 2, the detection rule configuration module 1 includes a WEB front end 11 and a background server 12; the background server 12 includes a control layer 121 for receiving HTTP requests from the WEB front end 11 and outputting responses to the WEB front end 11, a business layer 122 for executing the requests, and a database layer 123 for querying, storing, updating and deleting detection rule metadata.
Example 2
Referring to fig. 1, the present embodiment relates to a data quality detection method, which includes the following steps:
(1) receiving task information sent by the workflow module, parsing the task information, and extracting the detection rule corresponding to the task information;
(2) querying the required detection rule metadata information in the metadata base;
(3) querying real data information from the big data module according to the detection rule metadata information;
(4) outputting a detection result by comparing the detection rule information with the real data information;
(5) determining, according to the detection result, whether to send alarm information to the alarm module to raise an alarm.
Referring to fig. 2, the detection rule metadata information is manually configured and stored in the metadata database through the detection rule configuration module, and the manual configuration comprises the following steps:
(2.1) the WEB front end sends an HTTP request;
(2.2) the HTTP request of the WEB front end is received, and a response is output to the WEB front end;
(2.3) the HTTP request of the WEB front end is executed;
(2.4) the requested detection rule metadata information is sent to the metadata base.
The invention uses a web interface to visually configure, throughout the whole process, the detection rules and the tables and fields to be detected. For example, detection rules such as table size, table row count, field uniqueness, field non-null and field dispersion can be configured to check the accuracy of data. The data quality detection system communicates with the workflow through an exposed interface, so that a data quality detection operation is triggered as soon as a workflow task finishes running. Alternatively, a schedule can be configured for a detection task so that data quality detection runs at set times. In both operating modes, the data tables and field information are scanned according to the configured detection rules, and an alarm notification is sent promptly if a detection result does not reach the configured expected value. The invention greatly reduces the difficulty of data quality detection and removes the need for complex script writing, so that people can focus on the business rather than on the data quality detection work itself.
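Three of the rule types named above (table row count, field uniqueness and field non-null) can each be reduced to a query that returns one number, which is then compared with the configured expected value. The following sketch assumes that reduction; the SQL templates, the rule tuples and the sample `orders` table are illustrative and do not come from the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, None)],  # one duplicate id, one missing customer
)

# Each rule type maps to a query returning a single number. Table and field
# names are formatted into the SQL, so they must come from trusted, validated
# configuration; this sketch assumes they do.
RULE_SQL = {
    "row_count": "SELECT COUNT(*) FROM {table}",
    # Rows beyond the distinct non-NULL values (NULLs also count as surplus).
    "unique": "SELECT COUNT(*) - COUNT(DISTINCT {field}) FROM {table}",
    "not_null": "SELECT COUNT(*) FROM {table} WHERE {field} IS NULL",
}

# (rule type, table, field, comparison, expected value)
RULES = [
    ("row_count", "orders", None, ">=", 3),
    ("unique", "orders", "order_id", "==", 0),
    ("not_null", "orders", "customer", "==", 0),
]


def check(rule):
    rule_type, table, field, op, expected = rule
    sql = RULE_SQL[rule_type].format(table=table, field=field)
    actual = conn.execute(sql).fetchone()[0]
    passed = actual >= expected if op == ">=" else actual == expected
    return rule_type, actual, passed


report = [check(rule) for rule in RULES]
failed = [name for name, _, ok in report if not ok]
```

With the sample rows, the row-count rule passes while the uniqueness and non-null rules fail, so `failed` lists the two rules that would lead to an alarm notification.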
The present invention and its embodiments have been described above schematically and without limitation; the drawings show only embodiments of the present invention, and the actual structure is not limited to them. Those skilled in the art should therefore understand that the structure and embodiments can be designed and modified without departing from the spirit and scope of the present invention.

Claims (5)

1. A data quality detection system, characterized by comprising a detection rule configuration module, a metadata base, a detection engine, a big data module, an alarm module and a workflow module, wherein the detection rule configuration module is connected with the metadata base and stores detection rule meta-information of data in the metadata base; the detection engine is connected with the workflow module, and the workflow module triggers the detection engine after the workflow module finishes running; the detection engine is connected with the metadata base and acquires detection rule information from the metadata base according to task information; the detection engine is connected with the big data module and acquires data information from the big data module to match against the detection rule information; and the detection engine is connected with the alarm module and raises an alarm according to the detection result.
2. The data quality detection system according to claim 1, wherein the detection rule configuration module comprises a WEB front end and a background server; the background server comprises a control layer, a business layer and a database layer, wherein the control layer receives HTTP requests from the WEB front end and returns responses to the WEB front end, the business layer executes the requests, and the database layer queries, stores, updates and deletes detection rule metadata.
3. A detection method using the data quality detection system according to claim 1 or 2, comprising the following steps:
(1) receiving task information sent by the workflow module, parsing the task information, and extracting the detection rule corresponding to the task information;
(2) querying the required detection rule metadata information in the metadata base;
(3) querying real data information from the big data module according to the detection rule metadata information;
(4) outputting a detection result by comparing the detection rule information with the real data information;
(5) determining, according to the detection result, whether to send alarm information to the alarm module to raise an alarm.
4. The data quality detection method according to claim 3, wherein the detection rule metadata information is manually configured and stored in the metadata database through the detection rule configuration module.
5. The data quality detection method according to claim 4, wherein the manual configuration comprises the following steps:
(2.1) the WEB front end sends an HTTP request;
(2.2) the HTTP request of the WEB front end is received, and a response is output to the WEB front end;
(2.3) the HTTP request of the WEB front end is executed;
(2.4) the requested detection rule metadata information is sent to the metadata base.
CN202011313027.8A 2020-11-20 2020-11-20 Data quality visual detection system and method Pending CN112348395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011313027.8A CN112348395A (en) 2020-11-20 2020-11-20 Data quality visual detection system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011313027.8A CN112348395A (en) 2020-11-20 2020-11-20 Data quality visual detection system and method

Publications (1)

Publication Number Publication Date
CN112348395A true CN112348395A (en) 2021-02-09

Family

ID=74364527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011313027.8A Pending CN112348395A (en) 2020-11-20 2020-11-20 Data quality visual detection system and method

Country Status (1)

Country Link
CN (1) CN112348395A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345672A (en) * 2013-07-05 2013-10-09 上海海事大学 Vehicle-carried task control system for container yard
CN106227742A (en) * 2016-07-12 2016-12-14 乐视控股(北京)有限公司 Dynamic web page based on B/S pattern generates method, server and system
CN110018860A (en) * 2019-04-04 2019-07-16 深圳市永兴元科技股份有限公司 Workflow management method, device, equipment and computer storage medium
CN110297742A (en) * 2019-07-04 2019-10-01 北京百佑科技有限公司 Data monitoring system, method and server
CN111078675A (en) * 2020-03-23 2020-04-28 绿漫科技有限公司 Multidimensional comprehensive database SQL (structured query language) auditing and optimizing method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
徐戈等: "软件工程综合实践案例教程:电子商务网站产品销售数据分析系统", vol. 1, 31 December 2019, 电子科技大学出版社, pages: 40 - 55 *
童维勤等: "数据密集型计算和模型", vol. 1, 31 January 2015, 上海科学技术出版社, pages: 209 - 212 *

Similar Documents

Publication Publication Date Title
CN104750469B (en) Source code statistical analysis technique and system
CN111414457A (en) Intelligent question-answering method, device, equipment and storage medium based on federal learning
CN108268624B (en) User data visualization method and system
CN110990447B (en) Data exploration method, device, equipment and storage medium
CN113360722B (en) Fault root cause positioning method and system based on multidimensional data map
CN103049367A (en) Automatic testing method for software
CN111125068A (en) Metadata management method and system
CN111897806A (en) Big data offline data quality inspection method and device
CN108280644B (en) Group membership data visualization method and system
CN112990281A (en) Abnormal bid identification model training method, abnormal bid identification method and abnormal bid identification device
CN116541855A (en) Cross-coroutine runtime vulnerability analysis method and device, electronic equipment and storage medium
CN112069269B (en) Big data and multidimensional feature-based data tracing method and big data cloud server
CN112559525A (en) Data checking system, method, device and server
CN110188033B (en) Data detection device, method, computer device, and computer-readable storage medium
CN112348395A (en) Data quality visual detection system and method
CN115543976A (en) Big data processing system
CN113052269B (en) Intelligent cooperative identification method, system, equipment and medium
CN108920182A (en) A kind of novel source code statistical analysis technique and system
CN110321366B (en) Statistical quantity determining method and system based on online learning
CN116150420B (en) Evaluation method and system for picture task pushing result
CN111291246A (en) Big data rapid analysis system
CN117034036A (en) Method and device for detecting timeliness of data processing and computer equipment
CN117011082A (en) Enterprise financial balance analysis system based on cloud platform
CN110737640A (en) data quality improving method and system based on distributed system
CN117195119A (en) Data quality detection method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination