CN109902081A - Data quality management method and device - Google Patents
Data quality management method and device
- Publication number
- CN109902081A (application number CN201910089863.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- dirty
- scene
- rule
- quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present disclosure provides a data quality management method and device, relating to the field of information technology, which can improve the accuracy and efficiency of data quality judgment. The specific technical solution is: determine the data quality judgment rules corresponding to K application scenarios, where K ≥ 1; when a target trigger condition is met, determine that the judgment rule corresponding to a target application scenario is the effective rule, where the target application scenario is one of the K application scenarios; obtain the input data under the target application scenario, and filter out dirty data from the input data according to the effective rule. The present disclosure is used for data quality management.
Description
Technical field
The present disclosure relates to the field of information technology, and in particular to a data quality management method and device.
Background
Business data generated during production and management is an important data asset of an enterprise. Effective analysis and use of this data can greatly help the enterprise reduce costs and increase efficiency, and can support production and management decisions. However, because the data originates from various business systems, its quality is often low, which makes data analysis results inaccurate and seriously affects the usability and value of the enterprise's data assets. It is therefore essential for an enterprise to identify the quality problems in its data and improve data usability, so as to effectively support business optimization and decision analysis.
In the prior art, data quality problems are usually determined by manual review. However, relying on the reviewer's manual judgment is prone to misjudgments and omissions, and is inefficient.
Summary of the invention
The embodiments of the present disclosure provide a data quality management method and device that can improve the accuracy and efficiency of data quality judgment. The technical solution is as follows:
According to a first aspect of the embodiments of the present disclosure, a data quality management method is provided, the method comprising:
determining the data quality judgment rules corresponding to K application scenarios, where K ≥ 1;
when a target trigger condition is met, determining that the judgment rule corresponding to a target application scenario is the effective rule, wherein the target application scenario is one of the K application scenarios;
obtaining the input data under the target application scenario, and filtering out dirty data from the input data according to the effective rule.
The technical solution provided by the present disclosure defines several application scenarios for data quality checking and determines the rules for judging data quality under each application scenario. When the condition that triggers the target application scenario is met, the data under that application scenario is judged for quality according to the judgment rule corresponding to the target application scenario, and the dirty data in it is filtered out, thereby improving the accuracy and efficiency of data quality judgment.
In one embodiment, obtaining the input data under the target application scenario comprises:
in a data entry scenario, obtaining the data entered through a front-end interface;
or, in a data import scenario, obtaining the imported data;
or, in a data incoming scenario, obtaining the data passed into a data acquisition interface;
or, in a database read scenario, obtaining the data read from a database.
In one embodiment, the method further comprises:
generating an analysis report of the dirty data according to the dirty data that was filtered out;
the analysis report includes a list of the dirty data, or the judgment rules that the dirty data does not satisfy, or the fields of the dirty data that have quality problems.
In one embodiment, the method further comprises:
obtaining plan parameters, the plan parameters indicating the frequency, time, or screening range of dirty data screening;
repeating dirty data screening on the existing data in a database according to the plan parameters.
In one embodiment, the method further comprises:
outputting prompt information when it is determined according to the effective rule that dirty data exists in the input data, wherein the prompt information is used to request new data to overwrite the dirty data.
According to a second aspect of the embodiments of the present disclosure, a data quality management device is provided, comprising:
a definition module, configured to determine the data quality judgment rules corresponding to K application scenarios, where K ≥ 1;
a control module, configured to determine, when a target trigger condition is met, that the judgment rule corresponding to a target application scenario is the effective rule, wherein the target application scenario is one of the K application scenarios;
a processing module, configured to obtain the input data under the target application scenario and filter out dirty data from the input data according to the effective rule.
In one embodiment, the processing module is configured to obtain, in a data entry scenario, the data entered through a front-end interface;
or, the processing module is configured to obtain, in a data import scenario, the imported data;
or, the processing module is configured to obtain, in a data incoming scenario, the data passed into a data acquisition interface;
or, the processing module is configured to obtain, in a database read scenario, the data read from a database.
In one embodiment, the processing module is further configured to generate an analysis report of the dirty data according to the dirty data that was filtered out;
the analysis report includes a list of the dirty data, or the judgment rules that the dirty data does not satisfy, or the fields of the dirty data that have quality problems.
In one embodiment, the processing module is further configured to obtain plan parameters and to repeat dirty data screening on the existing data in a database according to the plan parameters;
wherein the plan parameters indicate the frequency, time, or screening range of dirty data screening.
In one embodiment, the processing module is further configured to output prompt information when it is determined according to the effective rule that dirty data exists in the input data;
wherein the prompt information is used to request new data to overwrite the dirty data.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a data quality management method provided by an embodiment of the present disclosure.
Fig. 2 is a flowchart of a data quality management method provided by an embodiment of the present disclosure.
Fig. 3 is a structural diagram of a data quality management device provided by an embodiment of the present disclosure.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.
Determining data quality problems by manual review relies on the reviewer's manual judgment, which is prone to misjudgments and omissions and is inefficient. The embodiments of the present disclosure provide a data quality management method that determines the corresponding data quality judgment rules according to different application scenarios, judges the quality of the data under each application scenario, and filters out the dirty data.
As shown in Fig. 1, an embodiment of the present disclosure provides a data quality management method, which includes the following steps:
101. Determine the data quality judgment rules corresponding to K application scenarios.
K ≥ 1. For example, four application scenarios are defined in the embodiments of the present disclosure: a data entry scenario, a data import scenario, a data incoming scenario, and a database read scenario. They are used to screen for dirty data in data entered through a web interface, data imported in batches, data passed in through a webservice interface, and data in a database, respectively.
Data quality judgment rules include, but are not limited to, uniqueness rules, non-empty rules, custom regular-expression rules, consistency rules, threshold rules, conditional combination rules, and other custom scripts.
For example, the following quality rules can be defined for demographic data: the ID card number must not be repeated (uniqueness rule), the name must not be empty (non-empty rule), the mobile phone number must be 11 digits and must begin with 1 (regular-expression rule), the age must be a positive integer between 1 and 200 (threshold rule), and the gender can only be male or female (consistency rule).
Of course, the application scenarios and data quality judgment rules above are only examples; the present disclosure does not limit the specific types of application scenarios or the corresponding data quality judgment rules.
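The demographic rules listed in step 101 can be sketched in Python as follows. This is only an illustrative sketch: the field names (`id_number`, `name`, `phone`, `age`, `gender`), the rule names, and the helper functions are assumptions made for the example and do not appear in the disclosure.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, object]
# A rule receives one record plus the whole batch and returns True when the record passes.
Rule = Callable[[Record, List[Record]], bool]


def unique(field: str) -> Rule:
    """Uniqueness rule: the field value occurs exactly once in the batch."""
    return lambda rec, batch: sum(1 for r in batch if r.get(field) == rec.get(field)) == 1


def non_empty(field: str) -> Rule:
    """Non-empty rule: the field is present and not an empty string."""
    return lambda rec, batch: rec.get(field) not in (None, "")


def matches(field: str, pattern: str) -> Rule:
    """Regular-expression rule: the whole field value matches the pattern."""
    compiled = re.compile(pattern)
    return lambda rec, batch: compiled.fullmatch(str(rec.get(field, ""))) is not None


def int_in_range(field: str, low: int, high: int) -> Rule:
    """Threshold rule: the field is an integer within [low, high]."""
    def check(rec: Record, batch: List[Record]) -> bool:
        value = rec.get(field)
        return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high
    return check


def one_of(field: str, allowed: set) -> Rule:
    """Consistency rule: the field takes one of the allowed values."""
    return lambda rec, batch: rec.get(field) in allowed


# Hypothetical rule set for demographic data, mirroring the example in the text.
PERSON_RULES: Dict[str, Rule] = {
    "id_number_unique": unique("id_number"),
    "name_non_empty": non_empty("name"),
    "phone_format": matches("phone", r"1\d{10}"),
    "age_range": int_in_range("age", 1, 200),
    "gender_allowed": one_of("gender", {"male", "female"}),
}


def failed_rules(rec: Record, batch: List[Record]) -> List[str]:
    """Return the names of the rules that the record does not satisfy."""
    return [name for name, rule in PERSON_RULES.items() if not rule(rec, batch)]
```

A record for which `failed_rules` returns an empty list passes every rule; any other record would be treated as dirty data under this sketch.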
102. When a target trigger condition is met, determine that the judgment rule corresponding to the target application scenario is the effective rule.
A trigger condition is the criterion for determining the application scenario. When the target trigger condition is met, the current application scenario is determined to be the target application scenario, which is any one of the K application scenarios.
For example, when it is detected that a user is entering data in a web interface, the current application scenario is determined to be the data entry scenario. When it is detected that data is being imported from an Excel file, the current application scenario is determined to be the data import scenario.
When the current application scenario is determined to be the target application scenario, the judgment rule corresponding to the target application scenario is determined to be the effective rule.
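One possible way to picture step 102 is a lookup from a detected event to an application scenario, and from the scenario to its rules. The sketch below assumes hypothetical event names and rule names; the disclosure does not specify how trigger conditions are represented.

```python
from enum import Enum
from typing import Dict, List, Optional


class Scenario(Enum):
    DATA_ENTRY = "data_entry"
    DATA_IMPORT = "data_import"
    DATA_INCOMING = "data_incoming"
    DATABASE_READ = "database_read"


# Hypothetical trigger events mapped to the scenario they indicate.
TRIGGERS: Dict[str, Scenario] = {
    "web_form_input": Scenario.DATA_ENTRY,
    "excel_file_import": Scenario.DATA_IMPORT,
    "acquisition_interface_call": Scenario.DATA_INCOMING,
    "database_query": Scenario.DATABASE_READ,
}

# Hypothetical rule names configured for each scenario (the result of step 101).
RULES_BY_SCENARIO: Dict[Scenario, List[str]] = {
    Scenario.DATA_ENTRY: ["name_non_empty", "phone_format"],
    Scenario.DATA_IMPORT: ["id_number_unique", "name_non_empty"],
    Scenario.DATA_INCOMING: ["age_range"],
    Scenario.DATABASE_READ: ["id_number_unique", "gender_allowed"],
}


def effective_rules(event: str) -> Optional[List[str]]:
    """Return the rules that take effect for the event, or None if no trigger matches."""
    scenario = TRIGGERS.get(event)
    if scenario is None:
        return None
    return RULES_BY_SCENARIO[scenario]
```

Keeping the trigger table separate from the rule table means a new scenario can be added without touching the rules of the existing ones.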
103. Obtain the input data under the target application scenario, and filter out dirty data from the input data according to the effective rule.
Taking the data entry scenario as an example, in this application scenario the data is judged according to the effective rule while the user is entering it in the web interface. For example, when a user enters personal information such as a name and ID card number, the quality of each entered item can be judged one by one. This acts as a quality check at the source where the data is generated. Similarly, in the data import and data incoming scenarios, quality judgment can be performed on the imported data and on the data passed into the data acquisition interface, respectively. Because data quality is checked at the time of input, data quality problems can be found earlier than when quality is judged afterwards.
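The screening in step 103 can be sketched as a function that splits the input into clean and dirty records. The representation of a rule as a named predicate over a single record is an assumption made for this example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]  # returns True when the record passes


def screen(
    records: List[Record], rules: Dict[str, Rule]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into clean ones and dirty ones paired with the rules they failed."""
    clean: List[Record] = []
    dirty: List[Tuple[Record, List[str]]] = []
    for rec in records:
        failed = [name for name, rule in rules.items() if not rule(rec)]
        if failed:
            dirty.append((rec, failed))
        else:
            clean.append(rec)
    return clean, dirty
```

In the data entry scenario the same function could be called with a single-element list each time the user submits a value, which corresponds to judging the entered data one by one.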
In one embodiment, an analysis report of the dirty data can be generated according to the dirty data that was filtered out. The analysis report includes a list of the dirty data, or the judgment rules that the dirty data does not satisfy, or the fields of the dirty data that have quality problems.
The analysis report can be displayed or sent to a device. After viewing the analysis report, the user can correct the dirty data and re-enter the corrected data.
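A report of this kind could be assembled as in the following sketch. The report layout, the `build_report` function, and the mapping from rule names to field names are all assumptions for illustration; the disclosure only names the kinds of information the report may contain.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_report(
    dirty: List[Tuple[dict, List[str]]], rule_fields: Dict[str, str]
) -> dict:
    """Summarize dirty records: the list itself, failed rules, and affected fields."""
    items = []
    failures_by_rule: Counter = Counter()
    for record, failed in dirty:
        # Each rule is assumed to check exactly one field.
        fields = sorted({rule_fields[name] for name in failed})
        items.append({"record": record, "failed_rules": failed, "problem_fields": fields})
        failures_by_rule.update(failed)
    return {
        "dirty_count": len(items),
        "items": items,
        "failures_by_rule": dict(failures_by_rule),
    }
```

The resulting dictionary can be rendered on screen or serialized and sent to another device, matching the two output options described above.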
The data quality management method provided by the embodiments of the present disclosure defines several application scenarios for data quality checking and determines the rules for judging data quality under each application scenario. When the condition that triggers the target application scenario is met, the data under that application scenario is judged for quality according to the judgment rule corresponding to the target application scenario, and the dirty data in it is filtered out, thereby improving the accuracy and efficiency of data quality judgment.
Based on the data quality management method provided by the embodiment corresponding to Fig. 1 above, another embodiment of the present disclosure further supplements the data quality management method. The content of some steps is the same as or similar to the steps in the embodiment corresponding to Fig. 1, and only the differences in the steps are described in detail below.
Referring to Fig. 2, the data quality management method provided by this embodiment includes the following steps:
201. Determine the data quality judgment rules corresponding to K application scenarios.
One application scenario may correspond to one or more data quality judgment rules. The judgment rules corresponding to an application scenario can be determined according to the specific data business. The present disclosure does not limit the number or content of the data quality judgment rules corresponding to an application scenario.
202. When a target trigger condition is met, determine that the judgment rule corresponding to the target application scenario is the effective rule.
The target application scenario is any one of the K application scenarios. For example, when it is detected that data is being passed in through a data acquisition interface, the current application scenario is determined to be the data incoming scenario. When data is read from a database, the current application scenario is determined to be the database read scenario.
203. Obtain the input data under the target application scenario, and filter out dirty data from the input data according to the effective rule.
Depending on the specific application scenario, the input data under the target application scenario may be data entered through a front-end interface, data imported from an Excel file, data passed into a data acquisition interface, or data read from a database.
204. Generate an analysis report of the dirty data according to the dirty data that was filtered out.
The analysis report may include various quantitative indicators about the dirty data, for example, a list of the dirty data and the number of dirty data items. For a given piece of dirty data, the report can indicate the judgment rules that the data does not satisfy, or the fields of the dirty data that have quality problems.
205. Repeat dirty data screening on the existing data in the database.
The data in a database may be updated frequently, so dirty data screening can be repeated on the existing data in the database.
In one embodiment, plan parameters are obtained, and dirty data screening is repeated on the existing data in the database according to the plan parameters. The plan parameters indicate the frequency, time, or screening range of dirty data screening. The plan parameters can be preset or customized by the user.
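The plan parameters in step 205 could be modeled as in the sketch below, which assumes a fixed repetition interval, a first run time, and a predicate selecting the rows in scope. The `PlanParameters` class and both helper functions are hypothetical; the disclosure does not define a concrete format for plan parameters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class PlanParameters:
    first_run: datetime              # time of the first screening
    interval: timedelta              # frequency: time between two screenings
    in_scope: Callable[[dict], bool] # screening range: which rows to check


def next_run(plan: PlanParameters, after: datetime) -> datetime:
    """Return the first scheduled screening time strictly later than `after`."""
    if after < plan.first_run:
        return plan.first_run
    # Whole intervals elapsed since the first run, plus one to move past `after`.
    periods = (after - plan.first_run) // plan.interval + 1
    return plan.first_run + periods * plan.interval


def rescreen(rows: List[dict], plan: PlanParameters, is_dirty: Callable[[dict], bool]) -> List[dict]:
    """Screen only the rows inside the planned range and return the dirty ones."""
    return [row for row in rows if plan.in_scope(row) and is_dirty(row)]
```

A scheduler would call `rescreen` at each time returned by `next_run`; how the scheduler itself is implemented is left open here.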
206. Output prompt information.
In one embodiment, prompt information is output when it is determined according to the effective rule that dirty data exists in the input data. The prompt information is used to request new data to overwrite the dirty data.
Taking the data entry scenario as an example, while a user is entering data in a web interface, if it is determined that the newly entered data contains dirty data, prompt information can be output stating that a certain entry does not meet the specification and requesting that it be re-entered. When the user re-enters the data and the re-entered data is not dirty, the original dirty data is overwritten with the new data.
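The prompt-and-overwrite behaviour of step 206 can be sketched as follows. The `enter_value` function, the dictionary used as a form, and the wording of the prompt are assumptions for the example.

```python
import re
from typing import Callable, Dict, Optional


def enter_value(
    form: Dict[str, str], field: str, value: str, is_valid: Callable[[str], bool]
) -> Optional[str]:
    """Record an entered value; return a prompt if it is dirty, otherwise None."""
    if is_valid(value):
        # A clean value overwrites whatever was entered for this field before,
        # including an earlier dirty value.
        form[field] = value
        return None
    # Keep the first entry so that it can later be overwritten by clean data;
    # a dirty re-entry does not replace an existing value.
    form.setdefault(field, value)
    return f"The value entered for '{field}' does not meet the specification; please re-enter it."


def valid_phone(value: str) -> bool:
    """Hypothetical check: 11 digits beginning with 1."""
    return re.fullmatch(r"1\d{10}", value) is not None
```

Returning the prompt as a string keeps the sketch independent of any particular user interface; a web front end would display it next to the offending field.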
The data quality management method provided by the embodiments of the present disclosure defines several application scenarios for data quality checking and determines the rules for judging data quality under each application scenario. When the condition that triggers the target application scenario is met, the data under that application scenario is judged for quality according to the judgment rule corresponding to the target application scenario, and the dirty data in it is filtered out, thereby improving the accuracy and efficiency of data quality judgment. In addition, the solution can check data quality when the data is generated or input. Compared with judging quality problems afterwards, data quality problems can be found earlier, so the user can correct them sooner.
Based on the data quality management method described in the embodiments corresponding to Fig. 1 and Fig. 2 above, the following are device embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure.
An embodiment of the present disclosure provides a data quality management device. As shown in Fig. 3, the data quality management device includes:
a definition module 31, configured to determine the data quality judgment rules corresponding to K application scenarios, where K ≥ 1;
a control module 32, configured to determine, when a target trigger condition is met, that the judgment rule corresponding to a target application scenario is the effective rule, wherein the target application scenario is one of the K application scenarios;
a processing module 33, configured to obtain the input data under the target application scenario and filter out dirty data from the input data according to the effective rule.
In one embodiment, the processing module 33 is configured to obtain, in a data entry scenario, the data entered through a front-end interface.
Alternatively, the processing module 33 is configured to obtain, in a data import scenario, the imported data.
Alternatively, the processing module 33 is configured to obtain, in a data incoming scenario, the data passed into a data acquisition interface.
Alternatively, the processing module 33 is configured to obtain, in a database read scenario, the data read from a database.
In one embodiment, the processing module 33 is further configured to generate an analysis report of the dirty data according to the dirty data that was filtered out.
The analysis report includes a list of the dirty data, or the judgment rules that the dirty data does not satisfy, or the fields of the dirty data that have quality problems.
In one embodiment, the processing module 33 is further configured to obtain plan parameters and to repeat dirty data screening on the existing data in a database according to the plan parameters.
The plan parameters indicate the frequency, time, or screening range of dirty data screening.
In one embodiment, the processing module 33 is further configured to output prompt information when it is determined according to the effective rule that dirty data exists in the input data.
The prompt information is used to request new data to overwrite the dirty data.
The data quality management device provided by the embodiments of the present disclosure defines several application scenarios for data quality checking and determines the rules for judging data quality under each application scenario. When the condition that triggers the target application scenario is met, the data under that application scenario is judged for quality according to the judgment rule corresponding to the target application scenario, and the dirty data in it is filtered out, thereby improving the accuracy and efficiency of data quality judgment.
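The division of work among the three modules can be illustrated with the following sketch. The class and method names are hypothetical and the scenarios and triggers are represented as plain strings for brevity; the disclosure describes the modules only by their function.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]  # returns True when the record passes


class DefinitionModule:
    """Holds the data quality judgment rules defined for each application scenario."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {}

    def define(self, scenario: str, rules: List[Rule]) -> None:
        self._rules[scenario] = list(rules)

    def rules_for(self, scenario: str) -> List[Rule]:
        return self._rules[scenario]


class ControlModule:
    """Selects the effective rules when a trigger condition is met."""

    def __init__(self, definitions: DefinitionModule, triggers: Dict[str, str]) -> None:
        self._definitions = definitions
        self._triggers = triggers  # trigger event -> scenario name

    def effective_rules(self, event: str) -> List[Rule]:
        return self._definitions.rules_for(self._triggers[event])


class ProcessingModule:
    """Filters dirty data out of the input data using the effective rules."""

    def screen(self, input_data: List[dict], rules: List[Rule]) -> List[dict]:
        # A record is dirty if it fails at least one effective rule.
        return [rec for rec in input_data if not all(rule(rec) for rule in rules)]
```

In this sketch the control module depends on the definition module, while the processing module only receives the rules it is handed, so each part can be tested on its own.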
Based on the data quality management method described in the embodiments corresponding to Fig. 1 and Fig. 2 above, an embodiment of the present disclosure also provides a computer-readable storage medium. For example, the non-transitory computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. Computer instructions are stored on the storage medium for executing the data quality management method described in the embodiments corresponding to Fig. 1 and Fig. 2 above, which is not repeated here.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.
Claims (10)
1. A data quality management method, characterized in that the method comprises:
determining the data quality judgment rules corresponding to K application scenarios, where K ≥ 1;
when a target trigger condition is met, determining that the judgment rule corresponding to a target application scenario is the effective rule, wherein the target application scenario is one of the K application scenarios;
obtaining the input data under the target application scenario, and filtering out dirty data from the input data according to the effective rule.
2. The method according to claim 1, characterized in that obtaining the input data under the target application scenario comprises:
in a data entry scenario, obtaining the data entered through a front-end interface;
or, in a data import scenario, obtaining the imported data;
or, in a data incoming scenario, obtaining the data passed into a data acquisition interface;
or, in a database read scenario, obtaining the data read from a database.
3. The method according to claim 1, characterized by further comprising:
generating an analysis report of the dirty data according to the dirty data that was filtered out;
the analysis report includes a list of the dirty data, or the judgment rules that the dirty data does not satisfy, or the fields of the dirty data that have quality problems.
4. The method according to claim 1, characterized by further comprising:
obtaining plan parameters, the plan parameters indicating the frequency, time, or screening range of dirty data screening;
repeating dirty data screening on the existing data in a database according to the plan parameters.
5. The method according to claim 1, characterized by further comprising:
outputting prompt information when it is determined according to the effective rule that dirty data exists in the input data, wherein the prompt information is used to request new data to overwrite the dirty data.
6. A data quality management device, characterized by comprising:
a definition module, configured to determine the data quality judgment rules corresponding to K application scenarios, where K ≥ 1;
a control module, configured to determine, when a target trigger condition is met, that the judgment rule corresponding to a target application scenario is the effective rule, wherein the target application scenario is one of the K application scenarios;
a processing module, configured to obtain the input data under the target application scenario and filter out dirty data from the input data according to the effective rule.
7. The device according to claim 6, characterized in that
the processing module is configured to obtain, in a data entry scenario, the data entered through a front-end interface;
or, the processing module is configured to obtain, in a data import scenario, the imported data;
or, the processing module is configured to obtain, in a data incoming scenario, the data passed into a data acquisition interface;
or, the processing module is configured to obtain, in a database read scenario, the data read from a database.
8. The device according to claim 6, characterized in that
the processing module is further configured to generate an analysis report of the dirty data according to the dirty data that was filtered out;
the analysis report includes a list of the dirty data, or the judgment rules that the dirty data does not satisfy, or the fields of the dirty data that have quality problems.
9. The device according to claim 6, characterized in that
the processing module is further configured to obtain plan parameters and to repeat dirty data screening on the existing data in a database according to the plan parameters;
wherein the plan parameters indicate the frequency, time, or screening range of dirty data screening.
10. The device according to claim 6, characterized in that
the processing module is further configured to output prompt information when it is determined according to the effective rule that dirty data exists in the input data;
wherein the prompt information is used to request new data to overwrite the dirty data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910089863.3A CN109902081A (en) | 2019-01-30 | 2019-01-30 | Data quality management method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910089863.3A CN109902081A (en) | 2019-01-30 | 2019-01-30 | Data quality management method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109902081A (en) | 2019-06-18 |
Family
ID=66944422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910089863.3A Pending CN109902081A (en) | 2019-01-30 | 2019-01-30 | Data quality management method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902081A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1547145A (en) * | 2003-12-08 | 2004-11-17 | 西安交通大学 | Dynamic detecting and ensuring method for equipment operating status data quality |
CN103473472A (en) * | 2013-09-26 | 2013-12-25 | 深圳市华傲数据技术有限公司 | Quartile graph-based data quality detection method and system |
CN103914616A (en) * | 2014-03-18 | 2014-07-09 | 清华大学深圳研究生院 | Emergency data quality control system and emergency data quality control method |
CN105868373A (en) * | 2016-03-31 | 2016-08-17 | 国网江西省电力公司信息通信分公司 | Method and device for processing key data of power service information system |
CN107491381A (en) * | 2017-07-04 | 2017-12-19 | 广西电网有限责任公司电力科学研究院 | A kind of equipment condition monitoring quality of data evaluating system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109561322A (en) | A kind of method, apparatus, equipment and the storage medium of video audit | |
CN108596410B (en) | Automatic wind control event processing method and device | |
KR101588027B1 (en) | Method and apparatus for generating test case to support localization of software | |
CN111210842A (en) | Voice quality inspection method, device, terminal and computer readable storage medium | |
CN110619535B (en) | Data processing method and device | |
CN112015747B (en) | Data uploading method and device | |
CN111797320A (en) | Data processing method, device, equipment and storage medium | |
CN113468520A (en) | Data intrusion detection method applied to block chain service and big data server | |
CN116414815A (en) | Data quality detection method, device, computer equipment and storage medium | |
CN114218034B (en) | Online office security processing method under big data scene and big data server | |
CN110716767B (en) | Model component calling and generating method, device and storage medium | |
CN111401722A (en) | Intelligent decision method and intelligent decision system | |
CN110717509A (en) | Data sample analysis method and device based on tree splitting algorithm | |
US20190303424A1 (en) | Novel and innovative computer system and method for accurately and consistently automating the coding of timekeeping activities and expenses, and automatically assessing the reasonableness of amounts of time billed for those activities and expenses, through the use of supervised and unsupervised machine learning, as well as lexical, statistical, and multivariate modelling of billing entries | |
CN113468017A (en) | Online service state detection method applied to block chain and service server | |
CN113472860A (en) | Service resource allocation method and server under big data and digital environment | |
CN109800887B (en) | Generation method and device of prediction process model, storage medium and electronic equipment | |
KR101948603B1 (en) | Anonymization Device for Preserving Utility of Data and Method thereof | |
CN109902081A (en) | Data quality management method and device | |
CN111737319B (en) | User cluster prediction method, device, computer equipment and storage medium | |
CN114595216A (en) | Data verification method and device, storage medium and electronic equipment | |
CN113962216A (en) | Text processing method and device, electronic equipment and readable storage medium | |
CN110750727A (en) | Data processing method, device, system and computer readable storage medium | |
CN115208831B (en) | Request processing method, device, equipment and storage medium | |
CN112598237A (en) | Organization rating result determining method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190618 |