CN109902081A - Data quality management method and device - Google Patents

Info

Publication number
CN109902081A
Authority
CN
China
Prior art keywords
data
dirty
scene
rule
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910089863.3A
Other languages
Chinese (zh)
Inventor
程宏亮
张鹏
吴垌沅
李晓燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Merrill Lynch Data Technology Ltd By Share Ltd
Original Assignee
Merrill Lynch Data Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-30
Filing date
2019-01-30
Publication date
2019-06-18
Application filed by Merrill Lynch Data Technology Ltd By Share Ltd
Priority to CN201910089863.3A
Publication of CN109902081A
Legal status: Pending

Abstract

The disclosure provides a data quality management method and device, relating to the field of information technology, which can improve the accuracy and efficiency of data quality determination. The specific technical solution is: determine the data quality determination rules corresponding to K application scenarios, K ≥ 1; when a target trigger condition is met, determine that the determination rules corresponding to a target application scenario are the effective rules, where the target application scenario is one of the K application scenarios; obtain the input data under the target application scenario, and screen out dirty data from the input data according to the effective rules. The disclosure is used for data quality management.

Description

Data quality management method and device
Technical field
This disclosure relates to the field of information technology, and in particular to a data quality management method and device.
Background
The business data generated during production and management is an important data asset of an enterprise. Effective analysis and use of these data can greatly help the enterprise reduce costs and increase efficiency, and can support production and management decisions. However, because the data originates from many different business systems, its quality is often poor, which makes analysis results inaccurate and seriously reduces the usability and value of the enterprise's data assets. Finding the quality problems in the data and improving data usability is therefore essential for effectively supporting business optimization and decision analysis.
In the prior art, data quality problems are usually identified by manual review. However, this relies on the reviewer's personal judgment, so misjudgments and omissions are likely, and the process is inefficient.
Summary of the invention
The embodiments of the present disclosure provide a data quality management method and device that can improve the accuracy and efficiency of data quality determination. The technical solution is as follows:
According to a first aspect of the embodiments of the present disclosure, a data quality management method is provided, the method comprising:
determining the data quality determination rules corresponding to K application scenarios, K ≥ 1;
when a target trigger condition is met, determining that the determination rules corresponding to a target application scenario are the effective rules, wherein the target application scenario is one of the K application scenarios;
obtaining the input data under the target application scenario, and screening out dirty data from the input data according to the effective rules.
The technical solution provided by the present disclosure defines several application scenarios for data quality checking and determines, for each application scenario, the rules used to judge data quality. When the condition that triggers the target application scenario is met, quality determination is performed on the data under that scenario according to the determination rules corresponding to the target application scenario, and the dirty data is screened out, thereby improving the accuracy and efficiency of data quality determination.
In one embodiment, obtaining the input data under the target application scenario comprises:
in a data entry scenario, obtaining the data entered through a front-end interface;
or, in a data import scenario, obtaining the imported data;
or, in a data transfer scenario, obtaining the data passed into a data acquisition interface;
or, in a database reading scenario, obtaining the data read from a database.
In one embodiment, the method further comprises:
generating an analysis report of the dirty data according to the dirty data screened out;
the analysis report includes a list of the dirty data, or the determination rules that the dirty data fails to satisfy, or the fields of the dirty data that have quality problems.
In one embodiment, the method further comprises:
obtaining plan parameters, the plan parameters indicating the frequency, the time, or the screening scope of dirty data screening;
repeatedly performing dirty data screening on the existing data in the database according to the plan parameters.
In one embodiment, the method further comprises:
when it is determined according to the effective rules that dirty data exists in the input data, outputting prompt information, wherein the prompt information is used to request new data to overwrite the dirty data.
According to a second aspect of the embodiments of the present disclosure, a data quality management device is provided, comprising:
a definition module, configured to determine the data quality determination rules corresponding to K application scenarios, K ≥ 1;
a control module, configured to determine, when a target trigger condition is met, that the determination rules corresponding to a target application scenario are the effective rules, wherein the target application scenario is one of the K application scenarios;
a processing module, configured to obtain the input data under the target application scenario and screen out dirty data from the input data according to the effective rules.
In one embodiment, the processing module is configured to obtain, in a data entry scenario, the data entered through a front-end interface;
or, the processing module is configured to obtain, in a data import scenario, the imported data;
or, the processing module is configured to obtain, in a data transfer scenario, the data passed into a data acquisition interface;
or, the processing module is configured to obtain, in a database reading scenario, the data read from a database.
In one embodiment, the processing module is further configured to generate an analysis report of the dirty data according to the dirty data screened out;
the analysis report includes a list of the dirty data, or the determination rules that the dirty data fails to satisfy, or the fields of the dirty data that have quality problems.
In one embodiment, the processing module is further configured to obtain plan parameters and to repeatedly perform dirty data screening on the existing data in the database according to the plan parameters;
wherein the plan parameters indicate the frequency, the time, or the screening scope of dirty data screening.
In one embodiment, the processing module is further configured to output prompt information when it is determined according to the effective rules that dirty data exists in the input data;
wherein the prompt information is used to request new data to overwrite the dirty data.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated into and form part of this specification, show embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a data quality management method provided by an embodiment of the present disclosure.
Fig. 2 is a flowchart of a data quality management method provided by an embodiment of the present disclosure.
Fig. 3 is a structural diagram of a data quality management device provided by an embodiment of the present disclosure.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Determining data quality problems by manual review relies on the reviewer's personal judgment, so misjudgments and omissions are likely and the process is inefficient. Embodiments of the present disclosure provide a data quality management method that determines the corresponding data quality determination rules for each application scenario, performs quality determination on the data under that scenario, and screens out the dirty data.
As shown in Fig. 1, an embodiment of the present disclosure provides a data quality management method, which includes the following steps:
101. Determine the data quality determination rules corresponding to K application scenarios.
K ≥ 1. By way of example, four application scenarios are defined in the embodiments of the present disclosure: a data entry scenario, a data import scenario, a data transfer scenario, and a database reading scenario. They are used to screen dirty data in, respectively, data entered through a web interface, data imported in batches, data passed in through a web service interface, and data held in a database.
Data quality determination rules include, but are not limited to, uniqueness rules, non-empty rules, custom regular-expression rules, consistency rules, threshold rules, condition-combination rules, and other custom scripts.
For example, the following quality rules can be defined for demographic data: the ID card number must not be repeated (uniqueness rule); the name must not be empty (non-empty rule); the mobile phone number must be 11 digits and begin with 1 (regular-expression rule); the age must be a positive integer between 1 and 200 (threshold rule); the gender can only be male or female (consistency rule); and so on.
Of course, the application scenarios and data quality determination rules above are only illustrative; the present disclosure does not limit the specific types of application scenario or the corresponding data quality determination rules.
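As a non-limiting illustration, the five example rules for demographic data might be sketched in Python as follows. The field names (`id_number`, `name`, `phone`, `age`, `gender`), the rule names, and the helper functions are assumptions made for this sketch; the disclosure does not prescribe any particular representation of a rule.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, object]
# A rule receives one record plus the whole batch (the batch is needed for
# uniqueness checks) and returns True when the record satisfies the rule.
Rule = Callable[[Record, List[Record]], bool]


def unique(field: str) -> Rule:
    """Uniqueness rule: the field value occurs exactly once in the batch."""
    def check(rec: Record, batch: List[Record]) -> bool:
        return sum(1 for r in batch if r.get(field) == rec.get(field)) == 1
    return check


def non_empty(field: str) -> Rule:
    """Non-empty rule: the field is present and not blank."""
    def check(rec: Record, batch: List[Record]) -> bool:
        value = rec.get(field)
        return value is not None and str(value).strip() != ""
    return check


def matches(field: str, pattern: str) -> Rule:
    """Regular-expression rule: the whole field value matches the pattern."""
    compiled = re.compile(pattern)
    def check(rec: Record, batch: List[Record]) -> bool:
        return compiled.fullmatch(str(rec.get(field, ""))) is not None
    return check


def int_in_range(field: str, low: int, high: int) -> Rule:
    """Threshold rule: the field is an integer within [low, high]."""
    def check(rec: Record, batch: List[Record]) -> bool:
        value = rec.get(field)
        return (isinstance(value, int) and not isinstance(value, bool)
                and low <= value <= high)
    return check


def one_of(field: str, allowed: set) -> Rule:
    """Consistency rule: the field takes one of the allowed values."""
    def check(rec: Record, batch: List[Record]) -> bool:
        return rec.get(field) in allowed
    return check


# Hypothetical rule set mirroring the demographic example in the text.
DEMOGRAPHIC_RULES: Dict[str, Rule] = {
    "id_number_unique": unique("id_number"),
    "name_non_empty": non_empty("name"),
    "phone_format": matches("phone", r"1\d{10}"),   # 11 digits, starts with 1
    "age_threshold": int_in_range("age", 1, 200),
    "gender_consistency": one_of("gender", {"male", "female"}),
}


def failed_rules(rec: Record, batch: List[Record],
                 rules: Dict[str, Rule] = DEMOGRAPHIC_RULES) -> List[str]:
    """Return the names of the rules that the record does not satisfy."""
    return [name for name, rule in rules.items() if not rule(rec, batch)]
```

A record for which `failed_rules` returns a non-empty list would be treated as dirty data in this sketch.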
102. When the target trigger condition is met, determine that the determination rules corresponding to the target application scenario are the effective rules.
A trigger condition is the criterion for identifying an application scenario. When the target trigger condition is met, the current application scenario is determined to be the target application scenario; the target application scenario is any one of the K application scenarios.
For example, when it is detected that a user is entering data in a web interface, the current application scenario is determined to be the data entry scenario. When it is detected that data is being imported through an Excel file, the current application scenario is determined to be the data import scenario.
When the current application scenario is determined to be the target application scenario, the determination rules corresponding to the target application scenario are determined to be the effective rules.
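Step 102 can be pictured as two lookups: a detected event identifies the application scenario, and the scenario identifies the effective rules. The following Python sketch illustrates this under stated assumptions: the event names, the rule names, and the assignment of rules to scenarios are all hypothetical and are not taken from the disclosure.

```python
from enum import Enum
from typing import Dict, List, Optional


class Scenario(Enum):
    DATA_ENTRY = "data entry"
    DATA_IMPORT = "data import"
    DATA_TRANSFER = "data transfer"
    DATABASE_READ = "database read"


# Hypothetical trigger conditions: each detected event maps to one scenario.
TRIGGERS: Dict[str, Scenario] = {
    "web_form_input": Scenario.DATA_ENTRY,
    "excel_file_upload": Scenario.DATA_IMPORT,
    "acquisition_interface_call": Scenario.DATA_TRANSFER,
    "database_query": Scenario.DATABASE_READ,
}

# Hypothetical rule names defined per scenario (step 101).
RULES_BY_SCENARIO: Dict[Scenario, List[str]] = {
    Scenario.DATA_ENTRY: ["name_non_empty", "phone_format"],
    Scenario.DATA_IMPORT: ["id_number_unique", "name_non_empty"],
    Scenario.DATA_TRANSFER: ["age_threshold"],
    Scenario.DATABASE_READ: ["id_number_unique", "gender_consistency"],
}


def effective_rules(event: str) -> Optional[List[str]]:
    """Return the effective rules for the scenario triggered by the event,
    or None when the event matches no trigger condition."""
    scenario = TRIGGERS.get(event)
    if scenario is None:
        return None
    return RULES_BY_SCENARIO[scenario]
```

Keeping the trigger table separate from the rule table means a new scenario can be added without touching the rules of the existing ones, which is one possible way to honor the statement that the scenario types are not limited.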
103. Obtain the input data under the target application scenario, and screen out dirty data from the input data according to the effective rules.
Take the data entry scenario as an example. In this scenario, the data is judged against the effective rules while the user is entering it in the web interface. For example, when the user enters personal information such as name and ID card number, the quality of each entered item can be judged one by one, which provides a quality check at the source where the data is generated. Similarly, in the data import and data transfer scenarios, quality determination can be performed on the imported data and on the data passed into the data acquisition interface, respectively. Because data quality is checked at input time, quality problems can be found earlier than when they are determined after the fact.
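A minimal Python sketch of the screening in step 103 is given below. It assumes that each effective rule is a named check on a single field; the rule names, field names, and the `screen` function itself are illustrative assumptions rather than part of the disclosure.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
# Each effective rule is (field name, check on that field's value).
FieldRule = Tuple[str, Callable[[object], bool]]


def screen(records: List[Record], rules: Dict[str, FieldRule]):
    """Split records into clean data and dirty data.

    A record is dirty if it fails at least one effective rule; each dirty
    record is returned together with the names of the rules it failed.
    """
    clean: List[Record] = []
    dirty: List[Tuple[Record, List[str]]] = []
    for rec in records:
        failures = [name for name, (field, check) in rules.items()
                    if not check(rec.get(field))]
        if failures:
            dirty.append((rec, failures))
        else:
            clean.append(rec)
    return clean, dirty


# Hypothetical effective rules for a data entry scenario.
ENTRY_RULES: Dict[str, FieldRule] = {
    "name_non_empty": ("name", lambda v: bool(v and str(v).strip())),
    "age_threshold": ("age", lambda v: isinstance(v, int) and 1 <= v <= 200),
}
```

Returning the failed rule names alongside each dirty record keeps the information that a later analysis report or prompt would need.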
In one embodiment, an analysis report of the dirty data can be generated according to the dirty data screened out. The analysis report includes a list of the dirty data, or the determination rules that the dirty data fails to satisfy, or the fields of the dirty data that have quality problems.
The analysis report can be displayed, or output and sent to a device. After viewing the analysis report, the user can correct the dirty data and re-enter the corrected new data.
The data quality management method provided by the embodiments of the present disclosure defines several application scenarios for data quality checking and determines, for each application scenario, the rules used to judge data quality. When the condition that triggers the target application scenario is met, quality determination is performed on the data under that scenario according to the determination rules corresponding to the target application scenario, and the dirty data is screened out, thereby improving the accuracy and efficiency of data quality determination.
Based on the data quality management method provided by the embodiment corresponding to Fig. 1, another embodiment of the present disclosure further supplements the method. Some of its steps are the same as or similar to the steps in the embodiment corresponding to Fig. 1; only the differences are described in detail below.
Referring to Fig. 2, the data quality management method provided by this embodiment includes the following steps:
201. Determine the data quality determination rules corresponding to K application scenarios.
One application scenario can correspond to one or more data quality determination rules. The determination rules corresponding to an application scenario can be determined according to the specific data service. The present disclosure does not limit the number or content of the data quality determination rules corresponding to an application scenario.
202. When the target trigger condition is met, determine that the determination rules corresponding to the target application scenario are the effective rules.
The target application scenario is any one of the K application scenarios. For example, when it is detected that data is being passed in through the data acquisition interface, the current application scenario is determined to be the data transfer scenario. When data is read from the database, the current application scenario is determined to be the database reading scenario.
203. Obtain the input data under the target application scenario, and screen out dirty data from the input data according to the effective rules.
Depending on the specific application scenario, the input data under the target application scenario can be data entered through the front-end interface, data imported through Excel, data passed into the data acquisition interface, or data read from the database.
204. Generate an analysis report of the dirty data according to the dirty data screened out.
The analysis report may include various quantitative indicators about the dirty data, for example a list of the dirty data and the number of dirty data records. For a given dirty data record, the report can indicate the determination rules that the record fails to satisfy, or the fields of the record that have quality problems.
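One possible shape for such an analysis report is sketched below in Python. The report keys (`dirty_count`, `dirty_list`, `failed_rule_counts`, `problem_fields`) and the input format are assumptions chosen for illustration; the disclosure only states which kinds of information the report may contain.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, object]
# Each failure is (rule name, field with the quality problem).
Failure = Tuple[str, str]


def build_report(dirty: List[Tuple[Record, List[Failure]]]) -> Dict[str, object]:
    """Summarize screened-out dirty data as a plain dictionary."""
    rule_counts = Counter(rule for _, fails in dirty for rule, _ in fails)
    fields = {field for _, fails in dirty for _, field in fails}
    return {
        "dirty_count": len(dirty),                   # number of dirty records
        "dirty_list": [rec for rec, _ in dirty],     # list of the dirty data
        "failed_rule_counts": dict(rule_counts),     # unsatisfied rules
        "problem_fields": sorted(fields),            # fields with problems
    }
```

A plain dictionary is used here so the report can be displayed directly or serialized and sent to another device, as the text describes.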
205. Repeatedly perform dirty data screening on the existing data in the database.
The data in the database may be updated frequently, so dirty data screening can be repeated on the existing data in the database.
In one embodiment, plan parameters are obtained, and dirty data screening is repeated on the existing data in the database according to the plan parameters. The plan parameters indicate the frequency, the time, or the screening scope of dirty data screening. The plan parameters can be preset or customized by the user.
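The plan parameters can be illustrated with the following Python sketch, which computes when screening is due and applies a screening function to each table in scope. The `PlanParameters` fields, the table names, and the `screen_table` callback are hypothetical; the sketch also only computes run times and does not itself wait for or trigger them, which a real scheduler would have to do.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class PlanParameters:
    interval: timedelta      # frequency of dirty data screening
    first_run: datetime      # time of the first screening
    tables: List[str]        # screening scope (hypothetical table names)


def due_runs(plan: PlanParameters, until: datetime) -> List[datetime]:
    """Return every scheduled run time from first_run up to and including until."""
    if plan.interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    t = plan.first_run
    while t <= until:
        runs.append(t)
        t += plan.interval
    return runs


def run_plan(plan: PlanParameters, until: datetime,
             screen_table: Callable[[str], int]) -> Dict[datetime, Dict[str, int]]:
    """Apply screen_table to each table in scope for each due run.

    screen_table is assumed to return the number of dirty records found.
    """
    return {t: {table: screen_table(table) for table in plan.tables}
            for t in due_runs(plan, until)}
```

For instance, a plan with a one-day interval starting at 02:00 on 2019-01-30 and evaluated up to 02:00 on 2019-02-01 yields three runs.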
206. Output prompt information.
In one embodiment, when it is determined according to the effective rules that dirty data exists in the input data, prompt information is output. The prompt information is used to request new data to overwrite the dirty data.
Take the data entry scenario as an example. While the user is entering data in the web interface, if it is determined that the data newly entered by the user contains dirty data, prompt information can be output stating that a certain entered item does not meet the specification and requesting that it be re-entered. When the user enters the data again and the re-entered data is not dirty data, the original dirty data is overwritten with the new data.
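The prompt-and-overwrite behavior might be sketched in Python as follows. The `form` dictionary, the `flagged` set, the wording of the prompt, and the phone-number check are assumptions for illustration only; the disclosure does not specify how entered values are held or how the prompt is worded.

```python
import re
from typing import Callable, Dict, Optional, Set


def enter_value(form: Dict[str, str], flagged: Set[str], field: str,
                value: str, is_valid: Callable[[str], bool]) -> Optional[str]:
    """Record an entered value and return prompt information if it is dirty.

    The newest entry always replaces the previous one, so a clean re-entry
    overwrites the earlier dirty value and clears the flag on the field.
    """
    form[field] = value
    if is_valid(value):
        flagged.discard(field)
        return None
    flagged.add(field)
    return (f"The value entered for '{field}' does not meet the "
            f"specification; please enter it again.")


def valid_phone(value: str) -> bool:
    """Hypothetical effective rule: 11 digits beginning with 1."""
    return re.fullmatch(r"1\d{10}", value) is not None
```

In this sketch a field stays in `flagged` until a clean value arrives, so the caller can refuse to save the form while any field is still flagged.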
The data quality management method provided by the embodiments of the present disclosure defines several application scenarios for data quality checking and determines, for each application scenario, the rules used to judge data quality. When the condition that triggers the target application scenario is met, quality determination is performed on the data under that scenario according to the determination rules corresponding to the target application scenario, and the dirty data is screened out, thereby improving the accuracy and efficiency of data quality determination. In addition, the solution can check data quality when the data is generated or input, so quality problems can be found earlier than when they are determined after the fact, and the user can correct them early.
The following are device embodiments of the present disclosure, based on the data quality management method described in the embodiments corresponding to Fig. 1 and Fig. 2 above; they can be used to execute the method embodiments of the present disclosure.
An embodiment of the present disclosure provides a data quality management device. As shown in Fig. 3, the data quality management device includes:
a definition module 31, configured to determine the data quality determination rules corresponding to K application scenarios, K ≥ 1;
a control module 32, configured to determine, when a target trigger condition is met, that the determination rules corresponding to a target application scenario are the effective rules, wherein the target application scenario is one of the K application scenarios;
a processing module 33, configured to obtain the input data under the target application scenario and screen out dirty data from the input data according to the effective rules.
In one embodiment, the processing module 33 is configured to obtain, in a data entry scenario, the data entered through a front-end interface.
Alternatively, the processing module 33 is configured to obtain, in a data import scenario, the imported data.
Alternatively, the processing module 33 is configured to obtain, in a data transfer scenario, the data passed into a data acquisition interface.
Alternatively, the processing module 33 is configured to obtain, in a database reading scenario, the data read from a database.
In one embodiment, the processing module 33 is further configured to generate an analysis report of the dirty data according to the dirty data screened out.
The analysis report includes a list of the dirty data, or the determination rules that the dirty data fails to satisfy, or the fields of the dirty data that have quality problems.
In one embodiment, the processing module 33 is further configured to obtain plan parameters and to repeatedly perform dirty data screening on the existing data in the database according to the plan parameters.
The plan parameters indicate the frequency, the time, or the screening scope of dirty data screening.
In one embodiment, the processing module 33 is further configured to output prompt information when it is determined according to the effective rules that dirty data exists in the input data.
The prompt information is used to request new data to overwrite the dirty data.
The data quality management device provided by the embodiments of the present disclosure defines several application scenarios for data quality checking and determines, for each application scenario, the rules used to judge data quality. When the condition that triggers the target application scenario is met, quality determination is performed on the data under that scenario according to the determination rules corresponding to the target application scenario, and the dirty data is screened out, thereby improving the accuracy and efficiency of data quality determination.
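For orientation only, the division of work among the three modules could be mirrored in software roughly as in the Python sketch below. The class names follow the module names in the text, but the method names, the string scenario labels, and the event names are assumptions; the device described in the disclosure is not limited to this structure.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, Callable[[Record], bool]]   # (rule name, check on a record)


class DefinitionModule:
    """Holds the data quality determination rules for each application scenario."""

    def __init__(self) -> None:
        self.rules: Dict[str, List[Rule]] = {}

    def define(self, scenario: str, name: str,
               check: Callable[[Record], bool]) -> None:
        self.rules.setdefault(scenario, []).append((name, check))


class ControlModule:
    """Selects the effective rules when a trigger condition is met."""

    def __init__(self, definition: DefinitionModule,
                 triggers: Dict[str, str]) -> None:
        self.definition = definition
        self.triggers = triggers   # hypothetical event name -> scenario

    def effective_rules(self, event: str) -> List[Rule]:
        scenario = self.triggers[event]
        return self.definition.rules.get(scenario, [])


class ProcessingModule:
    """Screens dirty data out of the input data using the effective rules."""

    def screen(self, records: List[Record], rules: List[Rule]) -> List[Record]:
        return [rec for rec in records
                if any(not check(rec) for _, check in rules)]
```

In this arrangement the processing module never looks up scenarios itself; it only receives the rules that the control module has declared effective.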
Based on the data quality management method described in the embodiments corresponding to Fig. 1 and Fig. 2 above, an embodiment of the present disclosure also provides a computer-readable storage medium. For example, the non-transitory computer-readable storage medium can be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so on. Computer instructions are stored on the storage medium for executing the data quality management method described in the embodiments corresponding to Fig. 1 and Fig. 2 above, which is not repeated here.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.

Claims (10)

1. A data quality management method, characterized in that the method comprises:
determining the data quality determination rules corresponding to K application scenarios, K ≥ 1;
when a target trigger condition is met, determining that the determination rules corresponding to a target application scenario are the effective rules, wherein the target application scenario is one of the K application scenarios;
obtaining the input data under the target application scenario, and screening out dirty data from the input data according to the effective rules.
2. The method according to claim 1, characterized in that obtaining the input data under the target application scenario comprises:
in a data entry scenario, obtaining the data entered through a front-end interface;
or, in a data import scenario, obtaining the imported data;
or, in a data transfer scenario, obtaining the data passed into a data acquisition interface;
or, in a database reading scenario, obtaining the data read from a database.
3. The method according to claim 1, characterized in that the method further comprises:
generating an analysis report of the dirty data according to the dirty data screened out;
the analysis report includes a list of the dirty data, or the determination rules that the dirty data fails to satisfy, or the fields of the dirty data that have quality problems.
4. The method according to claim 1, characterized in that the method further comprises:
obtaining plan parameters, the plan parameters indicating the frequency, the time, or the screening scope of dirty data screening;
repeatedly performing dirty data screening on the existing data in the database according to the plan parameters.
5. The method according to claim 1, characterized in that the method further comprises:
when it is determined according to the effective rules that dirty data exists in the input data, outputting prompt information, wherein the prompt information is used to request new data to overwrite the dirty data.
6. A data quality management device, characterized by comprising:
a definition module, configured to determine the data quality determination rules corresponding to K application scenarios, K ≥ 1;
a control module, configured to determine, when a target trigger condition is met, that the determination rules corresponding to a target application scenario are the effective rules, wherein the target application scenario is one of the K application scenarios;
a processing module, configured to obtain the input data under the target application scenario and screen out dirty data from the input data according to the effective rules.
7. The device according to claim 6, characterized in that
the processing module is configured to obtain, in a data entry scenario, the data entered through a front-end interface;
or, the processing module is configured to obtain, in a data import scenario, the imported data;
or, the processing module is configured to obtain, in a data transfer scenario, the data passed into a data acquisition interface;
or, the processing module is configured to obtain, in a database reading scenario, the data read from a database.
8. The device according to claim 6, characterized in that
the processing module is further configured to generate an analysis report of the dirty data according to the dirty data screened out;
the analysis report includes a list of the dirty data, or the determination rules that the dirty data fails to satisfy, or the fields of the dirty data that have quality problems.
9. The device according to claim 6, characterized in that
the processing module is further configured to obtain plan parameters and to repeatedly perform dirty data screening on the existing data in the database according to the plan parameters;
wherein the plan parameters indicate the frequency, the time, or the screening scope of dirty data screening.
10. The device according to claim 6, characterized in that
the processing module is further configured to output prompt information when it is determined according to the effective rules that dirty data exists in the input data;
wherein the prompt information is used to request new data to overwrite the dirty data.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910089863.3A CN109902081A (en) 2019-01-30 2019-01-30 Data quality management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910089863.3A CN109902081A (en) 2019-01-30 2019-01-30 Data quality management method and device

Publications (1)

Publication Number Publication Date
CN109902081A true CN109902081A (en) 2019-06-18

Family

ID=66944422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910089863.3A Pending CN109902081A (en) 2019-01-30 2019-01-30 Data quality management method and device

Country Status (1)

Country Link
CN (1) CN109902081A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1547145A (en) * 2003-12-08 2004-11-17 西安交通大学 Dynamic detecting and ensuring method for equipment operating status data quality
CN103473472A (en) * 2013-09-26 2013-12-25 深圳市华傲数据技术有限公司 Quartile graph-based data quality detection method and system
CN103914616A (en) * 2014-03-18 2014-07-09 清华大学深圳研究生院 Emergency data quality control system and emergency data quality control method
CN105868373A (en) * 2016-03-31 2016-08-17 国网江西省电力公司信息通信分公司 Method and device for processing key data of power service information system
CN107491381A (en) * 2017-07-04 2017-12-19 广西电网有限责任公司电力科学研究院 A kind of equipment condition monitoring quality of data evaluating system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190618
