CN110069480A - Parallel data cleaning method - Google Patents
- Publication number
- CN110069480A
- Authority
- CN
- China
- Prior art keywords
- data
- constraint relationship
- abnormal data
- abnormal
- constraint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
The invention discloses a parallel data cleaning method. An overall architecture for a distributed parallel cleaning system is constructed; all data units that violate constraint relationships, together with the corresponding constraints, form a conflict hypergraph; data cleaning is carried out on this hypergraph; and, based on the corresponding positions of data units and constraints in the conflict hypergraph, a fast data cleaning method suitable for large volumes of data is obtained. The invention achieves faster data cleaning and repair with lower algorithmic complexity, making it suitable for repairing large volumes of data.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a parallel data cleaning method.
Background
Commercial and scientific data have become the most valuable assets of the current era. However, errors introduced at the data source by noise greatly reduce the value of the data: inaccurate extraction leads to missing data; merging data extracted from multiple sources leads to redundant data; and erroneous input by data providers causes data integrity constraints to no longer hold. These errors cause substantial economic losses every year. Providing data cleaning operations and improving data quality is therefore key to efficient data management.
Data cleaning comprises the detection and correction of erroneous data. Many data cleaning algorithms already exist. One simple approach for global data is to chain together cleaning algorithms for the different error types (missing data, redundant data, erroneous data), which minimizes the processing complexity for the global data. However, this approach ignores the interactions between the different error types in the global data, so the final cleaning result is unsatisfactory.
The MapReduce programming framework can use parallelism to achieve highly scalable big data cleaning. MapReduce is a programming model for parallel operations on large-scale data sets. The concepts "Map" and "Reduce", and the main ideas behind them, are borrowed from functional programming languages, with further features borrowed from vector programming languages. The model makes it much easier for programmers without distributed parallel programming experience to run their programs on a distributed system. In current software implementations, the programmer specifies a Map function, which maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function, which ensures that all mapped key-value pairs sharing the same key are grouped together.
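The Map and Reduce roles described above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not a distributed implementation; the helper names (`map_phase`, `shuffle`, `reduce_phase`) and the sample rows are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every (key, value) pair and collect the emitted pairs."""
    emitted = []
    for key, value in records:
        emitted.extend(map_fn(key, value))
    return emitted


def shuffle(pairs):
    """Group emitted values by key, so each key is reduced exactly once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Combine all values that share the same key."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical input: (row id, row) pairs.
rows = [(1, {"city": "Beijing"}), (2, {"city": "Xi'an"}), (3, {"city": "Beijing"})]


def emit_city(row_id, row):
    return [(row["city"], 1)]


def total(city, counts):
    return sum(counts)


counts = reduce_phase(shuffle(map_phase(rows, emit_city)), total)
```

In a real framework the map and reduce calls run on different machines and the shuffle is performed by the framework itself; only the two user-supplied functions correspond to what the programmer writes.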
Research on data cleaning first appeared in the United States, and countless algorithms have emerged since. As times have changed, erroneous data has taken on more varied forms, and the growth in data volume places new demands on the design of data cleaning algorithms. Many traditional data cleaning algorithms can no longer meet the needs of the big data era, so how to clean dirty data accurately and efficiently remains a topic worth studying. Data cleaning aims to identify and correct noise in data and to minimize the influence of noise on data analysis results. Noise in data mainly comprises incomplete data, redundant data, conflicting data and erroneous data. Noise can be detected and eliminated by automatic algorithms, by data cleaning rules, or with user participation.
At present, developments in machine learning and crowdsourcing have injected new vitality into data cleaning research. Machine learning techniques can learn rules for making cleaning decisions from user records, reducing the burden on users of annotating data. At the same time, converting cleaning rules into machine learning models means that users no longer need to formulate large numbers of data cleaning rules. Crowdsourcing publishes data cleaning tasks to the Internet and pools the knowledge and decisions of many users; it makes full use of external resources to improve the accuracy and efficiency of data cleaning while reducing cleaning cost.
Existing data cleaning and repair methods usually rely on denial constraints. All data units that do not satisfy the denial constraints are found; these units, together with the corresponding constraints, form a conflict hypergraph; a repair context is defined from the corresponding positions of edges and nodes in the conflict hypergraph; and the data is cleaned and repaired according to the context and the corresponding constraints. However, existing cleaning and repair methods handle large volumes of data poorly, because their high algorithmic complexity makes cleaning very inefficient. Others are designed for specific data and do not offer a unified approach to data cleaning as a whole.
Summary of the invention
The purpose of the present invention is to provide a parallel data cleaning method that addresses the above deficiencies of the prior art.
The invention is realized through the following technical solution: a parallel data cleaning method, comprising:
Constructing an abnormal data retrieval model from the association relationships and constraint relationships among all data in a database; wherein the input of the abnormal data retrieval model is each data item in the database, which is compared against the association relationships and constraint relationships in the model; if the input fails at least one association relationship or constraint relationship, it is treated as abnormal data, and the abnormal data together with all association relationships and constraint relationships it fails is the output of the abnormal data retrieval model;
Constructing a hypergraph of the abnormal data from the abnormal data output by the abnormal data retrieval model and the constraint relationships each abnormal data item fails; wherein each unsatisfied constraint relationship is a hyperedge of the hypergraph, and the at least one abnormal data item that fails that constraint relationship is a violating unit covered by the hyperedge;
Selecting as the minimum cover point the violating unit that fails the largest number of constraint relationships serving as hyperedges; among the constraint relationships the minimum cover point fails, taking as the first hyperedge the constraint relationship with the largest number of violating units that fail only that constraint relationship; applying to the violating units covered by the first hyperedge the negation operation of the constraint relationship of the first hyperedge, so that the violating units that failed that constraint relationship become normal data and the first hyperedge is eliminated;
Iterating until all hyperedges are eliminated and only the violating unit of the minimum cover point remains, then applying to the minimum cover point the negation operation of all its current constraint relationships, completing the repair of all abnormal data.
Wherein, the abnormal data retrieval model satisfies formula (1), in which x is any data item in the database input to the abnormal data retrieval model, Ri is an association relationship related to the data x, and Pi is a constraint relationship that the data x satisfies. If the input data x does not satisfy formula (1), x is determined to be abnormal data, and the constraint relationships it fails are determined at the same time.
Wherein, in the step of building the conflict hypergraph, a constraint is set, the violating units V = {v1, ..., vn} that fail any constraint relationship Pi are found, and the constraints and violating units are put in correspondence to obtain the conflict hypergraph.
Wherein, the step of selecting the first hyperedge comprises:
For each hyperedge, forming the subdomain of that hyperedge from all violating units that fail the corresponding constraint relationship;
Within the subdomain formed by the violating units enclosed by all hyperedges that contain the minimum cover point, counting, for each hyperedge, the violating data units that fail only the constraint relationship of that one hyperedge;
Within the same subdomain, choosing as the first hyperedge the hyperedge with the largest number of violating units that fail only its constraint relationship.
Wherein, after the step of eliminating the first hyperedge is completed, all violating units of the initial conflict hypergraph are input into the abnormal data retrieval model again, which excludes the abnormal data that was corrected to normal data during the elimination of the first hyperedge, together with the association relationships and constraint relationships that data had failed.
Wherein, the conflict hypergraph is rebuilt from the newly obtained abnormal data and the corresponding unsatisfied association relationships and constraint relationships, and the loop iterates: the minimum cover point and the first hyperedge are found among the violating units formed by the new abnormal data, and the first hyperedge is eliminated, until the only abnormal data remaining is the minimum cover point.
Wherein, when the only abnormal data remaining is the minimum cover point, the negation operation is applied for all constraint relationships the minimum cover point fails, and the data obtained after repairing the minimum cover point is input into the abnormal data retrieval model to judge whether it is still abnormal data.
Wherein, if the abnormal data retrieval model judges the data obtained after repairing the minimum cover point to be normal data, the repair of all data in the database is complete; if the model judges that data to be abnormal data, the data is deleted, and the repair of all data in the database is complete.
In contrast to the prior art, the parallel data cleaning method of the invention constructs an overall architecture for a distributed parallel cleaning system, forms a conflict hypergraph from all data units that violate constraint relationships together with the corresponding constraints, carries out data cleaning on it, and, based on the corresponding positions of data units and constraints in the conflict hypergraph, obtains a fast data cleaning method suitable for large volumes of data. The invention achieves faster data cleaning and repair with lower algorithmic complexity, making it suitable for repairing large volumes of data.
Description of the drawings
Fig. 1 is a schematic flowchart of the parallel data cleaning method provided by the invention.
Detailed description
In the following description, numerous specific details are set forth to provide a full understanding of the invention. The invention can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
In addition, the invention is described in detail with reference to schematic diagrams. When describing the embodiments, the diagrams are examples given for ease of explanation and should not limit the scope of protection of the invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a parallel data cleaning method provided by the invention. The method comprises the following steps:
S110: Construct an abnormal data retrieval model from the association relationships and constraint relationships among all data in the database. The input of the abnormal data retrieval model is each data item in the database, which is compared against the association relationships and constraint relationships in the model. If the input fails at least one association relationship or constraint relationship, it is treated as abnormal data, and the abnormal data together with all association relationships and constraint relationships it fails is the output of the abnormal data retrieval model.
S120: Construct a hypergraph of the abnormal data from the abnormal data output by the abnormal data retrieval model and the constraint relationships each abnormal data item fails. Each unsatisfied constraint relationship is a hyperedge of the hypergraph, and the at least one abnormal data item that fails that constraint relationship is a violating unit covered by the hyperedge.
S130: Select as the minimum cover point the violating unit that fails the largest number of constraint relationships serving as hyperedges. Among the constraint relationships the minimum cover point fails, take as the first hyperedge the constraint relationship with the largest number of violating units that fail only that constraint relationship. Apply to the violating units covered by the first hyperedge the negation operation of the constraint relationship of the first hyperedge, so that the violating units that failed that constraint relationship become normal data and the first hyperedge is eliminated.
S140: Iterate until all hyperedges are eliminated and only the violating unit of the minimum cover point remains, then apply to the minimum cover point the negation operation of all its current constraint relationships, completing the repair of all abnormal data.
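Steps S110 to S140 can be sketched end to end as follows. This is one possible reading under stated assumptions, not the patented implementation: constraints are modelled as per-record predicates, the "negation operation" is modelled as a user-supplied repair function per constraint, "only the minimum cover point remains" is read as "exactly one abnormal record is left", and all names (`detect`, `clean`, `age_ok`, `tax_ok`) and the sample data are hypothetical. The `max_rounds` guard is an addition to keep the sketch from looping forever when a repair does not help.

```python
def detect(records, constraints):
    """S110: map each abnormal record id to the names of the constraints it fails."""
    abnormal = {}
    for rid, rec in records.items():
        failed = [name for name, ok in constraints.items() if not ok(rec)]
        if failed:
            abnormal[rid] = failed
    return abnormal


def clean(records, constraints, repairs, max_rounds=100):
    records = {rid: dict(rec) for rid, rec in records.items()}  # work on copies
    for _ in range(max_rounds):
        abnormal = detect(records, constraints)
        if not abnormal:
            return records
        if len(abnormal) == 1:
            # S140: only one abnormal unit is left; repair all of its constraints.
            rid, failed = next(iter(abnormal.items()))
            for name in failed:
                repairs[name](records[rid])
            if detect({rid: records[rid]}, constraints):
                del records[rid]  # still abnormal after repair: delete it
            return records
        # S120: one hyperedge per violated constraint, covering its violating units.
        edges = {}
        for rid, failed in abnormal.items():
            for name in failed:
                edges.setdefault(name, set()).add(rid)
        # S130: minimum cover point = unit that fails the most constraints.
        cover = max(sorted(abnormal), key=lambda rid: len(abnormal[rid]))

        def exclusive(name):
            # Units on this hyperedge that fail no other constraint.
            return sum(1 for rid in edges[name] if len(abnormal[rid]) == 1)

        first = max(sorted(abnormal[cover]), key=exclusive)
        for rid in edges[first]:
            repairs[first](records[rid])  # eliminate the first hyperedge
    raise RuntimeError("cleaning did not converge within max_rounds")


# Hypothetical data, constraints and repair rules.
records = {
    "r1": {"age": 34, "salary": 5000, "tax": 500},
    "r2": {"age": -3, "salary": 4000, "tax": 4500},
    "r3": {"age": 51, "salary": 7000, "tax": 9000},
}
constraints = {
    "age_ok": lambda r: r["age"] >= 0,
    "tax_ok": lambda r: r["tax"] < r["salary"],
}
repairs = {
    "age_ok": lambda r: r.update(age=abs(r["age"])),
    "tax_ok": lambda r: r.update(tax=r["salary"] // 10),
}
cleaned = clean(records, constraints, repairs)
```

In this example the first round repairs the `tax_ok` hyperedge for `r2` and `r3` together, and the second round repairs the single remaining unit `r2`.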
Wherein, the abnormal data retrieval model satisfies formula (1), in which x is any data item in the database input to the abnormal data retrieval model, Ri is an association relationship related to the data x, and Pi is a constraint relationship that the data x satisfies. If the input data x does not satisfy formula (1), x is determined to be abnormal data, and the constraint relationships it fails are determined at the same time.
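The behaviour of the abnormal data retrieval model described above can be illustrated with a small sketch. It assumes, purely for illustration, that each relationship can be written as a predicate over a single record; the function name `detect_abnormal`, the constraint names and the sample records are hypothetical.

```python
def detect_abnormal(records, constraints):
    """Return {record id: [names of failed constraints]} for abnormal records only."""
    abnormal = {}
    for rid, rec in records.items():
        failed = [name for name, check in constraints.items() if not check(rec)]
        if failed:  # failing at least one relationship makes the record abnormal
            abnormal[rid] = failed
    return abnormal


records = {
    "r1": {"age": 34, "salary": 5000, "tax": 500},
    "r2": {"age": -3, "salary": 4000, "tax": 4500},
    "r3": {"age": 51, "salary": 7000, "tax": 9000},
}
constraints = {
    "age_non_negative": lambda r: r["age"] >= 0,
    "tax_below_salary": lambda r: r["tax"] < r["salary"],
}
abnormal = detect_abnormal(records, constraints)
```

The output pairs every abnormal record with all the constraints it fails, which is the information the hypergraph construction step needs.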
The abnormal data retrieval model is based on denial constraints. A denial constraint (DC) is a first-order formula over data units x, in which Ri ∈ R is a relation atom and each Pi has the form v1 θ v2 with θ ∈ B. For the similarity predicate ≈, the predicate holds when the edit distance between two strings exceeds a threshold σ given by the user. Single-tuple constraints, functional dependencies, matching dependencies and conditional functional dependencies can all be expressed as denial constraints over tuples. Given a database instance I of schema S and a denial constraint, I either satisfies the constraint or does not; for a set of DCs Σ, we write I |= Σ if and only if I satisfies every constraint in Σ. A repair I' of a database instance I that contains erroneous data must satisfy the denial constraints Σ on I and have the same tuple identifiers as I. Attribute values in I and I' may differ, and for the attribute domains in R different repairs I' exist. A repair assigns new values to attributes. In a repair, each new value FV of attribute A can be replaced with a value from Dom(A) \ Dom_a(A), where Dom_a(A) is the domain of values that satisfy at least one predicate of each denial constraint involving FV. In other words, the new value must satisfy all constraints on the actual attribute.
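The first-order formula itself is not reproduced in this text. As a hedged sketch only, the usual form of a denial constraint in the data cleaning literature, written with the symbols named above, would be the following; the index bounds n and m, the argument lists of the relation atoms, and the contents of the operator set B are assumptions, not taken from the patent.

```latex
\forall \bar{x}\;\; \neg\bigl( R_1(\bar{x}) \wedge \cdots \wedge R_n(\bar{x})
      \wedge P_1 \wedge \cdots \wedge P_m \bigr),
\qquad P_i :\; v_1 \,\theta\, v_2,\quad \theta \in B
```

Read this way, a data item violates the constraint exactly when all relation atoms and all predicates inside the negation hold at once.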
Wherein, in the step of building the conflict hypergraph, a constraint is set, the violating units V = {v1, ..., vn} that fail any constraint relationship Pi are found, and the constraints and violating units are put in correspondence to obtain the conflict hypergraph.
The conflict hypergraph is used to detect violations. Its nodes are the violating units, and each edge connecting nodes corresponds to a violation involving those nodes. It represents the current state of the data under all constraint relationships. From this state, the interactions between different violations, and between different constraints, can be analyzed.
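The structure described above can be held in a plain dictionary. The sketch below assumes the detection step has already produced, for each violating unit, the list of constraints it fails; the name `build_conflict_hypergraph` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_conflict_hypergraph(abnormal):
    """Turn {unit: [failed constraints]} into {constraint: {violating units}}.

    Each key is a hyperedge (one violated constraint); its value is the set of
    nodes (violating units) that the hyperedge covers.
    """
    hyperedges = defaultdict(set)
    for unit, failed in abnormal.items():
        for constraint in failed:
            hyperedges[constraint].add(unit)
    return dict(hyperedges)


abnormal = {"r2": ["c1", "c2"], "r3": ["c2"], "r5": ["c2", "c3"]}
hypergraph = build_conflict_hypergraph(abnormal)
```

A unit that appears in several hyperedges, such as `r2` here, is where different violations interact.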
Given a constraint DC d and the violating units V = {v1, ..., vn} connected by a hyperedge, each vi ∈ V has at least one candidate repair of the form vi θ t, where t is determined by the connections of vi within V, that is, by all constraints that the violating unit vi violates.
Wherein, the step of selecting the first hyperedge comprises:
For each hyperedge, forming the subdomain of that hyperedge from all violating units that fail the corresponding constraint relationship;
Within the subdomain formed by the violating units enclosed by all hyperedges that contain the minimum cover point, counting, for each hyperedge, the violating data units that fail only the constraint relationship of that one hyperedge;
Within the same subdomain, choosing as the first hyperedge the hyperedge with the largest number of violating units that fail only its constraint relationship.
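The selection rule in the steps above can be sketched as two small functions. This assumes the hypergraph is stored as a mapping from constraint name to the set of violating units, and it breaks ties alphabetically, which the patent does not specify; `minimum_cover_point`, `first_hyperedge` and the sample identifiers are hypothetical names.

```python
def unit_degrees(hyperedges):
    """Count how many hyperedges (violated constraints) each unit belongs to."""
    degree = {}
    for units in hyperedges.values():
        for unit in units:
            degree[unit] = degree.get(unit, 0) + 1
    return degree


def minimum_cover_point(hyperedges):
    """The unit that violates the most constraints (ties broken alphabetically)."""
    degree = unit_degrees(hyperedges)
    return max(sorted(degree), key=degree.get)


def first_hyperedge(hyperedges, cover_point):
    """Among the cover point's hyperedges, pick the one covering the most units
    that violate only that single constraint."""
    degree = unit_degrees(hyperedges)
    candidates = {c: units for c, units in hyperedges.items() if cover_point in units}

    def exclusive_count(constraint):
        return sum(1 for unit in candidates[constraint] if degree[unit] == 1)

    return max(sorted(candidates), key=exclusive_count)


hyperedges = {
    "c1": {"r2", "r5"},
    "c2": {"r2", "r3", "r4", "r5"},
    "c3": {"r5", "r6"},
}
cover = minimum_cover_point(hyperedges)
edge = first_hyperedge(hyperedges, cover)
```

Here `r5` lies on all three hyperedges, and `c2` is chosen because two of its units (`r3` and `r4`) violate nothing else, so repairing `c2` turns the most units into normal data in one pass.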
The constraint violation to repair first is selected according to the number of hyperedges that each violating unit belongs to in the conflict hypergraph; the violating units that violate the same constraint are then repaired together according to that constraint and its repair rule.
Different violating units enclosed by one hyperedge all violate the constraint relationship corresponding to that hyperedge, so the units within the hyperedge are handled in a similar way. During cleaning, violating units that belong to the same hyperedge are processed together.
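Because all units on one hyperedge receive the same repair rule, they can be handled as one batch. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` to apply a rule to every unit of a hyperedge; the use of threads, the function name `repair_hyperedge` and the `cap_tax` rule are illustrative assumptions, since the patent does not describe a concrete execution mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def repair_hyperedge(records, units, repair):
    """Apply one repair rule to every unit covered by the same hyperedge.

    `repair` takes a record and returns a repaired copy; the input mapping is
    left unchanged and a new mapping is returned.
    """
    def fix(rid):
        return rid, repair(records[rid])

    with ThreadPoolExecutor() as pool:
        repaired = dict(pool.map(fix, sorted(units)))
    return {**records, **repaired}


records = {
    "r1": {"salary": 5000, "tax": 500},
    "r2": {"salary": 4000, "tax": 4500},
    "r3": {"salary": 7000, "tax": 9000},
}


def cap_tax(rec):
    # Hypothetical repair rule for a "tax must be below salary" constraint.
    return {**rec, "tax": rec["salary"] // 10}


result = repair_hyperedge(records, {"r2", "r3"}, cap_tax)
```

Returning repaired copies rather than mutating shared records keeps the workers independent of each other, which is what makes the batch safe to run concurrently.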
Wherein, after the step of eliminating the first hyperedge is completed, all violating units of the initial conflict hypergraph are input into the abnormal data retrieval model again, which excludes the abnormal data that was corrected to normal data during the elimination of the first hyperedge, together with the association relationships and constraint relationships that data had failed.
Wherein, the conflict hypergraph is rebuilt from the newly obtained abnormal data and the corresponding unsatisfied association relationships and constraint relationships, and the loop iterates: the minimum cover point and the first hyperedge are found among the violating units formed by the new abnormal data, and the first hyperedge is eliminated, until the only abnormal data remaining is the minimum cover point.
Wherein, when the only abnormal data remaining is the minimum cover point, the negation operation is applied for all constraint relationships the minimum cover point fails, and the data obtained after repairing the minimum cover point is input into the abnormal data retrieval model to judge whether it is still abnormal data.
Wherein, if the abnormal data retrieval model judges the data obtained after repairing the minimum cover point to be normal data, the repair of all data in the database is complete; if the model judges that data to be abnormal data, the data is deleted, and the repair of all data in the database is complete.
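The final repair-then-recheck step can be sketched as follows. It assumes per-record predicates for constraints and one in-place repair function per constraint standing in for the negation operation; `repair_last_point`, the constraint names and the sample records are hypothetical.

```python
def repair_last_point(records, rid, failed, constraints, repairs):
    """Repair the last abnormal unit, re-check it, and delete it if still abnormal.

    Returns True if the unit was repaired and kept, False if it was deleted.
    """
    rec = records[rid]
    for name in failed:
        repairs[name](rec)  # apply the repair for each failed constraint
    still_failed = [name for name, check in constraints.items() if not check(rec)]
    if still_failed:
        del records[rid]
        return False
    return True


constraints = {
    "non_negative": lambda r: r["qty"] >= 0,
    "has_unit": lambda r: r["unit"] in {"kg", "g"},
}
repairs = {
    "non_negative": lambda r: r.update(qty=abs(r["qty"])),
    "has_unit": lambda r: r.update(unit=r["unit"].strip().lower()),
}

records = {"a": {"qty": -2, "unit": " KG "}, "b": {"qty": 3, "unit": "??"}}
kept_a = repair_last_point(records, "a", ["non_negative", "has_unit"], constraints, repairs)
kept_b = repair_last_point(records, "b", ["has_unit"], constraints, repairs)
```

Record `a` becomes normal data after repair and is kept, while record `b` still fails `has_unit` after repair and is removed, matching the two outcomes described above.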
In contrast to the prior art, the parallel data cleaning method of the invention constructs an overall architecture for a distributed parallel cleaning system, forms a conflict hypergraph from all data units that violate constraint relationships together with the corresponding constraints, carries out data cleaning on it, and, based on the corresponding positions of data units and constraints in the conflict hypergraph, obtains a fast data cleaning method suitable for large volumes of data. The invention achieves faster data cleaning and repair with lower algorithmic complexity, making it suitable for repairing large volumes of data.
Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the spirit and scope of the invention, use the methods and technical content disclosed above to make possible changes and modifications to the technical solution of the invention. Therefore, any simple modification, equivalent change or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, falls within the scope of protection of the technical solution of the invention.
Claims (8)
1. A parallel data cleaning method, characterized by comprising:
Constructing an abnormal data retrieval model from the association relationships and constraint relationships among all data in a database; wherein the input of the abnormal data retrieval model is each data item in the database, which is compared against the association relationships and constraint relationships in the model; if the input fails at least one association relationship or constraint relationship, it is treated as abnormal data, and the abnormal data together with all association relationships and constraint relationships it fails is the output of the abnormal data retrieval model;
Constructing a hypergraph of the abnormal data from the abnormal data output by the abnormal data retrieval model and the constraint relationships each abnormal data item fails; wherein each unsatisfied constraint relationship is a hyperedge of the hypergraph, and the at least one abnormal data item that fails that constraint relationship is a violating unit covered by the hyperedge;
Selecting as the minimum cover point the violating unit that fails the largest number of constraint relationships serving as hyperedges; among the constraint relationships the minimum cover point fails, taking as the first hyperedge the constraint relationship with the largest number of violating units that fail only that constraint relationship; applying to the violating units covered by the first hyperedge the negation operation of the constraint relationship of the first hyperedge, so that the violating units that failed that constraint relationship become normal data and the first hyperedge is eliminated;
Iterating until all hyperedges are eliminated and only the violating unit of the minimum cover point remains, then applying to the minimum cover point the negation operation of all its current constraint relationships, completing the repair of all abnormal data.
2. The parallel data cleaning method according to claim 1, characterized in that the abnormal data retrieval model satisfies formula (1), in which x is any data item in the database input to the abnormal data retrieval model, Ri is an association relationship related to the data x, and Pi is a constraint relationship that the data x satisfies; if the input data x does not satisfy formula (1), x is determined to be abnormal data, and the constraint relationships it fails are determined at the same time.
3. The parallel data cleaning method according to claim 1, characterized in that, in the step of building the conflict hypergraph, a constraint is set, the violating units V = {v1, ..., vn} that fail any constraint relationship Pi are found, and the constraints and violating units are put in correspondence to obtain the conflict hypergraph.
4. The parallel data cleaning method according to claim 1, characterized in that the step of selecting the first hyperedge comprises:
For each hyperedge, forming the subdomain of that hyperedge from all violating units that fail the corresponding constraint relationship;
Within the subdomain formed by the violating units enclosed by all hyperedges that contain the minimum cover point, counting, for each hyperedge, the violating data units that fail only the constraint relationship of that one hyperedge;
Within the same subdomain, choosing as the first hyperedge the hyperedge with the largest number of violating units that fail only its constraint relationship.
5. The parallel data cleaning method according to claim 4, characterized in that, after the step of eliminating the first hyperedge is completed, all violating units of the initial conflict hypergraph are input into the abnormal data retrieval model again, which excludes the abnormal data that was corrected to normal data during the elimination of the first hyperedge, together with the association relationships and constraint relationships that data had failed.
6. The parallel data cleaning method according to claim 5, characterized in that the conflict hypergraph is rebuilt from the newly obtained abnormal data and the corresponding unsatisfied association relationships and constraint relationships, and the loop iterates: the minimum cover point and the first hyperedge are found among the violating units formed by the new abnormal data, and the first hyperedge is eliminated, until the only abnormal data remaining is the minimum cover point.
7. The parallel data cleaning method according to claim 6, characterized in that, when the only abnormal data remaining is the minimum cover point, the negation operation is applied for all constraint relationships the minimum cover point fails, and the data obtained after repairing the minimum cover point is input into the abnormal data retrieval model to judge whether it is still abnormal data.
8. The parallel data cleaning method according to claim 7, characterized in that, if the abnormal data retrieval model judges the data obtained after repairing the minimum cover point to be normal data, the repair of all data in the database is complete; if the model judges that data to be abnormal data, the data is deleted, and the repair of all data in the database is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910161073.1A CN110069480B (en) | 2019-03-04 | 2019-03-04 | Parallel data cleaning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910161073.1A CN110069480B (en) | 2019-03-04 | 2019-03-04 | Parallel data cleaning method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110069480A (en) | 2019-07-30 |
CN110069480B (en) | 2022-06-24 |
Family ID: 67366031
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910161073.1A | Expired - Fee Related | CN110069480B (en) | 2019-03-04 | 2019-03-04 | Parallel data cleaning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110069480B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050081194A1 (en) * | 2000-05-02 | 2005-04-14 | Microsoft Corporation | Methods for enhancing type reconstruction |
US20120137367A1 (en) * | 2009-11-06 | 2012-05-31 | Cataphora, Inc. | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
WO2014000788A1 (en) * | 2012-06-27 | 2014-01-03 | Qatar Foundation | A method for cleaning data records in a database |
US20160140190A1 (en) * | 2014-11-04 | 2016-05-19 | Spatial Information Systems Research Limited | Data representation |
US20170193078A1 (en) * | 2016-01-06 | 2017-07-06 | International Business Machines Corporation | Hybrid method for anomaly Classification |
US20170364831A1 (en) * | 2016-06-21 | 2017-12-21 | Sri International | Systems and methods for machine learning using a trusted model |
CN107633099A (en) * | 2017-10-20 | 2018-01-26 | Northwestern Polytechnical University | Method for determining the importance of database consistency errors |
US20180276261A1 (en) * | 2014-05-30 | 2018-09-27 | Georgetown University | Process and Framework For Facilitating Information Sharing Using a Distributed Hypergraph |
Also Published As
Publication number | Publication date |
---|---|
CN110069480B (en) | 2022-06-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Zhang et al. | On complexity and optimization of expensive queries in complex event processing | |
Dijkman et al. | Aligning business process models | |
Chen et al. | A choice relation framework for supporting category-partition test case generation | |
US9519862B2 (en) | Domains for knowledge-based data quality solution | |
US20130117202A1 (en) | Knowledge-based data quality solution | |
WO2019185039A1 (en) | A data processing method and electronic apparatus | |
CN111177322A (en) | Ontology model construction method of domain knowledge graph | |
Lee et al. | A survey on data cleaning methods for improved machine learning model performance | |
CN111241079A (en) | Data cleaning method and device and computer readable storage medium | |
Mahdavi et al. | Semi-Supervised Data Cleaning with Raha and Baran. | |
CN109634949A (en) | A kind of blended data cleaning method based on more versions of data | |
CN102799960A (en) | Parallel operation flow anomaly detection method oriented to data model | |
CN110363662A (en) | A kind of personal credit points-scoring system | |
WO2020259391A1 (en) | Database script performance testing method and device | |
Diamantopoulos et al. | Semantically-enriched Jira issue tracking data | |
Zhao et al. | Safe semi-supervised classification algorithm combined with active learning sampling strategy | |
CN110069480A (en) | A kind of parallel data cleaning method | |
CN116467437A (en) | Automatic flow modeling method for complex scene description | |
CN116523284A (en) | Automatic evaluation method and system for business operation flow based on machine learning | |
Karami et al. | Maintaining accurate web usage models using updates from activity diagrams | |
CN112395343B (en) | DSG-based field change data acquisition and extraction method | |
EP3306540A1 (en) | System and method for content affinity analytics | |
CN114035783A (en) | Software code knowledge graph construction method and tool | |
Palepu et al. | Meta data quality control architecture in data warehousing | |
JP2010267229A (en) | Association processing method and flow comparative processing device | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A parallel data cleaning method; Effective date of registration: 20221008; Granted publication date: 20220624; Pledgee: China Construction Bank Corp. Jiangmen Branch; Pledgor: Guangdong Heng Rui Science and Technology Ltd.; Registration number: Y2022980017520 |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220624 |