CN109491991B - Unsupervised automatic data cleaning method - Google Patents

Unsupervised automatic data cleaning method

Info

Publication number
CN109491991B
Authority
CN
China
Prior art keywords
data
rule
reasoning
attributes
generating
Prior art date
Legal status
Active
Application number
CN201811325335.5A
Other languages
Chinese (zh)
Other versions
CN109491991A (en)
Inventor
李玲
唐军
吴纯彬
于跃
陈秋宇
Current Assignee
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN201811325335.5A
Publication of CN109491991A
Application granted
Publication of CN109491991B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/29: Graphical models, e.g. Bayesian networks
    • G06F18/295: Markov models or related models, e.g. semi-Markov models; Markov random fields; networks embedding Markov models

Abstract

The invention discloses an unsupervised automatic data cleaning method comprising the following steps: A. learning a data model: learning the dependencies among attributes from raw data that may contain invalid data, and discovering implicit, non-absolute, or relatively weak dependencies to obtain a data model represented as a Bayesian network; B. generating data cleaning rules: after obtaining a complete data model of the raw data or a sample of the raw data, generating predicates and first-order predicate rules; C. generating a Markov logic network based on the predicates and first-order predicate rules generated in step B; D. generating inference rules based on the Markov logic network generated in step C and cleaning the data based on the inference results. The method can effectively improve the data quality of a company's business systems without consuming large amounts of manpower and material resources, and helps management make correct decisions.

Description

Unsupervised automatic data cleaning method
Technical Field
The invention relates to the technical field of data management, in particular to an unsupervised automatic data cleaning method.
Background
Real-world data is typically dirty (hereinafter referred to as dirty data), as it may contain inconsistent, noisy, incomplete, or duplicate values. In the commercial world, erroneous data can cause significant economic losses. For example, incorrect customer information may lead a company to deliver purchased goods to the wrong place, which not only increases the enterprise's delivery cost but also has a lasting negative impact on its image.
Among existing data cleaning methods, some require heavy manual participation in the cleaning process, such as providing repair suggestions or confirming repairs; others require no manual participation during cleaning but need cleaning rules to be formulated in advance. Neither kind is suitable when the data rules are unknown or the labor cost is unaffordable. In view of this situation, the present method performs data cleaning without predefined cleaning rules and without manual participation, thereby improving data quality.
Disclosure of Invention
The invention aims to overcome the defects in the background art by providing an unsupervised automatic data cleaning method that learns rules from the data based on statistical relational learning and cleans the data based on probabilistic inference. This effectively improves both the efficiency and the effect of data cleaning, improves the data quality of a company's business systems without consuming large amounts of manpower and material resources, raises user satisfaction, and enables management to make correct decisions based on the improved data quality.
In order to achieve the technical effects, the invention adopts the following technical scheme:
an unsupervised automatic data cleaning method comprises the following steps:
A. learning a data model: learning the dependencies among attributes from raw data that may contain invalid data, and discovering implicit, non-absolute, or relatively weak dependencies to obtain a data model represented as a Bayesian network;
B. generating data cleaning rules: after obtaining a complete data model of the raw data or a sample of the raw data, generating predicates and first-order predicate rules, i.e., first-order predicate logic expressions;
C. generating a Markov logic network based on the predicates and first-order predicate rules generated in step B;
D. generating inference rules based on the Markov logic network generated in step C and cleaning the data based on the inference results.
Further, the step A specifically includes:
A1. evaluating and sampling the data to be repaired, i.e., the raw data that may contain invalid data;
A2. learning the original data set or the sampled data set to obtain the structure of the data model expressed in Bayesian network form; the structure of the Bayesian network reflects the dependencies between data attributes and their degrees;
A3. learning the original data set or the sampled data set to obtain the parameters of the data model, the specific form of which is the conditional probability table of each dependency;
A4. combining the structure and the parameters of the data model to obtain the complete data model.
Further, the step B specifically includes:
B1. defining relation constants for representing relations between entities;
B2. generating corresponding first-order predicate logic expressions according to the complete data model obtained in step A4: specifically, predicates and first-order predicate rules are generated from the learned Bayesian network, and conversion rules for turning dependencies into first-order predicate logic expressions are formulated separately for the case where a single attribute points to one attribute and the case where multiple attributes point to one attribute.
Further, in the step B2:
when a single attribute points to one attribute, i.e., attributes A₁ and A₂ have a directed edge between them pointing from A₁ to A₂, the dependency between A₁ and A₂ is formalized as the following first-order predicate logic:
∀id₁, id₂: A₁(id₁, v) ∧ A₁(id₂, v) ⇒ equal(A₂(id₁), A₂(id₂))
where v is the A₁ attribute value of tuples id₁ and id₂;
when a plurality of attributes point to one attribute, i.e., attributes A₁, A₂, …, Aᵢ point simultaneously to Aⱼ, the dependency is formalized as the following first-order predicate logic:
∀id₁, id₂: A₁(id₁, v₁) ∧ A₁(id₂, v₁) ∧ … ∧ Aᵢ(id₁, vᵢ) ∧ Aᵢ(id₂, vᵢ) ⇒ equal(Aⱼ(id₁), Aⱼ(id₂))
where v₁, v₂, …, vᵢ are the values of tuples id₁ and id₂ on attributes A₁, A₂, …, Aᵢ.
Further, the step C specifically includes:
C1. distinguishing the generated first-order predicate rules into absolute rules and non-absolute rules according to whether they are logically valid expressions, i.e., whether they hold with probability 1 under any interpretation;
C2. calculating the weights of the first-order predicate logic, with different weight calculation strategies for absolute and non-absolute rules: absolute rules are assigned a weight of positive infinity, while the weights of non-absolute rules are calculated using mutual information;
C3. according to the first-order predicate rule generated in the step B2, calculating the weight of the rule based on mutual information between the attributes related to the rule;
C4. from the weight calculation result in step C3, a markov logic network of the original data set or the sampled data set is obtained.
Further, the step C3 specifically includes:
C3.1. formulating different rule weight calculation methods for the case where a first-order predicate logic rule involves two attributes and the case where it involves a plurality of attributes; wherein,
for the case where a first-order predicate logic rule involves two attributes, the rule weight is calculated using the mutual information of the two attributes on the original data set or the sampled data set;
the mutual information is a real number ranging between 0 and 1: it is 1 if the attributes are completely correlated and 0 if they are completely uncorrelated. If a rule involves two attributes, the mutual information is the statistical average of the joint probability density of the two attribute variables and is used as the weight of the rule; a higher weight indicates stronger correlation and stronger interpretability. Since the attributes involved in first-order predicate logic rules are discrete, the mutual information is defined as:
I(X;Y) = Σ_{x∈X} Σ_{y∈Y} P(x,y) log( P(x,y) / ( P(x) P(y) ) )
where P(x,y) is the joint probability distribution function and P(x) and P(y) are the marginal probability density functions;
C3.2. when the rule weight is calculated, an exponential function is introduced to ensure that the weight is a number not less than 0; the introduced exponential function acts as a non-negative potential function over the attributes involved, equivalent to a weighted feature quantity over those attributes, and plays a normalizing role:
w = e^{I(X;Y)}
Further, the step D specifically includes:
D1. performing inference based on the Markov logic network generated in step C4, using the Gibbs sampling method from Markov chain Monte Carlo for rule inference; the weights of the inference rules are determined from the Markov logic network;
D2. constructing a Gibbs sampling inference model: a factor graph is used as the inference model, and the variables and factors of the factor graph are determined, the factors being used to evaluate the relations between the variables;
D3. constructing the possible worlds of the variables according to the predicates generated in step B2;
D4. performing inference over the possible worlds of the predicates of step D3 according to the inference model constructed in step D2;
D5. cleaning and repairing the original data set based on the inference result of step D4.
Further, in the step D5, the expected maximum value is selected as the repaired value.
Compared with the prior art, the invention has the following beneficial effects:
the unsupervised automatic data cleaning method is an unsupervised automatic data cleaning method based on statistical relationship learning, manual intervention is not needed when data cleaning is carried out, so that the labor cost of data cleaning can be greatly saved, and meanwhile, because rule discovery is automatically carried out from original data containing dirty data, a data quality rule does not need to be established in advance. The unsupervised automatic data cleaning method can effectively improve the data cleaning effect, improve the data accuracy and improve the data cleaning efficiency.
Drawings
FIG. 1 is a block diagram of the unsupervised automatic data cleaning method of the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.
Example:
as shown in fig. 1, the unsupervised automatic data cleaning method can clean data in the absence of data quality patterns/rules and without human intervention, while ensuring the effect and efficiency of data cleaning.
The method specifically comprises the following steps:
s10, learning of a data model:
To find the implicit patterns/rules, the dependencies between attributes need to be learned from the raw data, which may contain invalid data. Because invalid data may exist, absolute or strong dependencies between the attributes of the data table do not necessarily exist; the data model is therefore obtained by discovering implicit, non-absolute, or relatively weak dependencies and representing them in the form of a Bayesian network.
The key flow extracted in the step is as follows:
s101, evaluating and sampling data to be repaired;
s102, learning an original data set or a sampled data set to obtain a structure of a data model expressed in a Bayesian network form, wherein the specific form is the Bayesian network;
s103, learning the original data set or the sampled data set to obtain parameters of a data model, wherein the specific form of the parameters is a conditional probability table of a dependency relationship;
S104, combining the structures and parameters of the data models from steps S102 and S103 to obtain a complete data model.
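As a minimal illustrative sketch of steps S102 to S104, the snippet below estimates the conditional probability table (CPT) for a single candidate edge of the Bayesian network from a toy relation. The patent learns a full network structure; here one edge (city → zip), the `learn_cpt` helper, and the sample rows are assumptions introduced purely for illustration.

```python
# Sketch: parameter learning for one BN edge as a conditional probability
# table P(child | parent), estimated by counting over the (possibly dirty)
# raw tuples. `rows`, `learn_cpt`, and the attribute names are illustrative.
from collections import Counter, defaultdict

def learn_cpt(rows, parent, child):
    """Estimate P(child | parent) from a list of record dicts."""
    joint = Counter((r[parent], r[child]) for r in rows)
    marginal = Counter(r[parent] for r in rows)
    cpt = defaultdict(dict)
    for (pv, cv), n in joint.items():
        cpt[pv][cv] = n / marginal[pv]
    return dict(cpt)

rows = [
    {"city": "Beijing",  "zip": "100000"},
    {"city": "Beijing",  "zip": "100000"},
    {"city": "Beijing",  "zip": "200000"},   # a dirty tuple
    {"city": "Shanghai", "zip": "200000"},
]
cpt = learn_cpt(rows, "city", "zip")   # structure (the edge) is assumed given
```

Because the raw data may be dirty, the learned dependency is non-absolute (P(zip=100000 | city=Beijing) is below 1), which is exactly why the later steps work with weighted rather than hard rules.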
S20, generating a data cleaning rule:
after a complete data model of the raw data or raw data samples is obtained, generation of data cleansing rules is performed.
The data cleaning rule generation comprises the following main steps:
s201, defining a relation constant. The relation constants contain relations among a plurality of elements and are mainly used for representing relations among the main bodies, and the relation constants such as equivalence, matching and the like need to be defined in the step.
S202, generating corresponding first-order predicate logic expressions according to the data model.
The Bayesian network reflects the dependencies between attributes in the relational table: if node N₁ points to N₂, then N₂ depends to some extent on N₁. Based on this consideration, first-order predicate logic is constructed from the learned Bayesian network.
Assume attributes A₁ and A₂ have a directed edge between them pointing from A₁ to A₂; then the dependency between A₁ and A₂ can be formalized as the following first-order predicate logic expression:
∀id₁, id₂: A₁(id₁, v) ∧ A₁(id₂, v) ⇒ equal(A₂(id₁), A₂(id₂))
where v is the A₁ attribute value of tuples id₁ and id₂.
If multiple attributes point to one attribute, e.g., attributes A₁, A₂, …, Aᵢ point simultaneously to Aⱼ, then the dependency between them can likewise be formalized as first-order predicate logic:
∀id₁, id₂: A₁(id₁, v₁) ∧ A₁(id₂, v₁) ∧ … ∧ Aᵢ(id₁, vᵢ) ∧ Aᵢ(id₂, vᵢ) ⇒ equal(Aⱼ(id₁), Aⱼ(id₂))
where v₁, v₂, …, vᵢ are the values of tuples id₁ and id₂ on attributes A₁, A₂, …, Aᵢ.
In other words, in this step, conversion rules for turning dependencies into first-order predicate logic expressions are formulated separately for the case where a single attribute points to one attribute and the case where multiple attributes point to one attribute, and predicates and first-order predicate rules are automatically generated from the complete data model obtained in S104.
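The conversion described in S202 can be sketched as simple string generation from learned edges. The textual rule syntax below (`^`, `=>`, the `equal` relation constant) is an assumption, since the patent's exact notation appears only in figures; the single-attribute and multi-attribute cases are handled by the same template.

```python
# Sketch: turning BN edges "parents -> child" into matching-dependency-style
# first-order rule strings. `edge_to_rule` and the rule syntax are
# illustrative, not the patent's exact formalism.
def edge_to_rule(parents, child):
    """Formalize 'parents -> child' as a first-order predicate rule string."""
    body = " ^ ".join(
        f"{a}(id1, v{i}) ^ {a}(id2, v{i})" for i, a in enumerate(parents, 1)
    )
    return f"{body} => equal({child}(id1), {child}(id2))"

single = edge_to_rule(["city"], "zip")            # single attribute -> one attribute
multi = edge_to_rule(["city", "street"], "zip")   # multiple attributes -> one attribute
```

For the single-attribute case this yields a rule stating that two tuples agreeing on `city` should agree on `zip`; the multi-attribute case simply conjoins the additional agreement predicates.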
S30, generating the Markov logic network based on the predicates generated in step S202 and the first-order predicate rules.
The Markov logic network defines a probability distribution over possible worlds; in the context of data cleaning, a possible world is a possible repair of the erroneous data. The Markov logic network consists of first-order predicate logic rules and their corresponding weights. A weight reflects the degree to which its first-order predicate logic rule is satisfied: the larger the weight, the higher the degree of satisfaction.
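The distribution a Markov logic network defines is standardly written P(x) = (1/Z) · exp(Σᵢ wᵢ nᵢ(x)), where nᵢ(x) counts the true groundings of rule i in world x and Z normalizes over all worlds. A toy sketch of that computation, with an invented two-world repair scenario:

```python
# Sketch: probability of each possible world (candidate repair) under an
# MLN-style log-linear model. The worlds, rule weights, and grounding
# counts below are invented for illustration.
import math

def world_probs(worlds, weights, counts):
    """counts[w][i] = number of true groundings of rule i in world w."""
    scores = {
        w: math.exp(sum(wi * ni for wi, ni in zip(weights, counts[w])))
        for w in worlds
    }
    z = sum(scores.values())          # partition function over the worlds
    return {w: s / z for w, s in scores.items()}

# One weighted rule; repair_a satisfies two groundings, repair_b only one.
probs = world_probs(["repair_a", "repair_b"], [1.0],
                    {"repair_a": [2], "repair_b": [1]})
```

Worlds satisfying more (or higher-weighted) groundings get exponentially more probability mass, which is what lets the cleaning step prefer repairs consistent with strong dependencies without treating any rule as inviolable.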
S301, distinguishing the generated first-order predicate rules into absolute rules and non-absolute rules: specifically, rules are classified according to whether they are logically valid expressions, i.e., whether they hold with probability 1 under any interpretation.
S302, calculating the weight of the first-order predicate logic.
Different weight calculation strategies are formulated for absolute and non-absolute rules. For absolute rules, the weight is assigned positive infinity. Non-absolute rules are only approximately satisfied, so mutual information is used to calculate their weights. Each approximately satisfied first-order predicate logic rule reflects a dependency between attributes in the relational table, and the degree of that dependency is expressed by computing the mutual information between the attributes.
S303, according to the first-order predicate rules generated in step S202, calculating the weight of each rule based on the mutual information between the attributes it involves.
Aiming at different conditions that a first-order logic rule relates to two attributes and a plurality of attributes, different rule weight calculation methods are respectively formulated.
For the case that a first-order logic rule relates to two attributes, the rule weight is calculated by utilizing mutual information of the two attributes on the original data set or the original data set sample.
Mutual information is a real number ranging between 0 and 1: it is 1 if the attributes are completely correlated and 0 if they are completely uncorrelated. If a rule involves two attributes, the mutual information is the statistical average of the joint probability density of the two attribute variables and is used as the weight of the rule; a higher weight indicates stronger correlation and stronger interpretability. Since the attributes involved in first-order predicate logic rules are discrete, the mutual information is defined as:
I(X;Y) = Σ_{x∈X} Σ_{y∈Y} P(x,y) log( P(x,y) / ( P(x) P(y) ) )
where P(x,y) is the joint probability distribution function and P(x) and P(y) are the marginal probability density functions.
When the rule weight is calculated, an exponential function is introduced to ensure that the weight is a number greater than or equal to 0, so that the resulting weight better reflects the dependencies between attributes. The introduced exponential function acts as a non-negative potential function over the attributes involved, equivalent to a weighted feature quantity over those attributes, and plays a normalizing role:
w = e^{I(X;Y)}
meanwhile, with the increase of mutual information, the weight is increased exponentially, so that the effect of a high-weight rule in the data cleaning process can be increased, and the data cleaning effect is improved.
S304, according to the weight calculation results of step S303, the Markov logic network of the original data or of the sampled data is obtained automatically.
S40, generating inference rules based on the Markov logic network generated in step S304 and cleaning the data based on the inference results.
The method specifically comprises the following steps:
S401, performing inference based on the Markov logic network generated in step S304, using the Gibbs sampling method from Markov chain Monte Carlo for rule inference. The weights of the Gibbs sampling inference rules are determined from the Markov logic network.
S402, constructing a Gibbs sampling reasoning model.
The factor graph is used as a gibbs sampling inference model. Variables and factors of a factor graph in the inference model are determined, and the factors are used for evaluating the relation between the variables.
S403, constructing possible worlds of variables based on the predicates generated in the step S202, wherein the possible worlds are the basis of reasoning.
S404, reasoning is carried out on the possible world of the predicate of the step S403 based on the reasoning model constructed in the step S402.
S405, cleaning and repairing the original data set based on the inference result of step S404. For each item of data to be repaired, the expected maximum value is selected as the repaired value.
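Steps S401 to S405 can be sketched for a single erroneous cell, where the possible world collapses to the set of candidate repair values and each candidate's score is the total weight of the rules it satisfies. With one variable, Gibbs sampling reduces to direct sampling from the conditional distribution, which is the simplification made here; the candidates, scores, and `gibbs_repair` helper are all invented for illustration.

```python
# Sketch: sample candidate repairs with probability proportional to
# exp(score), then (D5) pick the value with the highest sampled frequency.
import math
import random

def gibbs_repair(candidates, score, iters=2000, seed=0):
    """Return the most frequently sampled repair value."""
    rng = random.Random(seed)
    weights = [math.exp(score(c)) for c in candidates]
    total = sum(weights)
    counts = {c: 0 for c in candidates}
    for _ in range(iters):
        r, acc = rng.random() * total, 0.0
        for c, w in zip(candidates, weights):
            acc += w
            if r <= acc:
                counts[c] += 1
                break
    return max(counts, key=counts.get)

# Candidate repairs for one dirty cell; the score is the summed weight of
# the satisfied rules (values invented for the example).
rule_score = {"100000": 2.0, "200000": 0.5}
repaired = gibbs_repair(list(rule_score), rule_score.get)
```

With exp(2.0) against exp(0.5), roughly 82% of the samples land on "100000", so the expected-maximum selection of D5 returns it as the repair.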
In summary, the unsupervised automatic data cleaning method is based on statistical relational learning and requires no manual intervention during cleaning, which greatly reduces the labor cost of data cleaning; because rules are discovered automatically from the raw data containing dirty data, no data quality rules need to be established in advance. The method effectively improves the data cleaning effect, data accuracy, and data cleaning efficiency.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (8)

1. An unsupervised automatic data cleaning method is characterized by comprising the following steps:
A. learning a data model: learning the dependencies among attributes from raw data that may contain invalid data, and discovering implicit, non-absolute, or relatively weak dependencies to obtain a data model represented as a Bayesian network;
B. generating data cleaning rules: after obtaining a complete data model of the raw data or a sample of the raw data, generating predicates and first-order predicate rules;
C. generating a Markov logic network based on the predicates and first-order predicate rules generated in the step B;
D. generating inference rules based on the Markov logic network generated in the step C and cleaning the data based on the inference results.
2. The unsupervised automatic data cleaning method according to claim 1, wherein the step A specifically comprises:
A1. evaluating and sampling the data to be repaired, i.e., the raw data that may contain invalid data;
A2. learning the original data set or the sampled data set to obtain the structure of the data model expressed in Bayesian network form;
A3. learning the original data set or the sampled data set to obtain the parameters of the data model, the specific form of which is the conditional probability table of each dependency;
A4. combining the structure and the parameters of the data model to obtain the complete data model.
3. The unsupervised automatic data cleaning method according to claim 2, wherein the step B specifically comprises:
B1. defining relation constants for representing relations between entities;
B2. generating corresponding first-order predicate logic expressions according to the complete data model obtained in the step A4: specifically, predicates and first-order predicate rules, i.e., first-order predicate logic expressions, are generated from the learned Bayesian network, and conversion rules for turning dependencies into first-order predicate logic expressions are formulated separately for the case where a single attribute points to one attribute and the case where multiple attributes point to one attribute.
4. The unsupervised automatic data cleaning method as claimed in claim 3, wherein in the step B2:
when a single attribute points to one attribute, i.e., attributes A₁ and A₂ have a directed edge between them pointing from A₁ to A₂, the dependency between A₁ and A₂ is formalized as the following first-order predicate logic:
∀id₁, id₂: A₁(id₁, v) ∧ A₁(id₂, v) ⇒ equal(A₂(id₁), A₂(id₂))
where v is the A₁ attribute value of tuples id₁ and id₂;
when a plurality of attributes point to one attribute, i.e., attributes A₁, A₂, …, Aᵢ point simultaneously to Aⱼ, the dependency is formalized as the following first-order predicate logic:
∀id₁, id₂: A₁(id₁, v₁) ∧ A₁(id₂, v₁) ∧ … ∧ Aᵢ(id₁, vᵢ) ∧ Aᵢ(id₂, vᵢ) ⇒ equal(Aⱼ(id₁), Aⱼ(id₂))
where v₁, v₂, …, vᵢ are the values of tuples id₁ and id₂ on attributes A₁, A₂, …, Aᵢ.
5. The unsupervised automatic data cleaning method according to claim 3, wherein the step C specifically comprises:
C1. distinguishing the generated first-order predicate rules into absolute rules and non-absolute rules;
C2. calculating the weights of the first-order predicate logic, with different weight calculation strategies for absolute and non-absolute rules: absolute rules are assigned a weight of positive infinity, while the weights of non-absolute rules are calculated using mutual information;
C3. according to the first-order predicate rule generated in the step B2, calculating the weight of the rule based on mutual information between the attributes related to the rule;
C4. from the weight calculation result in step C3, a markov logic network of the original data set or the sampled data set is obtained.
6. The unsupervised automatic data cleaning method according to claim 5, wherein the step C3 specifically includes:
C3.1. for the case where a first-order predicate logic rule involves two attributes, calculating the rule weight using the mutual information of the two attributes on the original data set or the sampled data set;
the mutual information being a real number ranging between 0 and 1: 1 if the attributes are completely correlated and 0 if they are completely uncorrelated;
C3.2. when the rule weight is calculated, introducing an exponential function to ensure that the weight result is a number not less than 0.
7. The unsupervised automatic data cleaning method according to claim 6, wherein the step D specifically comprises:
D1. performing inference based on the Markov logic network generated in step C4, using the Gibbs sampling method from Markov chain Monte Carlo for rule inference; the weights of the inference rules are determined from the Markov logic network;
D2. constructing a Gibbs sampling inference model: a factor graph is used as the inference model, and the variables and factors of the factor graph are determined, the factors being used to evaluate the relations between the variables;
D3. constructing the possible worlds of the variables according to the predicates generated in step B2;
D4. performing inference over the possible worlds of the predicates of step D3 according to the inference model constructed in step D2;
D5. cleaning and repairing the original data set based on the inference result of step D4.
8. The method according to claim 7, wherein in the step D5, the expected maximum value is selected as the repaired value.
CN201811325335.5A 2018-11-08 2018-11-08 Unsupervised automatic data cleaning method Active CN109491991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811325335.5A CN109491991B (en) 2018-11-08 2018-11-08 Unsupervised automatic data cleaning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811325335.5A CN109491991B (en) 2018-11-08 2018-11-08 Unsupervised automatic data cleaning method

Publications (2)

Publication Number Publication Date
CN109491991A CN109491991A (en) 2019-03-19
CN109491991B true CN109491991B (en) 2022-03-01

Family

ID=65695410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811325335.5A Active CN109491991B (en) 2018-11-08 2018-11-08 Unsupervised automatic data cleaning method

Country Status (1)

Country Link
CN (1) CN109491991B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117610541A (en) * 2024-01-17 2024-02-27 之江实验室 Author disambiguation method and device for large-scale data and readable storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN105046559A (en) * 2015-09-10 2015-11-11 河海大学 Bayesian network and mutual information-based client credit scoring method
KR20160115515A (en) * 2015-03-27 2016-10-06 금오공과대학교 산학협력단 A user behavior prediction System and Method for using mobile-based Life log
CN106094744A (en) * 2016-06-04 2016-11-09 上海大学 The determination method of thermoelectricity factory owner's operational factor desired value based on association rule mining
CN106528634A (en) * 2016-10-11 2017-03-22 武汉理工大学 Mass RFID (Radio Frequency Identification) data intelligent cleaning method and system oriented to workshop manufacturing process
CN106779087A (en) * 2016-11-30 2017-05-31 福建亿榕信息技术有限公司 A kind of general-purpose machinery learning data analysis platform
CN108304668A (en) * 2018-02-11 2018-07-20 河海大学 A kind of Forecasting Flood method of combination hydrologic process data and history priori data

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11205103B2 (en) * 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
KR20160115515A (en) * 2015-03-27 2016-10-06 금오공과대학교 산학협력단 A user behavior prediction System and Method for using mobile-based Life log
CN105046559A (en) * 2015-09-10 2015-11-11 河海大学 Bayesian network and mutual information-based client credit scoring method
CN106094744A (en) * 2016-06-04 2016-11-09 上海大学 The determination method of thermoelectricity factory owner's operational factor desired value based on association rule mining
CN106528634A (en) * 2016-10-11 2017-03-22 武汉理工大学 Mass RFID (Radio Frequency Identification) data intelligent cleaning method and system oriented to workshop manufacturing process
CN106779087A (en) * 2016-11-30 2017-05-31 福建亿榕信息技术有限公司 A kind of general-purpose machinery learning data analysis platform
CN108304668A (en) * 2018-02-11 2018-07-20 河海大学 A kind of Forecasting Flood method of combination hydrologic process data and history priori data

Non-Patent Citations (3)

Title
BayesWipe: A multimodal system for data cleaning and consistent query answering on structured big data; Sushovan De et al.; 2014 IEEE International Conference on Big Data (Big Data); 2015-01-08; pp. 15-24 *
Knowledge Representation and Automatic Extraction Based on First-Order Logic; Wang Yong; China Master's Theses Full-text Database, Information Science and Technology; 2016-03-15; I138-7989 *
Data Cleaning Based on Probabilistic Graphical Models; Duan Liang; China Master's Theses Full-text Database, Information Science and Technology; 2014-12-15; I138-236 *

Also Published As

Publication number Publication date
CN109491991A (en) 2019-03-19

Similar Documents

Publication Publication Date Title
CN110413494B (en) LightGBM fault diagnosis method for improving Bayesian optimization
CN110263230B (en) Data cleaning method and device based on density clustering
Miller et al. Automatic test data generation using genetic algorithm and program dependence graphs
Greiner et al. Learning Bayesian nets that perform well
CN110335168B (en) Method and system for optimizing power utilization information acquisition terminal fault prediction model based on GRU
CN103810104A (en) Method and system for optimizing software test case
CN111832101A (en) Construction method of cement strength prediction model and cement strength prediction method
CN106951963B (en) Knowledge refining method and device
CN113962145A (en) Parameter uncertainty quantitative modeling method under interval data sample condition
Wang et al. On the use of time series and search based software engineering for refactoring recommendation
CN109491991B (en) Unsupervised automatic data cleaning method
Bukhtoyarov et al. A comprehensive evolutionary approach for neural network ensembles automatic design
Hong et al. Confidence-conditioned value functions for offline reinforcement learning
Stützle et al. Automatic (offline) configuration of algorithms
CN115169555A (en) Edge attack network disruption method based on deep reinforcement learning
Dalibard et al. Faster improvement rate population based training
CN109947752A (en) A kind of automaticdata cleaning method based on DeepDive
Castle et al. Some forecasting principles from the M4 competition
CN110705889A (en) Enterprise screening method, device, equipment and storage medium
Berden et al. Learning max-sat models from examples using genetic algorithms and knowledge compilation
CN114169763A (en) Measuring instrument demand prediction method, system, computing device and storage medium
Lederman et al. Learning heuristics for quantified boolean formulas through deep reinforcement learning
Borgelt A conditional independence algorithm for learning undirected graphical models
Iba et al. GP-RVM: Genetic programing-based symbolic regression using relevance vector machine
Xiang et al. Learning tractable NAT-modeled Bayesian networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant