CN109491991B - Unsupervised automatic data cleaning method - Google Patents
- Publication number: CN109491991B (application CN201811325335.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- rule
- reasoning
- attributes
- generating
- Prior art date
- Legal status
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
Abstract
The invention discloses an unsupervised automatic data cleaning method comprising the following steps: A. learning a data model, namely learning the dependencies among attributes from original data that may contain invalid data and finding the implicit non-absolute or relatively weak dependencies, to obtain a data model represented in the form of a Bayesian network; B. generating data cleaning rules: after obtaining a complete data model of the original data or of a sample of the original data, generating predicates and first-order predicate rules; C. generating a Markov logic network based on the predicates and first-order predicate rules generated in step B; D. generating reasoning rules based on the Markov logic network generated in step C and cleaning the data based on the reasoning results. The method can effectively improve the data quality of a company's business systems without consuming large amounts of manpower and material resources, and helps management make correct decisions.
Description
Technical Field
The invention relates to the technical field of data management, in particular to an unsupervised automatic data cleaning method.
Background
Real-world data is typically dirty (hereinafter, such data is referred to as dirty data), as it may contain inconsistent, noisy, incomplete, or duplicate values. In the commercial world, erroneous data can cause significant economic losses. For example, incorrect customer information may lead a company to deliver purchased goods to the wrong address, which not only increases delivery costs but also damages the enterprise's image over a relatively long period.
Among existing data cleaning methods, some require heavy manual participation in the cleaning process, such as providing repair suggestions or confirming repairs; others need no manual participation during cleaning but require cleaning rules to be formulated in advance. Existing methods are therefore unsuitable when the data rules are unknown or the labor cost is unaffordable. In view of this situation, the present method performs data cleaning without predefined cleaning rules and without manual participation, and thereby improves data quality.
Disclosure of Invention
The invention aims to overcome the defects in the background art and provides an unsupervised automatic data cleaning method that learns rules from data based on statistical relational learning and cleans the data based on probabilistic reasoning. The method can effectively improve data cleaning efficiency and effect, improve the data quality of each business system of a company without consuming large amounts of manpower and material resources, increase user satisfaction, and enable management to make correct decisions based on the improved data quality.
In order to achieve the technical effects, the invention adopts the following technical scheme:
an unsupervised automatic data cleaning method comprises the following steps:
A. learning a data model, namely learning the dependency relationship among attributes from original data possibly containing invalid data, and finding out the implicit non-absolute or relatively weak dependency relationship to obtain the data model represented in the form of a Bayesian network;
B. generating data cleaning rules: after obtaining a complete data model of the original data or of a sample of the original data, generating predicates and first-order predicate rules, i.e., first-order predicate logic expressions;
C. generating a Markov logic network based on the predicates and first-order predicate rules generated in step B;
D. generating reasoning rules based on the Markov logic network generated in step C and cleaning the data based on the reasoning results.
Further, the step a specifically includes:
A1. evaluating and sampling data to be repaired, namely original data possibly containing invalid data;
A2. learning the original data set or the sampled data set to obtain the structure of a data model expressed in Bayesian network form; the structure of the Bayesian network reflects the dependencies and degrees of dependency between data attributes;
A3. learning an original data set or a sampled data set to obtain parameters of a data model, wherein the specific form of the parameters is a conditional probability table of a dependency relationship;
A4. combining the structure of the data model and the parameters of the data model to obtain the complete data model.
Further, the step B specifically includes:
B1. defining relation constants for representing relations between entities;
B2. generating corresponding first-order predicate logic expressions from the complete data model obtained in step A4: specifically, generating predicates and first-order predicate rules from the learned Bayesian network, and formulating conversion rules that convert dependencies into first-order predicate logic expressions separately for the case where a single attribute points to one attribute and the case where multiple attributes point to one attribute.
Further, in step B2:
when a single attribute points to one attribute: if attributes A1 and A2 have a directed edge between them pointing from A1 to A2, the dependency between A1 and A2 is formalized as the following first-order predicate logic:
A1(id1, v) ∧ A1(id2, v) ⇒ equal(A2(id1), A2(id2))
where v is the A1 attribute value of tuples id1 and id2;
when multiple attributes point to one attribute: if attributes A1, A2, …, Ai point simultaneously to Aj, the dependency is formalized as the following first-order predicate logic:
A1(id1, v1) ∧ A1(id2, v1) ∧ … ∧ Ai(id1, vi) ∧ Ai(id2, vi) ⇒ equal(Aj(id1), Aj(id2))
wherein v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
Further, the step C specifically includes:
C1. distinguishing the generated first-order predicate rules into absolute rules and non-absolute rules according to whether each rule is a logically valid expression, i.e., whether its probability is 1 under any interpretation;
C2. calculating the weights of the first-order predicate logic rules, with different weight calculation strategies formulated for absolute and non-absolute rules: the weight of an absolute rule is assigned positive infinity, while the weights of non-absolute rules are calculated using mutual information;
C3. according to the first-order predicate rule generated in the step B2, calculating the weight of the rule based on mutual information between the attributes related to the rule;
C4. from the weight calculation result in step C3, a markov logic network of the original data set or the sampled data set is obtained.
Further, the step C3 specifically includes:
C3.1. formulating different rule-weight calculation methods for the cases where a first-order predicate logic rule involves two attributes and where it involves multiple attributes;
for the case where a rule involves two attributes, calculating the rule weight using the mutual information of the two attributes over the original data set or the sampled data set;
the mutual information is a real number ranging between 0 and 1: it is 1 if the attributes are completely correlated and 0 if they are completely uncorrelated. When a rule involves two attributes, the mutual information is the statistical average of the joint probability density of the two attribute variables and serves as the weight of the rule; a higher weight indicates stronger correlation and stronger interpretability. Since first-order predicate logic rules involve discrete attributes, the mutual information is defined as:
I(X; Y) = Σx Σy P(x, y) log [ P(x, y) / ( P(x) P(y) ) ]
where P(x, y) is the joint probability distribution function and P(x) and P(y) are the marginal probability density functions;
C3.2. when calculating a rule weight, introducing an exponential function so that the weight is guaranteed to be a number not less than 0. The introduced exponential function acts as a potential function over the involved attributes, a non-negative real function equivalent to a weighted feature count of the attribute features, and plays a normalizing role:
w = exp( I(X; Y) )
further, the step D specifically includes:
D1. performing reasoning based on the Markov logic network generated in step C4, using the Gibbs sampling method from Markov chain Monte Carlo for rule reasoning, and determining the weights of the Gibbs sampling inference rules according to the rules generated by the Markov logic network;
D2. constructing a Gibbs sampling reasoning model, using a factor graph as the Gibbs sampling reasoning model, and determining variables and factors of the factor graph in the reasoning model, wherein the factors are used for evaluating the relation between the variables;
D3. constructing possible worlds of the variables according to the predicates generated in step B2;
D4. reasoning about the possible world of the predicate of step D3 according to the reasoning model constructed in step D2;
D5. cleaning and repairing the original data set based on the reasoning result of step D4.
Further, in step D5, the value with the maximum expectation is selected as the repaired value.
Compared with the prior art, the invention has the following beneficial effects:
the unsupervised automatic data cleaning method is an unsupervised automatic data cleaning method based on statistical relationship learning, manual intervention is not needed when data cleaning is carried out, so that the labor cost of data cleaning can be greatly saved, and meanwhile, because rule discovery is automatically carried out from original data containing dirty data, a data quality rule does not need to be established in advance. The unsupervised automatic data cleaning method can effectively improve the data cleaning effect, improve the data accuracy and improve the data cleaning efficiency.
Drawings
FIG. 1 is a block diagram of the unsupervised automatic data cleaning method of the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the following embodiments.
Example:
As shown in FIG. 1, the unsupervised automatic data cleaning method can clean data in the absence of data quality patterns/rules and without human intervention, while ensuring the effect and efficiency of data cleaning.
The method specifically comprises the following steps:
s10, learning of a data model:
To find the implicit patterns/rules, the dependencies between attributes need to be learned from the raw data, which may contain invalid data. Because invalid data may exist, absolute or strong dependencies between the attributes of the data table do not necessarily hold; the data model is obtained by finding the implicit non-absolute or relatively weak dependencies and representing them in the form of a Bayesian network.
The key flow of this step is as follows:
s101, evaluating and sampling data to be repaired;
s102, learning an original data set or a sampled data set to obtain a structure of a data model expressed in a Bayesian network form, wherein the specific form is the Bayesian network;
s103, learning the original data set or the sampled data set to obtain parameters of a data model, wherein the specific form of the parameters is a conditional probability table of a dependency relationship;
and S104, combining the structures and parameters of the data models in the step S102 and the step S103 to obtain a complete data model.
S20, generating a data cleaning rule:
after a complete data model of the raw data or raw data samples is obtained, generation of data cleansing rules is performed.
The data cleaning rule generation comprises the following main steps:
s201, defining a relation constant. The relation constants contain relations among a plurality of elements and are mainly used for representing relations among the main bodies, and the relation constants such as equivalence, matching and the like need to be defined in the step.
S202, generating corresponding first-order predicate logic expressions according to the data model.
A Bayesian network reflects the dependencies between attributes in a relational table: if node N1 points to node N2, then N2 depends on N1 to some extent. Based on this, first-order predicate logic is constructed from the learned Bayesian network.
Assume attributes A1 and A2 have a directed edge between them pointing from A1 to A2. The dependency between A1 and A2 can then be formalized as the following first-order predicate logic expression:
A1(id1, v) ∧ A1(id2, v) ⇒ equal(A2(id1), A2(id2))
where v is the A1 attribute value of tuples id1 and id2.
If multiple attributes point to one attribute, e.g., attributes A1, A2, …, Ai point simultaneously to Aj, the dependency among them can likewise be formalized as the following first-order predicate logic:
A1(id1, v1) ∧ A1(id2, v1) ∧ … ∧ Ai(id1, vi) ∧ Ai(id2, vi) ⇒ equal(Aj(id1), Aj(id2))
wherein v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
In other words, in this step, conversion rules that convert dependencies into first-order predicate logic expressions are formulated separately for the case where a single attribute points to one attribute and the case where multiple attributes point to one attribute, and the predicates and first-order predicate rules are automatically generated from the complete data model obtained in S104.
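The two conversion cases above can be sketched as a small rule generator. The textual rendering of the predicates and the `equal(...)` relation constant is an assumption, since the patent's formula images are not reproduced here:

```python
def dependency_to_rule(parents, child):
    """Render a learned edge set (parents -> child) as a textual
    first-order predicate rule, covering both conversion cases."""
    body = " AND ".join(
        f"{a}(id1, v{i}) AND {a}(id2, v{i})" for i, a in enumerate(parents, 1)
    )
    return f"{body} => equal({child}(id1), {child}(id2))"

# Case 1: a single attribute points to one attribute.
print(dependency_to_rule(["city"], "zip"))
# Case 2: multiple attributes point to one attribute.
print(dependency_to_rule(["city", "street"], "zip"))
```

Running this prints one rule per learned dependency, ready to be weighted in step S30.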
S30, generating the Markov logic network based on the predicates and first-order predicate rules generated in step S202.
A Markov logic network defines a probability distribution over possible worlds; in the context of data cleaning, a possible world is a possible repair of the erroneous data. The Markov logic network consists of first-order predicate logic rules and their weights. A weight reflects the degree to which its first-order predicate logic rule is satisfied: the larger the weight, the higher the degree of satisfaction.
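The weighted-rule semantics described above can be illustrated with a minimal sketch: a Markov logic network scores a possible world by the exponential of the sum, over rules, of each rule's weight times its number of satisfied groundings in that world. The rule weights and grounding counts below are invented for illustration:

```python
import math

def world_score(rules, world):
    """Unnormalized probability of a possible world under a Markov
    logic network: exp(sum over rules of weight * number of satisfied
    groundings in that world)."""
    return math.exp(sum(w * n_satisfied(world) for w, n_satisfied in rules))

# Two weighted rules; each counts how many of its groundings the world
# (a candidate repair) satisfies.
rules = [
    (1.5, lambda world: world["city_zip_ok"]),
    (0.4, lambda world: world["street_zip_ok"]),
]
repair_a = {"city_zip_ok": 2, "street_zip_ok": 1}  # satisfies the strong rule
repair_b = {"city_zip_ok": 0, "street_zip_ok": 2}
print(world_score(rules, repair_a) > world_score(rules, repair_b))  # True
```

The repair that satisfies the high-weight rule receives the higher unnormalized probability, which is exactly why high-weight rules dominate the cleaning result.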
S301, distinguishing the generated first-order predicate rules into absolute rules and non-absolute rules, specifically according to whether each rule is a logically valid expression, i.e., whether its probability is 1 under any interpretation.
S302, calculating the weight of the first-order predicate logic.
Different weight calculation strategies are formulated for absolute and non-absolute rules. For absolute rules, the weight is assigned positive infinity. Non-absolute rules are only approximately satisfied; for these, mutual information is used to calculate the weights. Each approximately satisfied first-order predicate logic rule reflects a dependency between attributes in the relational table, and the degree of that dependency is expressed by the mutual information between the attributes.
S303, for the first-order predicate rules generated in step S202, calculating the weight of each rule based on the mutual information between the attributes involved in the rule.
Different rule weight calculation methods are formulated for the cases where a first-order logic rule involves two attributes and where it involves multiple attributes.
For the case where a first-order logic rule involves two attributes, the rule weight is calculated using the mutual information of the two attributes over the original data set or its sample.
Mutual information is a real number ranging between 0 and 1: it is 1 if the attributes are completely correlated and 0 if they are completely uncorrelated. When a rule involves two attributes, the mutual information is the statistical average of the joint probability density of the two attribute variables and serves as the weight of the rule; a higher weight indicates stronger correlation and stronger interpretability. Since first-order predicate logic rules involve discrete attributes, the mutual information is defined as:
I(X; Y) = Σx Σy P(x, y) log [ P(x, y) / ( P(x) P(y) ) ]
where P(x, y) is the joint probability distribution function and P(x) and P(y) are the marginal probability density functions.
When calculating a rule weight, an exponential function is introduced so that the weight is guaranteed to be a number not less than 0 and better reflects the dependency between attributes. The introduced exponential function acts as a potential function over the involved attributes, a non-negative real function equivalent to a weighted feature count of the attribute features, and plays a normalizing role:
w = exp( I(X; Y) )
meanwhile, with the increase of mutual information, the weight is increased exponentially, so that the effect of a high-weight rule in the data cleaning process can be increased, and the data cleaning effect is improved.
S304, according to the weight calculation result in the step S303, the original data or the Markov logic network of the original data sampling is automatically obtained.
S40, generating reasoning rules based on the Markov logic network generated in step S304 and cleaning the data based on the reasoning results.
The method specifically comprises the following steps:
s401, reasoning is carried out based on the Markov logic network generated in the step S304, and regular reasoning is carried out by adopting a Gibbs sampling method in the Markov chain Monte Carlo. And determining the weight of the Gibbs sampling inference rule according to the Gibbs sampling inference rule generated by the Markov logic network.
S402, constructing a Gibbs sampling reasoning model.
The factor graph is used as a gibbs sampling inference model. Variables and factors of a factor graph in the inference model are determined, and the factors are used for evaluating the relation between the variables.
S403, constructing possible worlds of variables based on the predicates generated in the step S202, wherein the possible worlds are the basis of reasoning.
S404, reasoning is carried out on the possible world of the predicate of the step S403 based on the reasoning model constructed in the step S402.
S405, cleaning and repairing the original data set based on the inference result of step S404. For each datum to be repaired, the value with the maximum expectation is selected as the repaired value.
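The inference-and-repair loop of S401–S405 can be sketched for a single cell as follows. The candidate values and scoring function are hypothetical, and with a single variable Gibbs sampling reduces to direct sampling from the conditional distribution:

```python
import math
import random
from collections import Counter

def gibbs_repair(candidates, score, iters=2000, seed=0):
    """Minimal sampler over the possible world of one cell (steps
    S401-S405): repeatedly draw a candidate value with probability
    proportional to its score, then keep the most frequent sample.
    With one variable, Gibbs sampling reduces to this direct sampling
    from the conditional distribution."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(iters):
        weights = [score(v) for v in candidates]
        total = sum(weights)
        r = rng.random() * total
        for v, w in zip(candidates, weights):
            r -= w
            if r <= 0:
                counts[v] += 1
                break
    return counts.most_common(1)[0][0]  # step S405: most probable repair

# Hypothetical cell repair: two candidate zips; the score is exp of the
# summed weights of the rules each candidate would satisfy.
score = lambda v: math.exp(1.5 if v == "75001" else 0.0)
print(gibbs_repair(["75001", "69001"], score))  # prints 75001
```

A production system would instead run Gibbs sampling over a factor graph covering all cells jointly, as steps S402–S404 describe.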
In summary, the unsupervised automatic data cleaning method is based on statistical relational learning. No manual intervention is needed during cleaning, which greatly saves labor cost, and because rules are discovered automatically from original data containing dirty data, no data quality rules need to be established in advance. The method can effectively improve the data cleaning effect, data accuracy, and data cleaning efficiency.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.
Claims (8)
1. An unsupervised automatic data cleaning method is characterized by comprising the following steps:
A. learning a data model, namely learning the dependency relationship among attributes from original data possibly containing invalid data, and finding out the implicit non-absolute or relatively weak dependency relationship to obtain the data model represented in the form of a Bayesian network;
B. generating data cleaning rules: after obtaining a complete data model of the original data or of a sample of the original data, generating predicates and first-order predicate rules;
C. generating a Markov logic network based on the predicates and first-order predicate rules generated in step B;
D. generating reasoning rules based on the Markov logic network generated in step C and cleaning the data based on the reasoning results.
2. The unsupervised automatic data cleaning method according to claim 1, wherein the step a specifically comprises:
A1. evaluating and sampling data to be repaired, namely original data possibly containing invalid data;
A2. learning the original data set or the sampled data set to obtain a structure of a data model expressed in a Bayesian network form;
A3. learning an original data set or a sampled data set to obtain parameters of a data model, wherein the specific form of the parameters is a conditional probability table of a dependency relationship;
A4. combining the structure of the data model and the parameters of the data model to obtain the complete data model.
3. The unsupervised automatic data cleaning method according to claim 2, wherein the step B specifically comprises:
B1. defining relation constants for representing relations between entities;
B2. generating corresponding first-order predicate logic expressions from the complete data model obtained in step A4: specifically, generating predicates and first-order predicate rules, i.e., first-order predicate logic expressions, from the learned Bayesian network, and formulating conversion rules that convert dependencies into first-order predicate logic expressions separately for the case where a single attribute points to one attribute and the case where multiple attributes point to one attribute.
4. The unsupervised automatic data cleaning method according to claim 3, wherein in step B2:
when a single attribute points to one attribute: if attributes A1 and A2 have a directed edge between them pointing from A1 to A2, the dependency between A1 and A2 is formalized as the following first-order predicate logic:
A1(id1, v) ∧ A1(id2, v) ⇒ equal(A2(id1), A2(id2))
where v is the A1 attribute value of tuples id1 and id2;
when multiple attributes point to one attribute: if attributes A1, A2, …, Ai point simultaneously to Aj, the dependency is formalized as the following first-order predicate logic:
A1(id1, v1) ∧ A1(id2, v1) ∧ … ∧ Ai(id1, vi) ∧ Ai(id2, vi) ⇒ equal(Aj(id1), Aj(id2))
wherein v1, v2, …, vi are the values of tuples id1 and id2 on attributes A1, A2, …, Ai.
5. The unsupervised automatic data cleaning method according to claim 3, wherein the step C specifically comprises:
C1. distinguishing the generated first-order predicate rules into absolute rules and non-absolute rules;
C2. calculating the weights of the first-order predicate logic rules, with different weight calculation strategies formulated for absolute and non-absolute rules: the weight of an absolute rule is assigned positive infinity, while the weights of non-absolute rules are calculated using mutual information;
C3. according to the first-order predicate rule generated in the step B2, calculating the weight of the rule based on mutual information between the attributes related to the rule;
C4. from the weight calculation result in step C3, a markov logic network of the original data set or the sampled data set is obtained.
6. The unsupervised automatic data cleaning method according to claim 5, wherein the step C3 specifically includes:
C3.1. for the case where a first-order predicate logic rule involves two attributes, calculating the rule weight using the mutual information of the two attributes over the original data set or the sampled data set;
the mutual information is a real number ranging between 0 and 1: it is 1 if the attributes are completely correlated and 0 if they are completely uncorrelated;
C3.2. when calculating a rule weight, introducing an exponential function so that the weight is guaranteed to be a number not less than 0.
7. The unsupervised automatic data cleaning method according to claim 6, wherein the step D specifically comprises:
D1. performing reasoning based on the Markov logic network generated in step C4, using the Gibbs sampling method from Markov chain Monte Carlo for rule reasoning, and determining the weights of the Gibbs sampling inference rules according to the rules generated by the Markov logic network;
D2. constructing a Gibbs sampling reasoning model, using a factor graph as the Gibbs sampling reasoning model, and determining variables and factors of the factor graph in the reasoning model, wherein the factors are used for evaluating the relation between the variables;
D3. constructing possible worlds of the variables according to the predicates generated in step B2;
D4. reasoning about the possible world of the predicate of step D3 according to the reasoning model constructed in step D2;
D5. cleaning and repairing the original data set based on the reasoning result of step D4.
8. The method according to claim 7, wherein in step D5 the value with the maximum expectation is selected as the repaired value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811325335.5A CN109491991B (en) | 2018-11-08 | 2018-11-08 | Unsupervised automatic data cleaning method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109491991A CN109491991A (en) | 2019-03-19 |
CN109491991B true CN109491991B (en) | 2022-03-01 |
Family
ID=65695410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811325335.5A Active CN109491991B (en) | 2018-11-08 | 2018-11-08 | Unsupervised automatic data cleaning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109491991B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117610541A (en) * | 2024-01-17 | 2024-02-27 | 之江实验室 | Author disambiguation method and device for large-scale data and readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105046559A (en) * | 2015-09-10 | 2015-11-11 | 河海大学 | Bayesian network and mutual information-based client credit scoring method |
KR20160115515A (en) * | 2015-03-27 | 2016-10-06 | 금오공과대학교 산학협력단 | A user behavior prediction System and Method for using mobile-based Life log |
CN106094744A (en) * | 2016-06-04 | 2016-11-09 | 上海大学 | The determination method of thermoelectricity factory owner's operational factor desired value based on association rule mining |
CN106528634A (en) * | 2016-10-11 | 2017-03-22 | 武汉理工大学 | Mass RFID (Radio Frequency Identification) data intelligent cleaning method and system oriented to workshop manufacturing process |
CN106779087A (en) * | 2016-11-30 | 2017-05-31 | 福建亿榕信息技术有限公司 | A kind of general-purpose machinery learning data analysis platform |
CN108304668A (en) * | 2018-02-11 | 2018-07-20 | 河海大学 | A kind of Forecasting Flood method of combination hydrologic process data and history priori data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11205103B2 (en) * | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
- 2018-11-08: application CN201811325335.5A filed (CN); patent CN109491991B, status active
Non-Patent Citations (3)
Title |
---|
BayesWipe: A multimodal system for data cleaning and consistent query answering on structured big data; Sushovan De et al.; 2014 IEEE International Conference on Big Data (Big Data); 2015-01-08; pp. 15-24 *
Knowledge Representation and Automatic Extraction Based on First-Order Logic; Wang Yong; China Masters' Theses Full-text Database, Information Science and Technology; 2016-03-15; I138-7989 *
Data Cleaning Based on Probabilistic Graphical Models; Duan Liang; China Masters' Theses Full-text Database, Information Science and Technology; 2014-12-15; I138-236 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||