CN108197665A - Bayesian network structure learning algorithm based on parallel evolutionary search - Google Patents
- Publication number
- CN108197665A (application number CN201810085728.7A)
- Authority
- CN
- China
- Prior art keywords
- bayesian network
- node
- network structure
- algorithm
- father
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2111—Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Physiology (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Genetics & Genomics (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a Bayesian network structure learning algorithm based on parallel evolutionary search, belonging to the field of artificial intelligence. Based on the ideas of evolutionary computation, the method parallelizes the score-and-search process to achieve efficient Bayesian network structure learning. By using MapReduce technology, the invention combines a genetic evolutionary algorithm with the structure search process, making full use of the parallel computing power of multiple servers to achieve fast and efficient learning. The invention applies the traditional genetic algorithm to cloud computing, exploiting the capacity of distributed computing methods to process massive data together with the parallelism and global search ability of genetic algorithms, so that Bayesian network structure learning on massive data is carried out quickly and efficiently, which is a substantive breakthrough not found in the current technology.
Description
Technical field
The invention belongs to the field of artificial intelligence and relates to a Bayesian network structure learning method based on parallel evolutionary search.
Background technology
A Bayesian network is a mathematical model based on probabilistic inference and an extension of Bayesian methods; it is currently one of the most effective theoretical models for representing and reasoning about uncertain knowledge. The information in a Bayesian network consists of two parts: first, the Bayesian network structure, a directed acyclic graph representing conditional-independence information, in which each node represents a variable in the domain of interest and the edges between nodes represent causal relations among them; second, the conditional probability distribution functions (or conditional probability tables).
Bayesian network structure learning is the task of finding, given a set of data samples, the network structure that best matches the training set. Its purpose is to obtain the logical relations among the variables in the domain of interest. A Bayesian network structure can be obtained by a score-and-search algorithm, and the structure search process is an NP-hard problem. Existing solutions include the K2 learning algorithm, the TAN learning algorithm, Bayesian metric methods, conditional-likelihood scoring methods, and so on.
The basic idea of score-based Bayesian network structure learning algorithms is to start from a basic network topology, modify the structure with a search operation such as adding an edge, deleting an edge, or reversing the direction of an edge, and score each resulting network structure with a scoring function; the score determines whether the network is kept. Two questions are central: first, the choice of the scoring function; second, the choice of the search algorithm.
In a big-data environment, data dimensionality is high and the sample space is large. Performing structure learning with an ordinary learning algorithm is cumbersome and time-consuming, and the processing capacity of a single machine is limited, so the required Bayesian network structure often cannot be obtained quickly within a reasonable time, which significantly limits subsequent analysis and decision making.
Existing mainstream Bayesian network structure learning algorithms, at home and abroad, are essentially all serial, while Bayesian network structure learning is an NP-hard problem. The performance and efficiency of traditional Bayesian network structure learning processes in a big-data environment are therefore low, and it is difficult to obtain the required Bayesian network structure within a reasonable time.
Summary of the invention
In view of this, the purpose of the present invention is to provide a Bayesian network structure learning method based on parallel evolutionary search. Based on the ideas of evolutionary computation, the method parallelizes the score-and-search process to achieve efficient Bayesian network structure learning. By using MapReduce technology, the invention combines a genetic evolutionary algorithm with the structure search process, making full use of the parallel computing power of multiple servers to achieve fast and efficient learning.
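The MapReduce decomposition can be illustrated with a minimal sketch: the map phase scores candidate structures independently (in a real deployment, on separate servers), and the reduce phase keeps the best one. The `score` function here is a toy placeholder, an assumption for illustration, not the patent's Bayesian metric of step S3:

```python
from functools import reduce

def score(structure):
    # Toy placeholder metric (an assumption): prefer fewer edges.
    # A real deployment would apply the Bayesian metric of step S3
    # to the data subset held by each worker.
    return -sum(sum(row) for row in structure)

def map_phase(population):
    """Map: score each candidate structure independently; every
    (score, structure) pair could be computed on a different server."""
    return [(score(g), g) for g in population]

def reduce_phase(scored):
    """Reduce: keep the single best-scoring (score, structure) pair."""
    return reduce(lambda a, b: a if a[0] >= b[0] else b, scored)

population = [[[0, 1], [0, 0]], [[0, 0], [0, 0]]]
best_score, best = reduce_phase(map_phase(population))
```

The same map/reduce split carries over directly to a Hadoop-style cluster, since each map task needs only its own candidate and data subset.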
In order to achieve the above objectives, the present invention provides the following technical solution:
A Bayesian network structure learning algorithm based on parallel evolutionary search, comprising the following steps:
S1: Randomly sample the raw sample data using Markov chain Monte Carlo (MCMC); optimize the random sampling process with Gibbs sampling, which makes sampling more efficient in high dimensions; because of the transition probabilities involved, the sampled data consists of several relatively independent data subsets carrying preliminary conditional probabilities;
S2: Gene-encode each data subset generated in S1 to obtain an initial population of individuals X_j, i.e. several edgeless graphs;
S3: Define the evaluation function using Bayesian scoring, i.e. the maximum-a-posteriori principle G* = arg max P(G|D), where D denotes the data set and G denotes a network structure over the Bayesian network N; for an arbitrary node x_i with parent node set π_i, the evaluation function of the Bayesian network structure is defined as
V = Σ_i Σ_{j=1..q_i} [ log((r_i − 1)! / (N_ij + r_i − 1)!) + Σ_{k=1..r_i} log(N_ijk!) ]
where i indexes the nodes of the Bayesian network under evaluation; j indexes the different parent configurations of node x_i; q_i is the number of instantiations of π_i; r_i is the number of values of x_i; N_ijk is the number of cases in data set D in which the variable X_i takes its k-th value x_ik while its parents Π_{x_i} take their j-th configuration w_ij; and N_ij = Σ_k N_ijk;
S4: Define the evolutionary process using hill-climbing search; during the evolutionary operation, examine the variables one by one starting from the initial population, determine the parent nodes of each node, and generate directed edges pointing from parent node to child node; for X_j, let the obtained parent node set be π_j, and define μ as the upper-limit threshold on the number of parent nodes of a variable; if |π_j| < μ, i.e. the number of parents of X_j is below the prescribed upper limit, examine the variables X_i not in π_j and select the X_i that maximizes the new family score; then compare V_new and V_old, and if V_new > V_old, add X_i to π_j;
S5: Define the evolution stop condition: if |π_j| >= μ, stop evolving.
The beneficial effects of the present invention are:
(1) The present invention performs random sampling with Gibbs sampling. Since the acceptance probability is 1, the Markov transition rule needs no acceptance test and no state is rejected at any step, so convergence is fast and random sampling is efficient.
(2) The present invention uses a genetic algorithm. The operations in the evolutionary process are random, but unlike a completely random search, the algorithm can effectively exploit historical information so that the expected fitness of the next generation increases. Through this mechanism, generation-by-generation genetic evolution finally converges to a suitable individual.
(3) The present invention applies the traditional genetic algorithm to cloud computing, exploiting the capacity of distributed computing methods to process massive data together with the parallelism and global search ability of genetic algorithms, so that Bayesian network structure learning on massive data is carried out quickly and efficiently, which is a substantive breakthrough not found in the current technology.
Description of the drawings
To make the purpose, technical solution, and beneficial effects of the present invention clearer, the following drawing is provided for illustration:
Fig. 1 is a block diagram of the technology of the present invention.
Specific embodiment
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawing.
The block diagram of the technology of the present invention is shown in Fig. 1; the processing flow is as follows:
S1: Randomly sample the raw sample data using Markov chain Monte Carlo (MCMC). The present invention optimizes the random sampling process with Gibbs sampling, which makes sampling more efficient in high dimensions. Because of the transition probabilities involved, the sampled data consists of several relatively independent data subsets carrying preliminary conditional probabilities.
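A minimal Gibbs-sampling sketch for two binary variables illustrates the idea (this is not the patent's sampler; the conditionals `p_x_given_y` and `p_y_given_x` are assumed inputs). Each variable is resampled in turn from its conditional given the other, and every draw is accepted, which is why no accept/reject test is needed:

```python
import random

def gibbs_sample(n_samples, p_x_given_y, p_y_given_x, burn_in=100, seed=0):
    """Alternately resample x | y and y | x from the given conditionals.
    p_x_given_y[v] is P(x = 1 | y = v), and likewise for p_y_given_x.
    Returns `n_samples` (x, y) pairs after discarding `burn_in` steps."""
    rng = random.Random(seed)
    x = y = 0
    samples = []
    for step in range(burn_in + n_samples):
        x = 1 if rng.random() < p_x_given_y[y] else 0  # draw x given y
        y = 1 if rng.random() < p_y_given_x[x] else 0  # draw y given x
        if step >= burn_in:
            samples.append((x, y))
    return samples
```

In the method above, the retained draws would be partitioned into the relatively independent data subsets handed to the encoding step.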
S2: Gene-encode each data subset generated in the first step to obtain the initial population, i.e. several edgeless graphs.
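This step can be sketched as follows, under the assumption that an individual's "gene" is an adjacency matrix (the patent does not fix a concrete encoding, so the representation and names here are illustrative): each data subset yields one edgeless graph, i.e. an all-zero matrix over that subset's variables.

```python
def encode_individual(variables):
    """Gene-encode one data subset as an edgeless graph: an n x n
    all-zero adjacency matrix over the subset's variables."""
    n = len(variables)
    return [[0] * n for _ in range(n)]

def initial_population(subsets):
    """Build the initial population: one edgeless individual per
    data subset produced by the sampling step."""
    return [encode_individual(vs) for vs in subsets]
```

For example, `initial_population([["A", "B"], ["A", "B", "C"]])` yields a 2-node and a 3-node edgeless individual.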
S3: Define the evaluation function using Bayesian scoring, i.e. the maximum-a-posteriori principle G* = arg max P(G|D), where D denotes the data set and G denotes a network structure over the Bayesian network N. For an arbitrary node x_i with parent node set π_i, the evaluation function of the Bayesian network structure is defined as
V = Σ_i Σ_{j=1..q_i} [ log((r_i − 1)! / (N_ij + r_i − 1)!) + Σ_{k=1..r_i} log(N_ijk!) ]
where N_ijk is the number of cases in D in which X_i takes its k-th value while its parents take their j-th configuration, and N_ij = Σ_k N_ijk.
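The per-family term of such a score can be computed from counts as below, assuming the standard K2/CH log form implied by the definitions of N_ijk and q_i (a sketch, not the patent's exact code; `family_score` and its arguments are illustrative names):

```python
import math
from collections import Counter

def family_score(data, i, parents, r):
    """Log K2/CH family score for node i with parent set `parents`:
    sum over parent configurations j of
      log((r - 1)! / (N_ij + r - 1)!) + sum_k log(N_ijk!)
    where r is the number of states of node i.
    `data` is a list of tuples of integer-coded discrete values."""
    # N_ijk: cases where node i takes value k and its parents take config j
    n_ijk = Counter((tuple(row[p] for p in parents), row[i]) for row in data)
    # N_ij: cases where the parents take configuration j
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    total = 0.0
    for cfg, nij in n_ij.items():
        total += math.lgamma(r) - math.lgamma(nij + r)  # log (r-1)!/(N_ij+r-1)!
        for k in range(r):
            total += math.lgamma(n_ijk.get((cfg, k), 0) + 1)  # log N_ijk!
    return total
```

On this metric, a parent that actually predicts the node raises the family score relative to the empty parent set.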
S4: Define the evolutionary process using hill-climbing search. During the evolutionary operation, examine the variables one by one starting from the initial population, determine the parent nodes of each node, and generate directed edges pointing from parent node to child node. For X_j, let the obtained parent node set be π_j, and define μ as the upper-limit threshold on the number of parent nodes of a variable. If |π_j| < μ, i.e. the number of parents of X_j is below the prescribed upper limit, examine the variables X_i not in π_j and select the X_i that maximizes the new family score. Then compare V_new and V_old, and if V_new > V_old, add X_i to π_j.
S5: Define the evolution stop condition: if |π_j| >= μ, stop evolving.
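Steps S4 and S5 together amount to a greedy parent-set search per node, which can be sketched as follows (an illustrative implementation; `score_fn` stands in for the family score of step S3, and the names are assumptions):

```python
def grow_parents(score_fn, nodes, j, mu):
    """Hill-climbing parent search for node j. Repeatedly try every
    candidate X_i not already in pi_j, pick the one maximizing the
    family score, keep it only if V_new > V_old (step S4), and stop
    once |pi_j| >= mu (step S5). score_fn(j, parents) returns the
    family score of node j under the given parent set."""
    pi_j = []
    while len(pi_j) < mu:                           # S5 stop condition
        v_old = score_fn(j, pi_j)
        candidates = [i for i in nodes if i != j and i not in pi_j]
        if not candidates:
            break
        best = max(candidates, key=lambda i: score_fn(j, pi_j + [i]))
        v_new = score_fn(j, pi_j + [best])
        if v_new > v_old:                           # keep only improvements
            pi_j.append(best)
        else:
            break                                   # no improving parent left
    return pi_j
```

Each accepted parent corresponds to generating one directed edge from that parent to node j.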
Finally, it should be noted that the preferred embodiments above are merely illustrative of the technical solution of the present invention and are not restrictive. Although the present invention has been described in detail through the preferred embodiments above, those skilled in the art should understand that various changes may be made to it in form and in detail without departing from the scope defined by the claims of the present invention.
Claims (1)
1. A Bayesian network structure learning algorithm based on parallel evolutionary search, characterized in that the method comprises the following steps:
S1: Randomly sample the raw sample data using Markov chain Monte Carlo (MCMC); optimize the random sampling process with Gibbs sampling so that sampling is more efficient in high dimensions; because of the transition probabilities involved, the sampled data consists of several relatively independent data subsets carrying preliminary conditional probabilities;
S2: Gene-encode each data subset generated in S1 to obtain an initial population of individuals X_j, i.e. several edgeless graphs;
S3: Define the evaluation function using Bayesian scoring, i.e. the maximum-a-posteriori principle G* = arg max P(G|D), where D denotes the data set and G denotes a network structure over the Bayesian network N; N = {X_1, X_2, …, X_n}, where the value range of X_i is {x_i1, x_i2, …, x_iri}; for an arbitrary node x_i with parent node set π_i, the evaluation function of the Bayesian network structure is defined as
V = Σ_i Σ_{j=1..q_i} [ log((r_i − 1)! / (N_ij + r_i − 1)!) + Σ_{k=1..r_i} log(N_ijk!) ]
where i indexes the nodes of the Bayesian network under evaluation; j indexes the different parent configurations of node x_i; q_i is the number of instantiations of π_i; r_i is the number of values of x_i; N_ijk is the number of cases in data set D in which variable X_i takes its k-th value x_ik while its parents Π_{x_i} take their j-th configuration w_ij; and N_ij = Σ_k N_ijk;
S4: Define the evolutionary process using hill-climbing search; during the evolutionary operation, examine the variables one by one starting from the initial population, determine the parent nodes of each node, and generate directed edges pointing from parent node to child node; for X_j, let the obtained parent node set be π_j, and define μ as the upper-limit threshold on the number of parent nodes of a variable; if |π_j| < μ, i.e. the number of parents of X_j is below the prescribed upper limit, examine the variables X_i not in π_j and select the X_i that maximizes the new family score; then compare V_new and V_old, and if V_new > V_old, add X_i to π_j;
S5: Define the evolution stop condition: if |π_j| >= μ, stop evolving.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810085728.7A CN108197665A (en) | 2018-01-29 | 2018-01-29 | Bayesian network structure learning algorithm based on parallel evolutionary search |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108197665A true CN108197665A (en) | 2018-06-22 |
Family
ID=62591136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810085728.7A Pending CN108197665A (en) | Bayesian network structure learning algorithm based on parallel evolutionary search | 2018-01-29 | 2018-01-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108197665A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109002928A (en) * | 2018-08-13 | 2018-12-14 | 中国电力科学研究院有限公司 | A kind of electric load peak value prediction technique and device based on Bayesian network model |
CN109697512A (en) * | 2018-12-26 | 2019-04-30 | 东南大学 | Personal data analysis method and computer storage medium based on Bayesian network |
CN109697512B (en) * | 2018-12-26 | 2023-10-27 | 东南大学 | Personal data analysis method based on Bayesian network and computer storage medium |
CN111008705A (en) * | 2019-12-06 | 2020-04-14 | 东软集团股份有限公司 | Searching method, device and equipment |
CN111008705B (en) * | 2019-12-06 | 2024-02-13 | 东软集团股份有限公司 | Searching method, device and equipment |
CN117474106A (en) * | 2023-10-24 | 2024-01-30 | 江南大学 | Bayesian network structure learning algorithm based on full-flow parallel genetic algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180622 |