CN108197665A - A kind of algorithm of Bayesian network structure learning based on parallel evolutionary search - Google Patents


Info

Publication number
CN108197665A
CN108197665A
Authority
CN
China
Prior art keywords
bayesian network
node
network structure
algorithm
father
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810085728.7A
Other languages
Chinese (zh)
Inventor
林小光
钟坤华
孙启龙
张矩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS filed Critical Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN201810085728.7A priority Critical patent/CN108197665A/en
Publication of CN108197665A publication Critical patent/CN108197665A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 - Bayesian classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 - Selection of the most significant subset of features
    • G06F18/2111 - Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/12 - Computing arrangements based on biological models using genetic models
    • G06N3/126 - Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physiology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a Bayesian network structure learning method based on parallel evolutionary search, and belongs to the field of artificial intelligence. Based on the idea of evolutionary computation, the method parallelizes the score-and-search process to achieve efficient Bayesian network structure learning. By using MapReduce technology, the present invention combines a genetic evolutionary algorithm with the structure search process, so as to make full use of the parallel computing power of multiple servers and realize efficient, rapid learning. The present invention applies the traditional genetic algorithm to cloud computing, exploiting both the capacity of distributed computing methods to process massive data and the parallelism and global search ability of genetic algorithms, so that Bayesian network structure learning on massive data can be carried out quickly and efficiently, which current technology cannot do; this is a substantive breakthrough.

Description

A Bayesian network structure learning method based on parallel evolutionary search
Technical field
The invention belongs to the field of artificial intelligence and relates to a Bayesian network structure learning method based on parallel evolutionary search.
Background technology
A Bayesian network is a mathematical model based on probabilistic inference and an extension of the Bayes method; it is currently one of the most effective theoretical models for the representation of, and reasoning about, uncertain knowledge. The information in a Bayesian network consists of two parts: first, the Bayesian network structure, a directed acyclic graph representing conditional-independence information, in which each node represents a variable in the domain of interest and the edges between nodes represent causal relationships between them; second, the conditional probability distribution functions (or conditional probability tables).
Bayesian network structure learning means finding, given a data sample set, the network structure that best matches the training sample set. Its purpose is to obtain the logical relationships between the variables in the domain of interest. A Bayesian network structure can be obtained by a score-and-search algorithm, and the structure search is an NP-hard problem. Existing solutions include the K2 learning algorithm, the TAN learning algorithm, Bayesian scoring metrics, the conditional-likelihood scoring method, and so on.
The basic idea of score-based Bayesian network structure learning is to start from a basic network topology and modify the structure with some search algorithm, for example by adding an edge, deleting an edge, or reversing the direction of an edge; a scoring function then scores each candidate network structure, and the score decides whether that network is kept. Two questions are mainly involved: first, the choice of scoring function; second, the choice of search algorithm.
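Purely as an illustration (none of these names or data representations appear in the patent), the three classic local moves just described can be sketched in Python, assuming a network is stored as a parent map and candidate structures must stay acyclic:

```python
from itertools import permutations

def is_acyclic(adj):
    """Kahn-style check that a parent map (node -> set of parents) is a DAG."""
    indeg = {n: len(p) for n, p in adj.items()}
    children = {n: [c for c in adj if n in adj[c]] for n in adj}
    queue = [n for n, d in indeg.items() if d == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(adj)

def neighbors(adj):
    """Yield neighbor DAGs of `adj` via the three moves: add, delete, reverse."""
    for u, v in permutations(list(adj), 2):
        if u in adj[v]:
            deleted = {n: set(p) for n, p in adj.items()}
            deleted[v].discard(u)
            yield deleted                      # delete edge u -> v (always acyclic)
            reversed_ = {n: set(p) for n, p in deleted.items()}
            reversed_[u].add(v)
            if is_acyclic(reversed_):
                yield reversed_                # reverse edge u -> v
        else:
            added = {n: set(p) for n, p in adj.items()}
            added[v].add(u)
            if is_acyclic(added):
                yield added                    # add edge u -> v
```

In a score-and-search loop each yielded neighbor would be scored, and the best-scoring one kept or discarded, exactly as the paragraph above describes.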
In a big-data environment the data dimensionality is high and the sample space is large. Carrying out Bayesian network structure learning with an ordinary learning algorithm makes the process cumbersome and time-consuming, and the processing capacity of a single machine is limited, so the required Bayesian network structure often cannot be obtained quickly within a reasonable time, which significantly limits subsequent analysis and decision making.
Existing mainstream Bayesian network structure learning algorithms, both domestic and international, essentially all use serial processing, while Bayesian network structure learning is an NP-hard problem. Under big-data conditions, therefore, traditional structure learning processes perform poorly in both speed and efficiency, and it is difficult to obtain the required Bayesian network structure within a reasonable time.
Summary of the invention
In view of this, the purpose of the present invention is to provide a Bayesian network structure learning method based on parallel evolutionary search which, based on the idea of evolutionary computation, parallelizes the score-and-search process to achieve efficient Bayesian network structure learning. By using MapReduce technology, the present invention combines a genetic evolutionary algorithm with the structure search process, so as to make full use of the parallel computing power of multiple servers and realize efficient, rapid learning.
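As a rough local sketch of this idea only (the patent targets a MapReduce cluster of servers; the thread-pool stand-in, the placeholder fitness function, and all names below are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

def edge_count_score(individual):
    """Placeholder fitness over a bit-encoded structure: fewer edges is better.
    The patent scores structures with a Bayesian posterior score instead."""
    return -sum(individual)

def parallel_select(population, k, score_fn=edge_count_score, workers=4):
    """MapReduce-flavoured generation step: the 'map' phase scores each encoded
    structure on a worker; the 'reduce' phase ranks them and keeps the k fittest."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_fn, population))       # map: score in parallel
    ranked = sorted(zip(scored, population), key=lambda t: t[0], reverse=True)
    return [ind for _, ind in ranked[:k]]                   # reduce: select survivors
```

On a real cluster the map phase would be distributed across servers by the MapReduce framework; the structure of the computation (independent scoring, then a global selection) is what makes the genetic search parallelizable.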
To achieve the above objectives, the present invention provides the following technical solution:
A Bayesian network structure learning method based on parallel evolutionary search, comprising the following steps:
S1: Randomly sample the raw sample data using Markov chain Monte Carlo (MCMC). The random sampling procedure is optimized with the Gibbs method, which yields higher sampling efficiency in high dimensions. Because transition probabilities are present, the sampled data consist of several relatively independent data subsets carrying preliminary conditional probabilities.
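For illustration, a minimal Gibbs sampler on a bivariate normal target shows the property S1 relies on: each draw comes from an exact full conditional, so every proposal is accepted. The toy target and all names here are assumptions, not the patent's data-specific sampler:

```python
import random

def gibbs_bivariate(n_samples, rho=0.5, burn_in=100, seed=0):
    """Gibbs sampler for a bivariate normal with correlation rho.
    Each full conditional x|y and y|x is N(rho * other, 1 - rho**2),
    so every step is an exact draw (acceptance probability 1)."""
    rng = random.Random(seed)
    x = y = 0.0
    sd = (1 - rho * rho) ** 0.5
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)   # draw x from p(x | y)
        y = rng.gauss(rho * x, sd)   # draw y from p(y | x)
        if t >= burn_in:
            samples.append((x, y))
    return samples
```

Because no proposal is ever rejected, the chain mixes quickly, which is the efficiency argument the patent makes for Gibbs over generic Metropolis-style MCMC.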
S2: Gene-encode each of the data subsets generated in S1 to obtain the initial population X_j, i.e., several edgeless graphs.
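The patent does not spell out the gene coding; one conventional choice, assumed here purely for illustration, is a row-major adjacency bit string, under which the edgeless graphs of the initial population encode as all-zero strings:

```python
def encode(adj, order):
    """Flatten a parent map (node -> set of parents) into a row-major
    adjacency bit string over a fixed node order: bit r*n + c is 1 iff
    there is an edge order[r] -> order[c]."""
    n = len(order)
    return [1 if order[r] in adj[order[c]] else 0
            for r in range(n) for c in range(n)]

def decode(bits, order):
    """Inverse of encode: rebuild the parent map from the bit string."""
    n = len(order)
    return {order[c]: {order[r] for r in range(n) if bits[r * n + c]}
            for c in range(n)}

def initial_individual(order):
    """An edgeless graph, as used for the initial population in step S2."""
    return [0] * (len(order) ** 2)
```

A bit-string chromosome like this is what makes the standard genetic operators (crossover, mutation) directly applicable to network structures.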
S3: Define the evaluation function using the Bayesian scoring method, i.e., the maximum-posterior-probability principle G* = arg max P(G | D), where D denotes the data set and G denotes a network structure on the Bayesian network N. For an arbitrary node x_i, if its parent node set is π_i, the evaluation function of this Bayesian network structure is defined as:

P(G, D) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!

Here i denotes the node index in the Bayesian network under evaluation; j indexes the distinct instantiations of the parent node set of x_i; q_i is the number of instantiations of π_i; r_i is the number of values x_i can take; N_ijk is the number of cases in the data set D in which the variable X_i takes the value x_ik while its parent node set Πx_i takes its j-th instantiation π_ij; N_ij = \sum_k N_ijk; and w_ij denotes the j-th instantiation of the parent node set.
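A direct rendering of one node's term of this score, computed in log space to avoid factorial overflow (the data layout, the function name, and the list-of-dicts interface are assumptions for illustration):

```python
from math import lgamma
from collections import Counter

def k2_log_score(data, child, parents, r):
    """Log of the Bayesian (K2) score term for one node:
    sum_j [ log (r-1)! - log (N_ij + r - 1)! + sum_k log N_ijk! ],
    where j ranges over observed parent configurations, r is the child's
    arity, data is a list of dicts mapping variable name -> value."""
    counts = Counter()   # N_ijk: (parent config, child value) -> count
    totals = Counter()   # N_ij:  parent config -> count
    for row in data:
        j = tuple(row[p] for p in parents)
        counts[(j, row[child])] += 1
        totals[j] += 1
    log_fact = lambda n: lgamma(n + 1)   # log n! via the gamma function
    score = 0.0
    for n_ij in totals.values():
        score += log_fact(r - 1) - log_fact(n_ij + r - 1)
    for n_ijk in counts.values():
        score += log_fact(n_ijk)
    return score
```

For example, on three samples [{'A':0,'B':0}, {'A':0,'B':0}, {'A':1,'B':1}] with child 'B' and parent 'A' (r = 2), the two parent configurations contribute (1!/3!) * 2! and (1!/2!) * 1!, so the score is log(1/6).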
S4: Define the evolution process using the hill-climbing search method. During the evolutionary operation, the variables are examined one by one starting from the initial population, the parent nodes of each node are determined, and directed edges pointing from parent node to child node are generated. For X_j, the parent node set obtained so far is π_j. Define μ as the upper-bound threshold on the number of parent nodes per variable. If |π_j| < μ, the number of parent nodes of X_j is still below the prescribed upper bound; examine the variables that precede X_j and are not yet in π_j, and select from them the X_i that maximizes the new family score; then compare V_new with V_old, and if V_new > V_old, add X_i to π_j.
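Step S4 can be sketched as greedy parent selection under the threshold μ; the scoring-callback interface, the names, and the toy score in the usage note below are assumptions:

```python
def greedy_parents(score_fn, candidates, mu):
    """Hill-climbing parent selection for one node, as in step S4.
    score_fn(parents_tuple) -> float is the family score (hypothetical
    interface); mu is the upper bound on the number of parents."""
    parents = []
    v_old = score_fn(tuple(parents))
    while len(parents) < mu:                  # stop condition of step S5
        best, v_new = None, v_old
        for x in candidates:
            if x in parents:
                continue
            v = score_fn(tuple(parents) + (x,))
            if v > v_new:                     # keep only moves with V_new > V_old
                best, v_new = x, v
        if best is None:                      # no candidate improves: stop
            break
        parents.append(best)
        v_old = v_new
    return parents
```

For example, with a toy score that rewards containing 'A' and penalizes set size, `score_fn = lambda ps: (1 if 'A' in ps else 0) - 0.1 * len(ps)`, the call `greedy_parents(score_fn, ['A', 'B', 'C'], 3)` returns `['A']`: adding 'A' raises the score, while any further addition lowers it.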
S5: Define the evolution stop condition: if |π_j| ≥ μ, stop evolving.
The beneficial effects of the present invention are:
(1) The present invention performs random sampling with the Gibbs method. Since the acceptance probability is 1, the Markov transition rule needs no acceptance test on each state and no step is rejected, so convergence is fast and the random sampling is efficient.
(2) The present invention uses a genetic algorithm. The operations in the whole evolutionary process are random, but unlike a completely random search they make effective use of historical information, so that the expected fitness of the next generation increases. Through this mechanism, generation-by-generation genetic evolution finally converges to a suitable individual.
(3) The present invention applies the traditional genetic algorithm to cloud computing, exploiting both the capacity of distributed computing methods to process massive data and the parallelism and global search ability of genetic algorithms, so that Bayesian network structure learning on massive data can be carried out quickly and efficiently, which current technology cannot do; this is a substantive breakthrough.
Description of the drawings
To make the purpose, technical solution, and advantageous effects of the present invention clearer, the present invention provides the following drawing for explanation:
Fig. 1 is a block diagram of the technique of the present invention.
Specific embodiment
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawing.
The block diagram of the technique of the present invention is shown in Fig. 1; the process flow is as follows:
S1: Randomly sample the raw sample data using Markov chain Monte Carlo (MCMC). The present invention optimizes the random sampling procedure with the Gibbs method, which can yield higher sampling efficiency in high dimensions. Because transition probabilities are present, the sampled data are composed of several relatively independent data subsets carrying preliminary conditional probabilities.
S2: Gene-encode each of the data subsets generated in the first step to obtain the initial population, i.e., several edgeless graphs.
S3: Define the evaluation function using the Bayesian scoring method, i.e., the maximum-posterior-probability principle G* = arg max P(G | D), where D denotes the data set and G denotes a network structure on the Bayesian network N. For an arbitrary node x_i, if its parent node set is π_i, the evaluation function of this Bayesian network structure is defined as:

P(G, D) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!
S4: Define the evolution process using the hill-climbing search method. During the evolutionary operation, the variables are examined one by one starting from the initial population, the parent nodes of each node are determined, and directed edges pointing from parent node to child node are generated. For X_j, the parent node set obtained so far is π_j. If |π_j| < μ (where μ is defined as the upper-bound threshold on the number of parent nodes per variable), the number of parent nodes of X_j is still below the prescribed upper bound; examine the variables that precede X_j and are not yet in π_j, and select from them the X_i that maximizes the new family score; then compare V_new with V_old, and if V_new > V_old, add X_i to π_j.
S5: Define the evolution stop condition: if |π_j| ≥ μ, stop evolving.
Finally, it is noted that the preferred embodiments above are merely illustrative of the technical solution of the present invention and not restrictive. Although the present invention has been described in detail through the preferred embodiments above, those skilled in the art should understand that various changes in form and detail can be made to it without departing from the scope defined by the claims of the present invention.

Claims (1)

1. A Bayesian network structure learning method based on parallel evolutionary search, characterized in that the method comprises the following steps:
S1: Randomly sample the raw sample data using Markov chain Monte Carlo (MCMC). The random sampling procedure is optimized with the Gibbs method, which yields higher sampling efficiency in high dimensions. Because transition probabilities are present, the sampled data consist of several relatively independent data subsets carrying preliminary conditional probabilities.
S2: Gene-encode each of the data subsets generated in S1 to obtain the initial population X_j, i.e., several edgeless graphs.
S3: Define the evaluation function using the Bayesian scoring method, i.e., the maximum-posterior-probability principle G* = arg max P(G | D), where D denotes the data set and G denotes a network structure on the Bayesian network N; N = {X_1, X_2, …, X_n}, where the value range of X_i is {x_i1, x_i2, …, x_ir_i}. For an arbitrary node x_i, if its parent node set is π_i, the evaluation function of this Bayesian network structure is defined as:

P(G, D) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!

Here i denotes the node index in the Bayesian network under evaluation; j indexes the distinct instantiations of the parent node set of x_i; q_i is the number of instantiations of π_i; r_i is the number of values x_i can take; N_ijk is the number of cases in the data set D in which the variable X_i takes the value x_ik while its parent node set Πx_i takes its j-th instantiation π_ij; N_ij = \sum_k N_ijk; and w_ij denotes the j-th instantiation of the parent node set.
S4: Define the evolution process using the hill-climbing search method. During the evolutionary operation, the variables are examined one by one starting from the initial population, the parent nodes of each node are determined, and directed edges pointing from parent node to child node are generated. For X_j, the parent node set obtained so far is π_j. Define μ as the upper-bound threshold on the number of parent nodes per variable. If |π_j| < μ, the number of parent nodes of X_j is still below the prescribed upper bound; examine the variables that precede X_j and are not yet in π_j, and select from them the X_i that maximizes the new family score; then compare V_new with V_old, and if V_new > V_old, add X_i to π_j.
S5: Define the evolution stop condition: if |π_j| ≥ μ, stop evolving.
CN201810085728.7A 2018-01-29 2018-01-29 A kind of algorithm of Bayesian network structure learning based on parallel evolutionary search Pending CN108197665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810085728.7A CN108197665A (en) 2018-01-29 2018-01-29 A kind of algorithm of Bayesian network structure learning based on parallel evolutionary search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810085728.7A CN108197665A (en) 2018-01-29 2018-01-29 A kind of algorithm of Bayesian network structure learning based on parallel evolutionary search

Publications (1)

Publication Number Publication Date
CN108197665A true CN108197665A (en) 2018-06-22

Family

ID=62591136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810085728.7A Pending CN108197665A (en) 2018-01-29 2018-01-29 A kind of algorithm of Bayesian network structure learning based on parallel evolutionary search

Country Status (1)

Country Link
CN (1) CN108197665A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002928A (en) * 2018-08-13 2018-12-14 中国电力科学研究院有限公司 A kind of electric load peak value prediction technique and device based on Bayesian network model
CN109697512A (en) * 2018-12-26 2019-04-30 东南大学 Personal data analysis method and computer storage medium based on Bayesian network
CN109697512B (en) * 2018-12-26 2023-10-27 东南大学 Personal data analysis method based on Bayesian network and computer storage medium
CN111008705A (en) * 2019-12-06 2020-04-14 东软集团股份有限公司 Searching method, device and equipment
CN111008705B (en) * 2019-12-06 2024-02-13 东软集团股份有限公司 Searching method, device and equipment
CN117474106A (en) * 2023-10-24 2024-01-30 江南大学 Bayesian network structure learning algorithm based on full-flow parallel genetic algorithm

Similar Documents

Publication Publication Date Title
Hassib et al. An imbalanced big data mining framework for improving optimization algorithms performance
Zhong et al. Applying big data based deep learning system to intrusion detection
Telikani et al. A survey of evolutionary computation for association rule mining
Li et al. PS–ABC: A hybrid algorithm based on particle swarm and artificial bee colony for high-dimensional optimization problems
De la Hoz et al. Feature selection by multi-objective optimisation: Application to network anomaly detection by hierarchical self-organising maps
Wu et al. Attribute weighting via differential evolution algorithm for attribute weighted naive bayes (wnb)
Laskey et al. Population markov chain monte carlo
CN108197665A (en) A kind of algorithm of Bayesian network structure learning based on parallel evolutionary search
Wu et al. Adaptive spammer detection with sparse group modeling
Gasse et al. An experimental comparison of hybrid algorithms for Bayesian network structure learning
Akojwar et al. A novel probabilistic-PSO based learning algorithm for optimization of neural networks for benchmark problems
Singla et al. Approximate lifting techniques for belief propagation
Pandey et al. Data clustering using hybrid improved cuckoo search method
Abasi et al. A text feature selection technique based on binary multi-verse optimizer for text clustering
Zhou et al. Probabilistic graphical models parameter learning with transferred prior and constraints
Liu et al. Scaling up probabilistic circuits by latent variable distillation
Qiao et al. A framework for multi-prototype based federated learning: Towards the edge intelligence
Castellana et al. The infinite contextual graph markov model
Gao et al. Raftgp: Random fast graph partitioning
Liang et al. A new hybrid ant colony optimization based on brain storm optimization for feature selection
Xue et al. Hybrid resampling and weighted majority voting for multi-class anomaly detection on imbalanced malware and network traffic data
Chu et al. A binary superior tracking artificial bee colony with dynamic Cauchy mutation for feature selection
Shafia et al. A hybrid algorithm for data clustering using honey bee algorithm, genetic algorithm and k-means method
Leng et al. An effective multi-level algorithm based on ant colony optimization for bisecting graph
Kumar et al. Result merging in meta-search engine using genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180622