CN102999477B - A parallel classification method based on MCMC - Google Patents

A parallel classification method based on MCMC

Info

Publication number
CN102999477B
CN102999477B (Application CN201210563427.3A)
Authority
CN
China
Prior art keywords
probability
processor
exchange
state
random number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210563427.3A
Other languages
Chinese (zh)
Other versions
CN102999477A (en)
Inventor
迟学斌
周纯葆
郎显宇
王珏
邓笋根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS filed Critical Computer Network Information Center of CAS
Priority to CN201210563427.3A priority Critical patent/CN102999477B/en
Publication of CN102999477A publication Critical patent/CN102999477A/en
Application granted granted Critical
Publication of CN102999477B publication Critical patent/CN102999477B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a parallel classification method based on MCMC, comprising: calculating the likelihood from the initial state; calculating the posterior probability of the parameters from the likelihood; performing an MCMC simulation step based on the posterior probability, producing a new state from the current state; calculating an acceptance probability from the new state and generating a first random number, the state at the next time step being the new state when the first random number is less than the acceptance probability, and the current state being kept unchanged otherwise; generating the labels of the Markov chains in the same processor column that are to attempt a swap; and, when a Markov chain in a processor participates in a swap, calculating a swap probability and generating a second random number, comparing the swap probability with the second random number, and exchanging the heating parameters of the Markov chains in the processors when the second random number is less than the swap probability, no swap taking place otherwise. The invention shortens the execution time of the MC³ algorithm and the MCMC algorithm and reduces communication cost.

Description

A parallel classification method based on MCMC
Technical field
The present invention relates to data classification technology, and in particular to a parallel classification method based on MCMC.
Background technology
For the data classification problem there are currently many classification methods. The main single-classifier methods include decision trees, Bayesian classifiers, artificial neural networks, K-nearest neighbours, support vector machines, and classification based on association rules. In addition, there are ensemble learning methods that combine single classifiers, such as the Bagging and Boosting methods.
Among these classification methods, Bayesian classification algorithms are a class of algorithms that use knowledge of probability and statistics to classify. When facing classification problems over large data sets, statistically based Bayesian algorithms show their advantage. The basic idea of a Bayesian algorithm is to carry out posterior probabilistic inference over the parameters by means of Bayes' rule (see Formula 1).
P(H | E) = P(E | H) · P(H) / P(E)    (Formula 1)
Here E is the available data and H is the parameter of interest, which in the classification problem is the probability that an individual belongs to a given class; P(E) is the unconditional probability of the data, P(H) is the prior probability of the parameter, P(E | H) is the likelihood, and P(H | E) is the posterior probability of the parameter.
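As a concrete illustration of Formula 1, the following minimal Python sketch computes a posterior over two candidate classes from assumed priors and likelihoods; the class names and numbers are hypothetical and only illustrate the normalisation by P(E).

```python
# Minimal illustration of Formula 1 (Bayes' rule) with made-up numbers.
priors = {"class_A": 0.6, "class_B": 0.4}          # P(H)
likelihoods = {"class_A": 0.2, "class_B": 0.7}     # P(E|H) for the observed data E

# P(E) = sum over classes of P(E|H) * P(H)
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# P(H|E) = P(E|H) * P(H) / P(E)
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}
print(posteriors)  # {'class_A': 0.3, 'class_B': 0.7}
```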
In a Bayesian algorithm, the posterior probability of the parameters can be inferred with the MCMC (Markov Chain Monte Carlo) algorithm, in which the parameters of interest form the state space and the equilibrium distribution of the Markov chain constructed over this state space is exactly the posterior distribution of the parameters. A Markov chain describes a sequence of states in which each state depends only on the state before it. Formally, a Markov chain is a sequence of random variables X_1, X_2, X_3, ... with the Markov property: the set of values these variables can take is called the "state space", and the value of X_n is the state at time n. The conditional distribution of X_{n+1} given the past states depends only on X_n, that is, P(X_{n+1} = x | X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = P(X_{n+1} = x | X_n = x_n), where X_n is a state of the process. This identity is the Markov property.
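For intuition, the short Python sketch below (a toy example, not part of the patent; the two states and their transition probabilities are assumptions) simulates a two-state Markov chain whose next state is drawn only from the current state, i.e. it satisfies the Markov property above.

```python
import random

# Transition probabilities P(X_{n+1} | X_n) for a toy two-state chain.
transition = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.5, "B": 0.5},
}

def step(state):
    """Draw the next state using only the current state (Markov property)."""
    return "A" if random.random() < transition[state]["A"] else "B"

state = "A"
chain = [state]
for _ in range(10):
    state = step(state)
    chain.append(state)
print(chain)
```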
The main drawback of the MCMC algorithm is that it is easily trapped in local optima. The MC³ (Metropolis-Coupled Markov Chain Monte Carlo) algorithm is therefore used to solve the data classification problem, since MC³ effectively avoids the situation in which MCMC becomes trapped in a local optimum. MC³ runs several Markov chains performing MCMC computations simultaneously and exchanges state information between the chains to keep MCMC from getting stuck in local optima. However, in the face of huge data volumes MCMC itself is already very time-consuming, and MC³ is even more so.
Summary of the invention
The object of the invention is to provide a parallel classification method based on MCMC, so as to solve the above-mentioned problems that the MCMC algorithm in the prior art encounters when facing huge data volumes.
To achieve this object, the invention provides a parallel classification method based on MCMC, applied in a computing system composed of N rows and P columns of processors, where each processor holds at least one Markov chain and one feature, the P processors in the same row hold the same Markov chain, and the N processors in the same column hold the same individual features. The method comprises the following steps:
calculating the likelihood from the initial state;
calculating the posterior probability of the parameters from the likelihood;
performing an MCMC simulation step based on the posterior probability, producing a new state from the current state;
calculating an acceptance probability from the new state and generating a first random number with a first random number generator, the processors in the same row having an identical first random number generator;
comparing the acceptance probability with the first random number: when the first random number is less than the acceptance probability, the state at the next time step is the new state, otherwise the current state is kept unchanged;
generating, with a second random number generator, the labels of the Markov chains in the same processor column that are to attempt a swap, every processor having an identical second random number generator;
when a Markov chain in the same processor column participates in a swap, the processors in the same row calculating a swap probability and generating a second random number with the second random number generator, and comparing the swap probability with the second random number: when the second random number is less than the swap probability, the heating parameters of the Markov chains held by the processors in that row are exchanged, otherwise no swap takes place.
By having multiple processors compute local likelihoods simultaneously and then merge them into a global likelihood, the present invention shortens the execution time of MCMC; by having multiple processors work on several Markov chains simultaneously, it shortens the execution time of MC³; and by exchanging only the heating parameters of the Markov chains rather than their states, it reduces communication cost.
Brief description of the drawings
Other features, characteristics and advantages of the present invention will become more apparent after embodiments of the present invention are described in detail below, by way of example, with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the virtual topology of the processors in an embodiment of the present invention;
Fig. 2 is a flow diagram of an MC³ (multi-chain) parallel classification method provided by an embodiment of the present invention.
Detailed description of the invention
The technical scheme of the present application is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a schematic diagram of the virtual topology of the processors in an embodiment of the present invention. As shown in Fig. 1, 16 processors (P(1,1) ... P(4,4)) carry out the computation of an MCMC algorithm with 8 coupled chains, and each individual in the training set containing the test samples has 800 features. All processors are organised into a virtual two-dimensional topology in which each row of processors computes on the same Markov chain, realising the parallel operation of the MCMC algorithm, and each column of processors computes on the same data, so that the computation by the 16 processors realises the parallel operation of the MC³ algorithm.
Fig. 2 is a flow diagram of an MC³ (multi-chain) parallel classification method provided by an embodiment of the present invention. Suppose there are N rows × P columns of processors. The M Markov chains used in the Bayesian classification algorithm are distributed evenly over the N rows of processors, each row of processors holding at least one Markov chain, so the condition M ≥ N must be satisfied. The features of the individuals in the data set are distributed evenly over the P columns of processors, each column of processors holding the same individual features. Each Markov chain has a different probability distribution π_i(x) = π(x)^{β_i}, where i ∈ {1, 2, ..., M} and β_i ∈ (0, 1] is the heating parameter. Only one Markov chain has β_i = 1; it is called the cold chain, and its final equilibrium distribution is exactly the posterior distribution of the parameters. All other Markov chains have β_i ≠ 1 and are called hot chains; their purpose is to explore the state space more effectively. Then, according to the given data, a reasonable model is set up for computing the likelihood of the clustering of the individuals; the model defines a measure of difference between two individuals in the data, such as Euclidean distance or correlation, and the prior probability of the parameters is computed at the same time.
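The assignment of chains to processor rows and of heating parameters to chains can be sketched as follows. This is a minimal illustration under assumed conventions: the schedule β_i = 1 / (1 + λ·i) with a hypothetical spacing λ is a common MC³ choice, not a schedule prescribed by the patent.

```python
# Sketch: distribute M chains over N processor rows and assign heating parameters.
# The temperature schedule beta_i = 1 / (1 + lam * i) is a common MC^3 choice,
# used here only as an assumed example; the patent does not fix a schedule.
M, N, P = 8, 4, 4      # chains, processor rows, processor columns (example values)
lam = 0.2              # hypothetical spacing of the heating parameters

betas = [1.0 / (1.0 + lam * i) for i in range(M)]   # beta_0 = 1 -> the cold chain
chains_per_row = [[i for i in range(M) if i % N == r] for r in range(N)]

for r, chain_ids in enumerate(chains_per_row):
    print(f"processor row {r} holds chains {chain_ids} "
          f"with betas {[round(betas[i], 3) for i in chain_ids]}")
```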
Each processor (i, j) in the N-row × P-column processor grid executes the following steps, where 1 ≤ i ≤ N and 1 ≤ j ≤ P.
Step 1: each processor generates its initial state S_i(0) and sets the time t and the value MAX, the initial value of t being 0.
Step 2: the processors in the same row each perform the following steps:
1. given the model, the data E_j and the state S_i(t), compute the local likelihood logL(E_j | S_i(t));
2. merge the local likelihoods into the global likelihood by means of a reduction operation in parallel computing: logL(E | S_i(t)) = Σ_{j=1}^{P} logL(E_j | S_i(t)).
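A hedged sketch of this row-wise reduction using mpi4py is shown below; the N × P row-major process layout, the row sub-communicator, and the toy local likelihood are assumptions made for illustration, not the patent's required implementation.

```python
# Sketch: merge local log-likelihoods within one processor row via an MPI reduction
# (mpi4py). The N x P row-major rank layout and the toy likelihood are assumptions.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
N, P = 4, 4                          # processor rows and columns (example values)
row, col = divmod(comm.rank, P)      # this process's (row, column) position

# Processors in the same row share a Markov chain, so reduce within the row.
row_comm = comm.Split(color=row, key=col)

def local_loglik(data_block, state):
    """Toy stand-in for logL(E_j | S_i(t)); a real model would go here."""
    return float(-0.5 * np.sum((data_block - state) ** 2))

data_block = np.random.rand(100)     # this column's share of the features
state = np.zeros(100)                # current state S_i(t) of this row's chain

local = np.array([local_loglik(data_block, state)])
total = np.empty(1)
row_comm.Allreduce(local, total, op=MPI.SUM)
# total[0] now holds logL(E | S_i(t)) = sum over j of logL(E_j | S_i(t))
```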
Step 3: on the basis of the likelihood, calculate the posterior probability of the parameters using the formula P(H | E) = P(E | H) · P(H) / P(E), where E is the training data set containing the test data, H is the parameter (in the classification problem, the probability that an individual belongs to a given class), P(E) is the unconditional probability of the data set, P(H) is the prior probability of the parameters computed from the data, P(E | H) is the likelihood of the parameters computed from the chosen model, and P(H | E) is the posterior probability of the parameters.
Step 4: with the current state S_i(t) as the basis, perform an MCMC simulation step: use the transfer function T(S_i(t), S_i(t)') to generate a new state S_i(t)' in the state space.
Here S_i(t) denotes the state of the i-th Markov chain at time t, and S_i(t)' denotes the new state generated by the i-th Markov chain at time t on the basis of S_i(t).
Step 5: according to the Metropolis-Hastings algorithm, compute the acceptance probability R of the new state S_i(t)' using the formula γ(S_i(t), S_i(t)') = min(1, [π(S_i(t)') / π(S_i(t))] · [T(S_i(t), S_i(t)') / T(S_i(t)', S_i(t))]). Because different Markov chains have different probability distributions π_i(x) = π(x)^{β_i}, the acceptance probability R is in fact computed as γ(S_i(t), S_i(t)') = min(1, [π(S_i(t)') / π(S_i(t))]^{β_i} · [T(S_i(t), S_i(t)') / T(S_i(t)', S_i(t))]). Here S_i(t) denotes the state of the i-th Markov chain at time t, S_i(t)' denotes the new state generated at time t on the basis of S_i(t), γ(S_i(t), S_i(t)') is the probability of accepting S_i(t)' when the state at time t is S_i(t), and π is the probability distribution of the Markov chain.
Note that all processors in the same row have the same first random number generator Rand1, while processors in different rows have different first random number generators; Rand1 is used to draw a random number U1 uniformly distributed in [0, 1].
Step 6: compare the acceptance probability R with the random number U1. When U1 is less than R, the state at the next time step is the new state S_i(t)'; otherwise the current state is kept unchanged.
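Steps 4 to 6 can be sketched as a single accept/reject move. The Gaussian random-walk proposal, the toy unnormalised target log π and the vector state below are illustrative assumptions only; with a symmetric proposal the T(·,·)/T(·,·) ratio cancels to 1.

```python
# Sketch of steps 4-6: propose S_i(t)', compute the heated acceptance probability,
# and accept or reject. Proposal and target are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)   # plays the role of Rand1 (shared within a row)

def log_pi(state):
    """Toy unnormalised log target; stands in for log pi(S)."""
    return float(-0.5 * np.sum(state ** 2))

def mcmc_step(state, beta, step_size=0.5):
    proposal = state + rng.normal(0.0, step_size, size=state.shape)  # S_i(t)'
    # Heated acceptance: min(1, (pi(S')/pi(S))**beta) for a symmetric proposal.
    accept_prob = min(1.0, np.exp(beta * (log_pi(proposal) - log_pi(state))))
    u1 = rng.random()                       # first random number U1 in [0, 1]
    return (proposal, True) if u1 < accept_prob else (state, False)

state = np.zeros(3)
state, accepted = mcmc_step(state, beta=1.0)   # beta = 1 -> the cold chain
```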
Step 7: use the second random number generator Rand2 to generate the labels of the Markov chains in the same processor column that are to attempt a swap. Note that all processors have the same second random number generator.
Step 8: if a Markov chain held by the processors in the same column participates in the swap, the processors in the same row execute Step 8(1) and Step 8(2); otherwise, draw a random number U2 uniformly distributed in [0, 1] with the second random number generator and jump to Step 9.
Step 8(1): according to the Metropolis-Hastings algorithm, compute the swap probability R for the pair of chains selected for the swap, and draw a random number U2 uniformly distributed in [0, 1] with the second random number generator Rand2.
Step 8(2): compare the swap probability R with the second random number U2. If U2 is less than R, exchange the heating parameters β of the two chains; otherwise no swap takes place. If the two Markov chains to be swapped are in the same processor, the heating parameters are exchanged directly; if they are not in the same processor, the heating parameters β are exchanged through communication between the processors.
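A compact sketch of this swap step under the same toy assumptions is given below; the standard MC³ swap ratio min(1, [π_m(x_n)·π_n(x_m)] / [π_m(x_m)·π_n(x_n)]) and the choice of keeping both chains in one process are assumptions for illustration. Exchanging only the scalar heating parameters, as the patent describes, keeps the communicated payload to a single number per chain.

```python
# Sketch of step 8: attempt to swap the heating parameters of two chains m and n.
# Uses the standard MC^3 swap ratio (an assumption); chains live in one process here,
# so the "exchange" is a local swap of the beta values rather than an MPI message.
import numpy as np

rng = np.random.default_rng(1)   # plays the role of Rand2 (shared by all processors)

def log_pi(state):
    """Toy unnormalised log target, as in the earlier sketch."""
    return float(-0.5 * np.sum(state ** 2))

def try_swap(states, betas, m, n):
    # log of [pi_m(x_n) * pi_n(x_m)] / [pi_m(x_m) * pi_n(x_n)]
    log_r = (betas[m] - betas[n]) * (log_pi(states[n]) - log_pi(states[m]))
    swap_prob = min(1.0, np.exp(log_r))
    u2 = rng.random()                       # second random number U2 in [0, 1]
    if u2 < swap_prob:
        betas[m], betas[n] = betas[n], betas[m]   # exchange heating parameters only
        return True
    return False

states = [np.zeros(3), np.ones(3)]
betas = [1.0, 0.5]                          # cold chain and one hot chain
swapped = try_swap(states, betas, m=0, n=1)
```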
Step 9: increment the time t by 1.
Step 10: if the value of t has reached the termination condition, that is, has reached the value MAX, the program ends; otherwise jump to Step 2 and continue.
After all processors have finished executing, the results of the individual processors are gathered.
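A minimal mpi4py sketch of this final gather is given below; collecting each processor's samples at rank 0 is an assumption about what "the result" is, made only for illustration.

```python
# Sketch: gather each processor's result (assumed here to be its chain's samples)
# at rank 0 once the loop over t has finished.
from mpi4py import MPI

comm = MPI.COMM_WORLD
local_result = {"rank": comm.rank, "samples": []}   # stand-in for the real result
all_results = comm.gather(local_result, root=0)     # list on rank 0, None elsewhere
if comm.rank == 0:
    print(f"collected results from {len(all_results)} processors")
```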
By having multiple processors compute local likelihoods simultaneously and then merge them into a global likelihood, the present invention shortens the execution time of MCMC; by having multiple processors work on several Markov chains simultaneously, it shortens the execution time of MC³; and by exchanging only the heating parameters of the Markov chains rather than their states, it reduces communication cost.
Obviously, the invention described here may be varied in many ways without departing from its true spirit and scope. Therefore, all changes that are apparent to those skilled in the art are intended to be included within the scope of the appended claims. The scope of protection of the present invention is defined only by the appended claims.

Claims (5)

1. A parallel classification method based on MCMC, applied in a computing system composed of N rows and P columns of processors, each processor holding at least one Markov chain and one feature, the P processors in the same row holding the same Markov chain and the N processors in the same column holding the same individual features, characterized in that the method comprises:
calculating the likelihood from the individual differences of the data;
calculating the posterior probability of the parameters from the likelihood;
performing an MCMC simulation step based on the posterior probability, producing a new state from the current state;
calculating an acceptance probability from the new state and generating a first random number with a first random number generator, the processors in the same row having an identical first random number generator;
comparing the acceptance probability with the first random number, wherein when the first random number is less than the acceptance probability the state at the next time step is the new state, and otherwise the current state is kept unchanged;
generating, with a second random number generator, the labels of the Markov chains that are to attempt a swap, every processor having an identical second random number generator;
when a Markov chain held by a processor participates in a swap, the processors in the same row calculating a swap probability and generating a second random number with the second random number generator, and comparing the swap probability with the second random number, wherein when the second random number is less than the swap probability the heating parameters of the two Markov chains are exchanged, and otherwise no swap takes place;
wherein the step of calculating the likelihood from the individual differences of the data comprises:
the processors in the same row calculating local likelihoods from the individual differences of the data and their portions of the data;
the processors in the same row merging the local likelihoods into a global likelihood;
the M Markov chains used in the Bayesian classification algorithm being distributed evenly over the N rows of processors, each row of processors holding at least one Markov chain, so that the condition M ≥ N must be satisfied; and the features of the individuals in the data set being distributed evenly over the P columns of processors, each column of processors holding the same individual features.
2. The parallel classification method according to claim 1, characterized in that the step of calculating the posterior probability of the parameters from the likelihood comprises:
calculating the posterior probability of the parameters from the likelihood using the formula P(H | E) = P(E | H) · P(H) / P(E), where E is the training data set containing the test data, H is the parameter, P(E) is the unconditional probability of the data set, P(H) is the prior probability computed from the data, P(E | H) is the likelihood of the parameters computed from the preset model, and P(H | E) is the posterior probability of the parameters.
3. The parallel classification method according to claim 1, characterized in that the step of calculating the acceptance probability from the new state comprises:
calculating the acceptance probability from the new state using the formula γ(S_i(t), S_i(t)') = min(1, [π(S_i(t)') / π(S_i(t))] · [T(S_i(t), S_i(t)') / T(S_i(t)', S_i(t))]);
wherein T(S_i(t), S_i(t)') is the transfer function; S_i(t) denotes the state of the i-th Markov chain at time t; S_i(t)' denotes the new state generated by the i-th Markov chain at time t on the basis of S_i(t); γ(S_i(t), S_i(t)') is the probability of accepting S_i(t)' when the state at time t is S_i(t); and π is the probability distribution of the Markov chain.
4. The parallel classification method according to claim 1, characterized in that the step of the processors in the same row calculating the swap probability comprises:
the processors in the same row calculating the swap probability according to the Metropolis-Hastings algorithm.
5. The parallel classification method according to claim 1, characterized in that the step of exchanging the heating parameters of the two Markov chains when the second random number is less than the swap probability comprises: when the two Markov chains to be swapped are in the same processor, exchanging the heating parameters directly; otherwise, exchanging the heating parameters through communication between the processors.
CN201210563427.3A 2012-12-21 2012-12-21 A parallel classification method based on MCMC Active CN102999477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210563427.3A CN102999477B (en) 2012-12-21 2012-12-21 A parallel classification method based on MCMC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210563427.3A CN102999477B (en) 2012-12-21 2012-12-21 A parallel classification method based on MCMC

Publications (2)

Publication Number Publication Date
CN102999477A CN102999477A (en) 2013-03-27
CN102999477B true CN102999477B (en) 2016-05-25

Family

ID=47928059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210563427.3A Active CN102999477B (en) 2012-12-21 2012-12-21 A parallel classification method based on MCMC

Country Status (1)

Country Link
CN (1) CN102999477B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731638B (en) * 2015-03-09 2017-10-24 苏州珂晶达电子有限公司 Semiconductor devices SEU overturns the method for numerical simulation of probability
CN104765958B (en) * 2015-03-27 2017-07-21 西南科技大学 A kind of cognition wireless based on continuous state space is electrically accessed problem New Algorithm model
CN109933750A (en) * 2019-01-18 2019-06-25 合肥工业大学 A kind of TMCMC random sampling algorithm of vectorization distributed parallel
CN109977359B (en) * 2019-04-03 2022-11-22 中国人民解放军国防科技大学 Non-autocorrelation sampling method for large sample space complex probability distribution
CN111914475B (en) * 2020-06-29 2021-09-07 河海大学 Bayesian inverse simulation method for accelerating depicting Gaussian hydrogeological parameter field
US20230267174A1 (en) 2020-07-07 2023-08-24 Nec Corporation Information processing device and information processing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Genomic island prediction and parallelization of the isolation-with-migration model; 周纯葆; China Doctoral Dissertations Full-text Database (Electronic Journal), Basic Sciences; 2012-10-31; Vol. 2012, No. 10; p. A006-19 *
Parallel implementation of the numerical computation of the Isolation with Migration model; 周纯葆 et al.; e-Science Technology & Application (《科研信息化技术与应用》); 2012-01-20; pp. 24-29 *

Also Published As

Publication number Publication date
CN102999477A (en) 2013-03-27

Similar Documents

Publication Publication Date Title
CN102999477B (en) A parallel classification method based on MCMC
Zhou et al. Multi-energy net load forecasting for integrated local energy systems with heterogeneous prosumers
CN105429937B (en) Identity identifying method and system based on keystroke behavior
CN106202519A (en) A kind of combination user comment content and the item recommendation method of scoring
Deng et al. A novel hybrid optimization algorithm of computational intelligence techniques for highway passenger volume prediction
CN104573105A (en) Method of recommending hit songs and singers in music on-demand network
CN103942614A (en) Method and system for predicting heterogeneous network linking relation
Torres et al. Parallel particle swarm optimization applied to the static transmission expansion planning problem
CN105512755A (en) Decomposition-based multi-objective distribution estimation optimization method
CN104835181A (en) Object tracking method based on ordering fusion learning
CN114491616A (en) Block chain and homomorphic encryption-based federated learning method and application
Zhou et al. Nonlinear dynamics of a heterogeneous quantum Commons’ tragedy
CN106021080A (en) Method for intelligently predicting resource consumption trend of application middleware database connection pool
CN104461720A (en) Method and system for solving distributable task scheduling model
CN105160403A (en) Resource service sequence verification method of cloud manufacturing service
Zhang et al. Topologies and Laplacian spectra of a deterministic uniform recursive tree
CN105608267A (en) Multivariable global optimization algorithm
Bin et al. Research Article Based on Markov Chain of Agricultural Enterprise Risk Investment Profit Forecast Economic Model
CN106991616A (en) Network edge Combo discovering method based on the model of side zero
Gicquel et al. An evaluation of semidefinite programming based approaches for discrete lot-sizing problems
Geng et al. Forecasting Range Volatility using Support Vector Machines with Improved PSO Algorithms.
CN113822758A (en) Self-adaptive distributed machine learning method based on block chain and privacy
CN105894114A (en) Solar energy prediction method based on dynamic condition Boltzmann machine
CN105512754A (en) Conjugate prior-based single-mode distribution estimation optimization method
Yu et al. Study on project experts' evaluation based on analytic hierarchy process and fuzzy comprehensive evaluation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant