CN109977359B - Non-autocorrelation sampling method for large sample space complex probability distribution

Info

Publication number: CN109977359B (application number CN201910265051.XA)
Authority: CN (China)
Prior art keywords: sample, buffer, sampling, samples, state
Other versions: CN109977359A
Original language: Chinese (zh)
Inventors: 吴俊杰, 刘雍, 熊敏, 徐平, 强晓刚, 黄安琪, 付祥, 邓明堂
Assignee (current and original): National University of Defense Technology
Priority/filing date: 2019-04-03 (CN201910265051.XA)
Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Abstract

The invention discloses a non-autocorrelation sampling method for complex probability distributions over large sample spaces, and aims to solve the autocorrelation problem of the MCMC method and the efficiency loss of skip sampling. The technical scheme is as follows: when samples are generated, the correlation among samples is eliminated by introducing a sample buffer; alternatively, as a post-processing method, previously generated samples are read in sequence and the sample sequence is passed through the buffer to remove its autocorrelation. Each time a sample is generated, or read from a file, it is stored in the sample buffer; once the buffer is full, a sample is selected from the buffer uniformly at random and output. This random output adds an extra layer of randomness between samples and ultimately eliminates the autocorrelation of the sample sequence. The invention computes quickly, retains the efficiency of MCMC, makes the overhead of the added buffer operations relatively negligible, and eliminates sample autocorrelation.

Description

Non-autocorrelation sampling method for large sample space complex probability distribution
Technical Field
The invention relates to a method for sampling complex probability distributions over large sample spaces, and in particular to a Markov chain-based non-autocorrelation sampling method.
Background
Computer processing of random events is an essential tool for engineering analysis. In recent decades, with the rapid growth of computer performance, computers have been widely used in aerospace, automobile and ship design and manufacture, bridge design and construction, environmental engineering, weather forecasting, polymer materials, and other fields. In these engineering applications, random events must be computed and analyzed by computer to achieve goals such as bridge reliability assessment and weather forecasting. In random event analysis, one necessary step is to sample a complex probability distribution to obtain independent, random samples.
Efficient sampling algorithms are a necessary means of improving sampling efficiency. In general, the sources of random events involved in engineering applications are complex, the data sizes are large, the sample probabilities are expensive to compute, and the number of distinct samples may exceed the order of 10^20. Analyzing such random events requires large amounts of computational resources; besides improving hardware performance as far as possible, optimization of the sampling method is indispensable.
The problem of sampling a complex probability distribution over a large sample space can be defined as follows:
A sample space X = {x_1, x_2, …, x_i, …, x_N} denotes the N possible samples (1 ≤ i ≤ N). The value of N is very large, possibly on the order of 10^20.
The complex probability distribution is described by a domain-dependent probability distribution function f(x), i.e. the probability of drawing sample x_i is f(x_i). Computing a probability value f(x_i) is difficult (typically of exponential time complexity) and requires a significant amount of computing resources.
The task is to construct a sample sequence t_1, t_2, …, t_j, …, t_M such that for every j (1 ≤ j ≤ M, where the value of M varies by application and 1 ≤ M ≤ N) there is t_j ∈ X, and the samples in the sequence obey the probability distribution described by f. Evaluating the probability distribution function f generally requires a large amount of computing resources and is the key difficulty of the sampling process.
Currently, commonly used sampling methods include the brute-force sampling method, the rejection sampling method (A. Neville et al., "Classical boson sampling algorithms with superior performance to near-term experiments", Nature Physics, October 2017), and the Markov chain Monte Carlo method (J. S. Liu, "Metropolized independent sampling with comparisons to rejection sampling and importance sampling", Statistics and Computing, June 1996).
1. Brute-force sampling
The brute-force sampling method first computes the complete probability distribution of the data to be sampled and then performs the sampling. Computing the entire distribution to be sampled requires enormous computational resources, so the method is unsuitable for engineering applications.
2. Rejection sampling
Rejection sampling first generates a candidate sample (denoted x) from an assumed simple auxiliary probability distribution g, and then computes the probability value f(x) of the candidate x under the actual distribution f to be sampled. The auxiliary probability g must be scaled by a factor λ such that λ·g(x) ≥ f(x) for every sample x. The ratio f(x)/(λ·g(x)) of the actual probability to the scaled auxiliary probability then decides the fate of the candidate: it is output with probability f(x)/(λ·g(x)) and discarded (rejected) with probability 1 − f(x)/(λ·g(x)).
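As a minimal sketch of this accept/reject loop (Python; the callables f, g_sample, g_pdf and the bound lam are illustrative placeholders, not names from the original description):

```python
import numpy as np

def rejection_sample(f, g_sample, g_pdf, lam, n_samples, rng=None):
    """Draw candidates from the auxiliary distribution g and accept each
    candidate x with probability f(x) / (lam * g(x)).
    Assumes lam * g_pdf(x) >= f(x) for every x."""
    if rng is None:
        rng = np.random.default_rng()
    out = []
    while len(out) < n_samples:
        x = g_sample(rng)                       # candidate from g
        if rng.random() < f(x) / (lam * g_pdf(x)):
            out.append(x)                       # accept with ratio probability
        # otherwise the candidate is rejected and simply discarded
    return out
```

For normalized f and g, roughly λ candidates (and hence λ probability evaluations) are needed per accepted sample, which is the inefficiency described in disadvantage (4) below.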
Although rejection sampling does not require computing the entire probability distribution when generating samples, it still has the following disadvantages:
(1) Because probability values of the actual distribution are hard to compute, and the maximum probability value is hard to obtain, the factor λ is difficult to determine;
(2) If the scaled auxiliary distribution is set too low, the actual probability distribution is truncated to some extent, and the sampling result is distorted;
(3) If the scaled auxiliary distribution is set too high, the acceptance probability becomes very low, the probability that a candidate is rejected rises greatly, and the sampling efficiency drops sharply;
(4) Even when the factor λ is well chosen, the number of probability values that must be computed per accepted sample grows with the complexity of the distribution to be sampled, so the demand on computing resources remains high and the efficiency is low.
3. Markov chain Monte Carlo (MCMC)
The MCMC method samples by constructing a Markov chain whose stationary (convergence) distribution is the probability distribution to be sampled. Each time a probability value is computed, the Markov chain advances by one node and outputs one sample, which makes MCMC the most efficient sampling method currently available.
The general flow of the Markov chain Monte Carlo method is shown in FIG. 1 and comprises:
The first step: select an initial state.
1.1 Define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i (1 ≤ i ≤ N) the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 Construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) denotes the probability of drawing sample x_i;
1.3 Choose any easy-to-sample random distribution g(s) on the state space S as the auxiliary probability distribution (e.g. the uniform distribution), where g(s_i) denotes the probability of drawing state s_i. Sampling from g(s) is simple; an independent sampling method is generally chosen, i.e. sampling from g(s) does not depend on the current state;
1.4 Generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.5 Set the current state of the Markov chain s_c = s_o;
1.6 Set the target number of samples N_s according to the user's requirements and application characteristics.
The second step: generate a candidate state.
2.1 Randomly select a state s_n (1 ≤ n ≤ N) from the auxiliary probability distribution g(s) as the candidate state;
2.2 Compute the probability value f_s(s_n) corresponding to the candidate state.
The third step: decide the subsequent state and output the sample.
3.1 Compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 Generate a random number u in the interval [0, 1];
3.3 If u < p_accept, set s_c = s_n, i.e. accept the candidate state s_n as the new current state of the Markov chain; otherwise s_c remains unchanged, i.e. the candidate state is rejected;
3.4 Generate and output the sample corresponding to state s_c.
The fourth step: judge whether sampling is finished.
If the number of samples obtained reaches the set target value, jump to the fifth step; otherwise jump to the second step.
The fifth step: end.
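The flow above is a Metropolized independence sampler. A minimal Python sketch under the same conventions (the callables f_s, g_pdf, g_sample are illustrative stand-ins for the state-space probability function and the auxiliary distribution):

```python
import numpy as np

def mcmc_independence_sampler(f_s, g_pdf, g_sample, n_samples, rng=None):
    """Metropolized independence sampler: candidates are drawn from the
    auxiliary distribution g independently of the current state, and
    accepted with probability min(1, f(s_n)g(s_c) / (f(s_c)g(s_n)))."""
    if rng is None:
        rng = np.random.default_rng()
    s_c = g_sample(rng)            # initial state (step 1.4, simplified)
    p_c = f_s(s_c)
    samples = []
    for _ in range(n_samples):
        s_n = g_sample(rng)        # step 2.1: candidate, independent of s_c
        p_n = f_s(s_n)             # step 2.2: one probability evaluation
        a = min(1.0, (p_n * g_pdf(s_c)) / (p_c * g_pdf(s_n)))
        if rng.random() < a:       # step 3.3: accept or reject
            s_c, p_c = s_n, p_n
        samples.append(s_c)        # step 3.4: output current state either way
    return samples
```

One sample is emitted per probability evaluation, but rejected steps repeat the previous state, which is the source of the autocorrelation discussed next.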
Every probability evaluation of the MCMC method yields one sample. However, the generated sample sequence has a serious drawback: because the accept/reject decision of the third step depends heavily on the current sample, long runs of consecutive repeated samples may appear in the generated sequence, giving it severe autocorrelation. To quantify the degree of autocorrelation of a sample sequence, the autocorrelation coefficients of various orders are used; in particular, the first-order autocorrelation coefficient r_1 evaluates the autocorrelation of the sequence, as in formula (1):

r_k = [ Σ_{i=1..M−k} (x_i − x̄)(x_{i+k} − x̄) ] / [ Σ_{i=1..M} (x_i − x̄)² ]    (1)

where k denotes the autocorrelation order being evaluated, x_i is the i-th sample value, and x̄ is the sample mean. Setting k = 1 evaluates the first-order autocorrelation. The value of r_1 lies between −1 and 1; the closer its absolute value is to 1, the greater the degree of autocorrelation in the sample sequence, while r_1 = 0 means the samples are mutually independent, i.e. there is no autocorrelation. The first-order autocorrelation coefficient r_1 of a sample sequence generated by a Markov chain can reach above 0.8.
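Formula (1) transcribes directly into code; a short helper (name illustrative) reused in the later sketches:

```python
import numpy as np

def autocorr(x, k=1):
    """k-th order autocorrelation coefficient of a sample sequence,
    per formula (1): sum over i <= M-k of (x_i - xbar)(x_{i+k} - xbar),
    divided by sum over i <= M of (x_i - xbar)^2."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return float(np.dot(d[:-k], d[k:]) / np.dot(d, d))
```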
The currently common method for removing autocorrelation, called "skip sampling" (thinning), is an improvement of the basic MCMC method. Its general flow is shown in FIG. 2 and comprises:
The first step: select an initial state.
1.1 Define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i (1 ≤ i ≤ N) the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 Construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) denotes the probability of drawing sample x_i;
1.3 Choose any easy-to-sample random distribution g(s) on the state space S as the auxiliary probability distribution (e.g. the uniform distribution), where g(s_i) denotes the probability of drawing state s_i. Sampling from g(s) is simple; an independent sampling method is generally chosen, i.e. sampling from g(s) does not depend on the current state;
1.4 Generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.5 Set the current state of the Markov chain s_c = s_o;
1.6 Determine the step size k of skip sampling; k is typically taken as 100;
1.7 Set the target number of samples N_s according to the user's requirements and application characteristics.
The second step: generate a candidate state.
2.1 Randomly sample a state s_n from the probability distribution g(s) as the candidate state;
2.2 Compute the probability value f_s(s_n) corresponding to the candidate state.
The third step: decide the subsequent state and generate a sample.
3.1 Compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 Generate a random number u in the interval [0, 1];
3.3 If u < p_accept, set s_c = s_n, i.e. accept the candidate state s_n as the new current state of the Markov chain; otherwise s_c remains unchanged, i.e. the candidate state is rejected;
3.4 Generate the sample corresponding to state s_c and record its generation serial number n, but do not output it yet.
The fourth step: judge whether to output the sample.
If the serial number n of the sample generated in step 3.4 is divisible by k, output the sample and jump to the fifth step; if it is not divisible by k, discard the sample and jump to the second step.
The fifth step: judge whether sampling is finished.
If the number of samples obtained reaches the set target value, jump to the sixth step; otherwise jump to the second step.
The sixth step: end.
Skip sampling can effectively solve the autocorrelation problem of the sample sequence: it reduces the r_1 value of the sequence to the order of 10^-3 to 10^-2. However, skip sampling introduces another problem. In the fourth step a sample is output only when its serial number is divisible by k, which means that of every k samples generated only one is retained while the other k−1 are discarded. So although the autocorrelation problem is solved, the sampling speed of skip sampling drops by a factor of k, and the amount of computation per output sample grows k-fold.
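A sketch of the thinning rule, reusing the conventions of the MCMC sketch above (illustrative names; only the output rule differs from plain MCMC):

```python
import numpy as np

def skip_sampling(f_s, g_pdf, g_sample, n_samples, k=100, rng=None):
    """Thinned (skip) sampling: run the chain exactly as in MCMC but keep
    only every k-th generated sample; the other k-1 are discarded."""
    if rng is None:
        rng = np.random.default_rng()
    s_c = g_sample(rng)
    p_c = f_s(s_c)
    kept, n = [], 0
    while len(kept) < n_samples:
        s_n = g_sample(rng)
        p_n = f_s(s_n)                 # one probability evaluation per step
        if rng.random() < min(1.0, (p_n * g_pdf(s_c)) / (p_c * g_pdf(s_n))):
            s_c, p_c = s_n, p_n
        n += 1
        if n % k == 0:                 # fourth step: output only every k-th
            kept.append(s_c)           # k-1 of every k samples are wasted
    return kept
```

With k = 100, producing N_s output samples costs 100·N_s probability evaluations, which is the k-fold slowdown quantified above.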
Disclosure of Invention
The invention provides a non-autocorrelation sampling method for complex probability distributions over large sample spaces, aimed at the autocorrelation problem of the MCMC method and the efficiency loss of skip sampling, so as to generate a non-autocorrelated sample sequence while achieving the same sampling efficiency as the MCMC method.
The method can be used in two modes according to the application scenario: (1) based on the MCMC method, as a complete sampling method that eliminates the autocorrelation of the sample sequence while sampling; (2) as a post-processing step for the MCMC method, eliminating the autocorrelation of a sample sequence already obtained by MCMC. Generally, the first mode is adopted when the required sample buffer size is known; the second mode is used to measure the buffer size that the distribution to be sampled requires, i.e. as the debugging process for the first mode.
The technical scheme of the invention is as follows: when samples are generated, the correlation between samples is eliminated by introducing a sample buffer; or, as a post-processing method, previously generated samples are read in sequence and the sample sequence is passed through the sample buffer to eliminate its autocorrelation. Each time a sample is generated, or read from a file, it is stored in the sample buffer; once the buffer is full, a sample is selected from the buffer uniformly at random and output. This random output adds an extra layer of randomness between samples and ultimately eliminates the autocorrelation of the sample sequence.
The invention comprises the following steps:
The first step: select an initial state and initialize the buffer, as follows:
1.1 Define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i (1 ≤ i ≤ N) the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 Construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) denotes the probability of drawing sample x_i;
1.3 Choose any easy-to-sample random distribution g(s) on the state space S (e.g. the uniform distribution) as the auxiliary probability distribution, where g(s_i) denotes the probability of drawing state s_i;
1.4 Set the initial value of the sample buffer capacity L to the value to be debugged, or to a sufficiently large value (typically 4000); L is a positive integer. Initialize the sample buffer to empty, with capacity for L samples; each position that can store a sample is called a buffer slot. Initialize the current storage position variable p = 1;
1.5 Generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.6 Set the current state of the Markov chain s_c = s_o;
1.7 Set the target number of samples N_s (greater than L) according to the user's requirements and application characteristics.
The second step: generate a candidate state, as follows:
2.1 Randomly sample a state s_n (1 ≤ n ≤ N) from g(s) as the candidate state; sampling from g(s) is simple, and an independent sampling method is generally chosen, i.e. sampling from g(s) does not depend on the current state;
2.2 Compute the probability value f_s(s_n) corresponding to the candidate state.
The third step: decide the subsequent state and generate a sample, as follows:
3.1 Compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 Generate a random number u in the interval [0, 1];
3.3 If u < p_accept, set s_c = s_n, i.e. accept the candidate state s_n as the new current state of the Markov chain; otherwise s_c remains unchanged, i.e. the candidate state is rejected;
3.4 Generate the sample corresponding to state s_c and record its generation serial number n_generate, but do not output it yet.
The fourth step: judge the usage mode:
If the required buffer size has already been measured, or the application scenario (e.g. aerospace, ship, or bridge reliability analysis) demands high sample real-time performance and leaves no time for debugging, set the mode variable mode = 1 and go to the eighth step; if this is the debugging process, i.e. the required buffer capacity is being measured, set the mode variable mode = 2 and go to the fifth step;
The fifth step: save a sample:
Store the sample generated in step 3.4 into a sample file.
The sixth step: judge whether sampling is completed:
If the generation serial number of the sample obtained in step 3.4 satisfies n_generate = N_s, go to the seventh step; if n_generate < N_s, go to the second step.
The seventh step: read a sample:
Read one sample, in order, from the sample file saved in the fifth step.
The eighth step: judge whether the sample buffer is full:
Check whether the last buffer slot of the sample buffer holds a sample. If so, go to the ninth step; if not, go to the tenth step.
The ninth step: randomly output a sample, as follows:
9.1 Generate a random positive integer r_b in [1, L];
9.2 Output the sample stored in the r_b-th buffer slot of the sample buffer, empty the r_b-th buffer slot, and record the output serial number n_output of the sample;
9.3 If mode = 1, store the sample generated in step 3.4 into the r_b-th buffer slot; if mode = 2, store the sample read in the seventh step into the r_b-th buffer slot;
9.4 Go to the eleventh step.
The tenth step: fill the sample buffer, as follows:
10.1 If mode = 1, store the sample generated in step 3.4 into the p-th sample buffer slot, where p is the current storage position variable initialized in step 1.4;
10.2 If mode = 2, store the sample read in the seventh step into the p-th sample buffer slot, where p is the current storage position variable initialized in step 1.4;
10.3 p = p + 1;
10.4 Go to the eleventh step.
The eleventh step: judge the usage mode:
If mode = 1, go to the twelfth step; if mode = 2, go to the thirteenth step.
The twelfth step: judge whether sampling is finished:
If the output serial number recorded in step 9.2 satisfies n_output = N_s − L, jump to the fourteenth step; if n_output < N_s − L, go to the second step.
The thirteenth step: judge whether all samples have been read:
If all samples in the sample file have been read, go to the fourteenth step; if unread samples remain in the file, go to the seventh step.
The fourteenth step: empty the sample buffer, as follows:
14.1 Initialize the remaining-sample count variable N_left = L;
14.2 Generate a random positive integer r_c in [1, N_left] and output the sample in the r_c-th buffer slot;
14.3 Move the sample in the N_left-th buffer slot to the r_c-th buffer slot;
14.4 N_left = N_left − 1;
14.5 If N_left = 0, go to the fifteenth step; otherwise go to 14.2.
The fifteenth step: end.
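For concreteness, a minimal Python sketch of mode 1 (synchronous sampling with an in-memory buffer; the callables f_s, g_pdf, g_sample are the same illustrative stand-ins as in the earlier sketches, and file handling is omitted):

```python
import numpy as np

def buffered_mcmc(f_s, g_pdf, g_sample, N_s, L=4000, rng=None):
    """Non-autocorrelation sampling via a sample buffer (mode 1).
    Every generated sample enters the buffer; once the buffer holds L
    samples, a uniformly random slot is output and refilled (steps 8-10),
    and at the end the buffer is drained in random order (step 14).
    No sample is ever discarded, so one probability evaluation still
    yields one output sample, as in plain MCMC."""
    if rng is None:
        rng = np.random.default_rng()
    s_c = g_sample(rng)
    p_c = f_s(s_c)
    buf, out = [], []
    for _ in range(N_s):                     # steps 2-3: ordinary MCMC transition
        s_n = g_sample(rng)
        p_n = f_s(s_n)
        if rng.random() < min(1.0, (p_n * g_pdf(s_c)) / (p_c * g_pdf(s_n))):
            s_c, p_c = s_n, p_n
        if len(buf) == L:                    # steps 8-9: buffer full
            r = int(rng.integers(L))         # step 9.1: random slot
            out.append(buf[r])               # step 9.2: output that slot
            buf[r] = s_c                     # step 9.3: refill emptied slot
        else:
            buf.append(s_c)                  # step 10: still filling
    while buf:                               # step 14: drain in random order
        r = int(rng.integers(len(buf)))
        out.append(buf[r])
        buf[r] = buf[-1]                     # move last sample into freed slot
        buf.pop()
    return out
```

Mode 2 (post-processing) is the same buffer logic with the chain transition replaced by reading the next sample from a file.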
Compared with the prior art, the method computes quickly, retains the original efficiency of MCMC, adds buffer operations whose overhead is negligible relative to computing the probability values, and eliminates sample autocorrelation. The specific reasons are as follows:
1. The steps with the largest computation and time cost are the first, second, and third steps, which are the same as in the standard MCMC method. The extra buffer operations (the eighth, ninth, and tenth steps) are very cheap and can be ignored; the extra file operations (the fifth and seventh steps) are also small, involving little computer I/O, and their relative cost is negligible. In particular, when the usage mode is synchronous execution, no file operations are needed at all. Therefore the amount of computation of the method is essentially the same as that of the standard MCMC method;
2. Compared with skip sampling, no sample is discarded in the eighth, ninth, and tenth steps of the invention, so no samples are wasted. With essentially the same amount of computation, the original sampling efficiency of MCMC is retained: all generated samples are eventually output, so on average only one probability value needs to be computed per output sample;
3. In the ninth and tenth steps of the invention, samples are output in random order: a sample buffer is introduced and indexed by random numbers in steps 9.1, 9.2, and 9.3, which inserts a layer of randomness between samples. This reduces the r_1 value of the sample sequence to the order of 10^-3 and eliminates the correlation between samples.
Buffers of different sizes remove the sequence autocorrelation to different degrees, as shown in FIG. 4: the larger the buffer, the more randomness is introduced and the better the autocorrelation is eliminated. The effect of the invention on eliminating sample sequence autocorrelation is shown in FIG. 5, which applies the invention to a quantum supremacy benchmark on a supercomputer: FIG. 5(a) is the autocorrelation curve of a sample sequence produced by the standard MCMC method, and FIG. 5(b) is the autocorrelation curve obtained with the invention. Comparing the autocorrelation of each order of the two sequences, the autocorrelation of the first 100 orders of the sequence obtained by the standard MCMC method is severe, while the autocorrelation of the sequence generated by the invention is eliminated to a negligible level at every order above 0. In addition, no sample is lost during execution, so no matter how large the sample buffer is set, the total execution time of the whole process is hardly affected; only the time to produce the first output sample increases. A buffer as large as possible can therefore be used when sufficient preparation time is available.
Drawings
FIG. 1 is a flow chart of the MCMC method described in the Background;
FIG. 2 is a flow chart of the skip sampling method described in the Background;
FIG. 3 is the overall flow chart of the invention;
FIG. 4 illustrates the effect of different configurations (different sample buffer sizes) on eliminating the first-order autocorrelation coefficient of the sample sequence;
FIG. 5 compares the effect of the invention with the background art: FIG. 5(a) is the autocorrelation curve of a sample sequence sampled by the background-art MCMC method, and FIG. 5(b) is the autocorrelation curve obtained with the invention;
FIG. 6 compares the efficiency of the invention with the background-art skip sampling method.
Detailed Description
The general flow of the invention is shown in FIG. 3 and comprises the following steps:
The first step: select an initial state and initialize the buffer.
1.1 Define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i (1 ≤ i ≤ N) the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 Construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) denotes the probability of drawing sample x_i;
1.3 Choose any easy-to-sample random distribution g(s) on the state space S as the auxiliary probability distribution, where g(s_i) denotes the probability of drawing state s_i;
1.4 Set the sample buffer capacity L to 4000, initialize the sample buffer to empty, with capacity for L samples; each position that can store a sample is called a buffer slot. Initialize the current storage position variable p = 1;
1.5 Generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.6 Set the current state of the Markov chain s_c = s_o;
1.7 Set the target number of samples N_s (greater than L) according to the user's requirements and application characteristics.
The second step: generate a candidate state.
2.1 Obtain a state s_n (1 ≤ n ≤ N) by random sampling from the probability distribution g(s) with an independent sampling method, as the candidate state;
2.2 Compute the probability value f_s(s_n) corresponding to the candidate state.
The third step: decide the subsequent state and generate a sample:
3.1 Compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 Generate a random number u in the interval [0, 1];
3.3 If u < p_accept, set s_c = s_n, i.e. accept the candidate state s_n as the new current state of the Markov chain; otherwise s_c remains unchanged, i.e. the candidate state is rejected;
3.4 Generate the sample corresponding to state s_c and record its generation serial number n_generate.
The fourth step: judge the usage mode.
If the required buffer size has already been measured, or the application scenario (e.g. aerospace, ship, or bridge reliability analysis) demands high sample real-time performance and leaves no time for debugging, set the mode variable mode = 1 and go to the eighth step; if this is the debugging process, i.e. the required buffer capacity is being measured, set the mode variable mode = 2 and go to the fifth step;
The fifth step: save a sample:
Store the sample generated in step 3.4 into a sample file.
The sixth step: judge whether sampling is completed:
If the generation serial number of the sample obtained in step 3.4 satisfies n_generate = N_s, go to the seventh step; if n_generate < N_s, go to the second step.
The seventh step: read a sample:
Read one sample, in order, from the sample file saved in the fifth step.
The eighth step: judge whether the sample buffer is full:
Check whether the last buffer slot of the sample buffer holds a sample. If so, go to the ninth step; if not, go to the tenth step.
The ninth step: randomly output a sample:
9.1 Generate a random positive integer r_b in [1, L];
9.2 Output the sample stored in the r_b-th buffer slot of the sample buffer, empty the r_b-th buffer slot, and record the output serial number n_output of the sample;
9.3 If mode = 1, store the sample generated in step 3.4 into the r_b-th buffer slot; if mode = 2, store the sample read in the seventh step into the r_b-th buffer slot;
9.4 Go to the eleventh step.
The tenth step: fill the sample buffer:
10.1 If mode = 1, store the sample generated in step 3.4 into the p-th sample buffer slot, where p is the current storage position variable initialized in step 1.4;
10.2 If mode = 2, store the sample read in the seventh step into the p-th sample buffer slot, where p is the current storage position variable initialized in step 1.4;
10.3 p = p + 1;
10.4 Go to the eleventh step.
The eleventh step: judge the usage mode:
If mode = 1, go to the twelfth step; if mode = 2, go to the thirteenth step.
The twelfth step: judge whether sampling is completed:
If the output serial number recorded in step 9.2 satisfies n_output = N_s − L, go to the fourteenth step; if n_output < N_s − L, go to the second step.
The thirteenth step: judge whether all samples have been read:
If all samples in the sample file have been read, go to the fourteenth step; if unread samples remain in the file, go to the seventh step.
The fourteenth step: empty the sample buffer:
14.1 Initialize the remaining-sample count variable N_left = L;
14.2 Generate a random positive integer r_c in [1, N_left] and output the sample in the r_c-th buffer slot;
14.3 Move the sample in the N_left-th buffer slot to the r_c-th buffer slot;
14.4 N_left = N_left − 1;
14.5 If N_left = 0, go to the fifteenth step; otherwise go to 14.2.
The fifteenth step: end.
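As an illustrative end-to-end check (a toy 1000-state target standing in for a real 10^20-sample space; all names refer to the hypothetical sketches given earlier, not to code from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                    # toy state space
p = rng.random(N)
p /= p.sum()                                # toy "complex" target distribution

f_s = lambda i: p[i]                        # state-space probability function
g_pdf = lambda i: 1.0 / N                   # uniform auxiliary distribution
g_sample = lambda r: int(r.integers(N))     # independent uniform proposal

plain = mcmc_independence_sampler(f_s, g_pdf, g_sample, 100_000, rng=rng)
buffered = buffered_mcmc(f_s, g_pdf, g_sample, 100_000, L=4000, rng=rng)

print("r_1, plain MCMC  :", autocorr(plain))     # typically clearly positive
print("r_1, with buffer :", autocorr(buffered))  # typically near zero
```

Both runs perform the same number of probability evaluations; only the output order differs, mirroring the negligible-overhead claim above.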
FIG. 4 illustrates the effect of different configurations (different sample buffer sizes) on eliminating the first-order autocorrelation coefficient of the sample sequence. In FIG. 4 the abscissa is the sample buffer size and the ordinate is the first-order autocorrelation coefficient of the sample sequence. As the sample buffer grows, the first-order autocorrelation coefficient decreases rapidly: when the buffer size reaches 1000, the coefficient drops to the order of 10^-2, and beyond a buffer size of 2000 the autocorrelation of the sample sequence is negligible.
FIG. 5 compares the autocorrelation of the sample sequences produced by the background-art MCMC method and by the invention. FIG. 5(a) is the autocorrelation curve of a sample sequence sampled by the MCMC method described in the Background; FIG. 5(b) is the autocorrelation curve obtained with the invention. The abscissa is the order and the ordinate is the autocorrelation coefficient of the sample sequence. The experimental data come from a quantum supremacy benchmark based on 30-photon 900-mode boson sampling. In this test the 30 photons may occupy any 30 of the 900 modes, so there are C(900, 30) probabilities in total, each corresponding to a possible sample, i.e. the sample space size is C(900, 30). The results were measured on the Tianhe-2 supercomputer (64 compute nodes). Low-order (below 10th-order) autocorrelation matters most in engineering applications: as shown in FIG. 5(a), the sequence obtained by the MCMC method has severe low-order autocorrelation (the first-order autocorrelation is as high as 0.97, and the 10th-order autocorrelation is still as high as 0.77); as shown in FIG. 5(b), the autocorrelation of every order of the sequence obtained by the invention is of the order 10^-3 to 10^-2 and is negligible.
FIG. 6 compares the efficiency of the invention with the background-art skip sampling method. The experimental data come from a quantum supremacy benchmark based on 20-photon 400-mode boson sampling and one based on 25-photon 625-mode boson sampling. The test environment is the Tianhe-2 supercomputer, using 4 and 32 compute nodes respectively. The sample space sizes of the two tests are C(400, 20) and C(625, 25), respectively. The comparison used the two methods to generate 5 million and 2 million samples at the two test scales, respectively. For the 20-photon 400-mode boson sampling quantum supremacy test, skip sampling ran for about 8.84 hours and the resulting sequence had a first-order autocorrelation coefficient of 0.0424, while the invention ran for about 5.31 minutes with a first-order autocorrelation coefficient of 0.0067. For the 25-photon 625-mode boson sampling quantum supremacy test, skip sampling ran for about 24.46 hours with a first-order autocorrelation coefficient of 0.1083, while the invention ran for about 14.67 minutes with a first-order autocorrelation coefficient of 0.0089. The results show that the invention eliminates autocorrelation more effectively and executes about 100 times faster than skip sampling.

Claims (5)

1. A non-autocorrelation sampling method for a complex probability distribution over a large sample space, characterized by comprising the following steps:
the first step: select an initial state and initialize the buffer, as follows:
1.1 define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i, 1 ≤ i ≤ N, the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) is the probability, given by the complex probability distribution function f, of drawing sample x_i;
1.3 choose an easy-to-sample random distribution g(s) on the state space S as the auxiliary probability distribution, where g(s_i) is the probability of drawing state s_i;
1.4 set the initial value of the sample buffer capacity L, where L is a positive integer; initialize the sample buffer to empty, with capacity for L samples, where each position that can store a sample is called a buffer slot; initialize the current storage position variable p = 1;
1.5 generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.6 set the current state of the Markov chain s_c = s_o;
1.7 set the target number of samples N_s according to the user's requirements and application characteristics;
the second step: generate a candidate state, as follows:
2.1 obtain a state s_n, 1 ≤ n ≤ N, by random sampling from g(s), as the candidate state;
2.2 compute the probability value f_s(s_n) corresponding to the candidate state;
the third step: decide the subsequent state and generate a sample, as follows:
3.1 compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 generate a random number u in the interval [0, 1];
3.3 if u < p_accept, set s_c = s_n; otherwise s_c remains unchanged;
3.4 generate the sample corresponding to state s_c and record its generation serial number n_generate;
the fourth step: judge the usage mode, as follows:
if the real-time requirement on the samples is high and no sufficient time is available for debugging, set the mode variable mode = 1 and go to the eighth step; if this is the debugging process, i.e. the required buffer capacity is being measured, set the mode variable mode = 2 and go to the fifth step;
the fifth step: save a sample:
store the sample generated in step 3.4 into a sample file;
the sixth step: judge whether sampling is finished:
if the generation serial number of the sample obtained in step 3.4 satisfies n_generate = N_s, go to the seventh step; if n_generate < N_s, go to the second step;
the seventh step: read a sample:
read one sample, in order, from the sample file;
the eighth step: judge whether the sample buffer is full:
check whether the last buffer slot of the sample buffer holds a sample; if so, go to the ninth step; if not, go to the tenth step;
the ninth step: randomly output a sample, as follows:
9.1 generate a random positive integer r_b in [1, L];
9.2 output the sample stored in the r_b-th buffer slot of the sample buffer, empty the r_b-th buffer slot, and record the output serial number n_output of the sample;
9.3 if mode = 1, store the sample generated in step 3.4 into the r_b-th buffer slot; if mode = 2, store the sample read in the seventh step into the r_b-th buffer slot;
9.4 go to the eleventh step;
the tenth step: fill the sample buffer, as follows:
10.1 if mode = 1, store the sample generated in step 3.4 into the p-th sample buffer slot;
10.2 if mode = 2, store the sample read in the seventh step into the p-th sample buffer slot;
10.3 p = p + 1;
10.4 go to the eleventh step;
the eleventh step: judge the usage mode:
if mode = 1, go to the twelfth step; if mode = 2, go to the thirteenth step;
the twelfth step: judge whether sampling is completed:
if the output serial number recorded in step 9.2 satisfies n_output = N_s − L, go to the fourteenth step; if n_output < N_s − L, go to the second step;
the thirteenth step: judge whether all samples have been read:
if all samples in the sample file have been read, go to the fourteenth step; if unread samples remain in the file, go to the seventh step;
the fourteenth step: empty the sample buffer, as follows:
14.1 initialize the remaining-sample count variable N_left = L;
14.2 generate a random positive integer r_c in [1, N_left] and output the sample in the r_c-th buffer slot;
14.3 move the sample in the N_left-th buffer slot to the r_c-th buffer slot;
14.4 N_left = N_left − 1;
14.5 if N_left = 0, go to the fifteenth step; otherwise go to 14.2;
the fifteenth step: end.
2. The non-autocorrelation sampling method for a complex probability distribution over a large sample space according to claim 1, wherein the easy-to-sample random distribution g(s) is the uniform distribution.
3. The non-autocorrelation sampling method for a complex probability distribution over a large sample space according to claim 1, wherein the initial value of the sample buffer capacity L in step 1.4 is set to 4000.
4. The non-autocorrelation sampling method for a complex probability distribution over a large sample space according to claim 1, wherein the target number of samples N_s in step 1.7 is greater than L.
5. The non-autocorrelation sampling method for a complex probability distribution over a large sample space according to claim 1, wherein the sampling from g(s) in step 2.1 employs an independent sampling method.
Application CN201910265051.XA, filed 2019-04-03: Non-autocorrelation sampling method for large sample space complex probability distribution. Granted as CN109977359B (Active).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910265051.XA 2019-04-03 2019-04-03 Non-autocorrelation sampling method for large sample space complex probability distribution

Publications (2)

Publication Number Publication Date
CN109977359A 2019-07-05
CN109977359B 2022-11-22

Family ID: 67082668 (one family application, CN201910265051.XA, status Active; country: CN)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant