CN109977359B - Non-autocorrelation sampling method for large sample space complex probability distribution

Info

Publication number: CN109977359B (application number CN201910265051.XA)
Authority: CN (China)
Prior art keywords: sample, buffer, sampling, samples, state
Other versions: CN109977359A
Original language: Chinese (zh)
Inventors: 吴俊杰, 刘雍, 熊敏, 徐平, 强晓刚, 黄安琪, 付祥, 邓明堂
Assignee (current and original): National University of Defense Technology
Priority/filing date: 2019-04-03 (CN201910265051.XA)
Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Abstract

The invention discloses a non-autocorrelation sampling method for complex probability distributions over large sample spaces, and aims to solve the autocorrelation problem of the MCMC method and the efficiency loss of skip sampling. The technical scheme is as follows: when samples are generated, the correlation among samples is eliminated by introducing a sample buffer; alternatively, as a post-processing method, previously generated samples are read in sequence and the sample sequence is passed through the buffer to remove its autocorrelation. Each time a sample is generated, or read from a file, it is stored in the sample buffer; once the buffer is full, a sample is selected from the buffer uniformly at random and output. This random output adds an extra layer of randomness between samples and ultimately eliminates the autocorrelation of the sample sequence. The invention computes quickly, retains the efficiency of MCMC, makes the overhead of the added buffer operations relatively negligible, and eliminates sample autocorrelation.

Description

Non-autocorrelation sampling method for large sample space complex probability distribution
Technical Field
The invention relates to a method for sampling complex probability distributions over large sample spaces, and in particular to a Markov chain-based non-autocorrelation sampling method.
Background
Computer processing of random events is an essential tool for engineering analysis. In recent decades, with the rapid growth of computer performance, computers have been widely used in aerospace, automobile and ship design and manufacture, bridge design and construction, environmental engineering, weather forecasting, polymer materials, and other fields. In these engineering applications, random events must be computed and analyzed by computer to achieve goals such as bridge reliability assessment and weather forecasting. In random event analysis, one necessary step is to sample a complex probability distribution to obtain independent, random samples.
Efficient sampling algorithms are a necessary means of improving sampling efficiency. In general, the sources of random events involved in engineering applications are complex, the data sizes are large, the sample probabilities are expensive to compute, and the number of distinct samples may exceed the order of 10^20. Analyzing such random events requires large amounts of computational resources; besides improving hardware performance as far as possible, optimization of the sampling method is indispensable.
The problem of sampling a complex probability distribution over a large sample space can be defined as follows:
A sample space X = {x_1, x_2, …, x_i, …, x_N} denotes the N possible samples (1 ≤ i ≤ N). The value of N is very large, possibly on the order of 10^20.
The complex probability distribution is described by a domain-dependent probability distribution function f(x), i.e. the probability of drawing sample x_i is f(x_i). Computing a probability value f(x_i) is difficult (typically of exponential time complexity) and requires a significant amount of computing resources.
The task is to construct a sample sequence t_1, t_2, …, t_j, …, t_M such that for every j (1 ≤ j ≤ M, where the value of M varies by application and 1 ≤ M ≤ N) there is t_j ∈ X, and the samples in the sequence obey the probability distribution described by f. Evaluating the probability distribution function f generally requires a large amount of computing resources and is the key difficulty of the sampling process.
Currently, commonly used sampling methods include the brute-force sampling method, the rejection sampling method (A. Neville et al., "Classical boson sampling algorithms with superior performance to near-term experiments", Nature Physics, October 2017), and the Markov chain Monte Carlo method (J. S. Liu, "Metropolized independent sampling with comparisons to rejection sampling and importance sampling", Statistics and Computing, June 1996).
1. Brute-force sampling
The brute-force sampling method first computes the complete probability distribution of the data to be sampled and then performs the sampling. Computing the entire distribution to be sampled requires enormous computational resources, so the method is unsuitable for engineering applications.
2. Rejection sampling
Rejection sampling first generates a candidate sample (denoted x) from an assumed simple auxiliary probability distribution g, and then computes the probability value f(x) of the candidate x under the actual distribution f to be sampled. The auxiliary probability g must be scaled by a factor λ such that λ·g(x) ≥ f(x) for every sample x. The ratio f(x)/(λ·g(x)) of the actual probability to the scaled auxiliary probability then decides the fate of the candidate: it is output with probability f(x)/(λ·g(x)) and discarded (rejected) with probability 1 − f(x)/(λ·g(x)).
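As a minimal sketch of this accept/reject loop (Python; the callables f, g_sample, g_pdf and the bound lam are illustrative placeholders, not names from the original description):

```python
import numpy as np

def rejection_sample(f, g_sample, g_pdf, lam, n_samples, rng=None):
    """Draw candidates from the auxiliary distribution g and accept each
    candidate x with probability f(x) / (lam * g(x)).
    Assumes lam * g_pdf(x) >= f(x) for every x."""
    if rng is None:
        rng = np.random.default_rng()
    out = []
    while len(out) < n_samples:
        x = g_sample(rng)                       # candidate from g
        if rng.random() < f(x) / (lam * g_pdf(x)):
            out.append(x)                       # accept with ratio probability
        # otherwise the candidate is rejected and simply discarded
    return out
```

For normalized f and g, roughly λ candidates (and hence λ probability evaluations) are needed per accepted sample, which is the inefficiency described in disadvantage (4) below.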
Although rejection sampling does not require computing the entire probability distribution when generating samples, it still has the following disadvantages:
(1) Because probability values of the actual distribution are hard to compute, and the maximum probability value is hard to obtain, the factor λ is difficult to determine;
(2) If the scaled auxiliary distribution is set too low, the actual probability distribution is truncated to some extent, and the sampling result is distorted;
(3) If the scaled auxiliary distribution is set too high, the acceptance probability becomes very low, the probability that a candidate is rejected rises greatly, and the sampling efficiency drops sharply;
(4) Even when the factor λ is well chosen, the number of probability values that must be computed per accepted sample grows with the complexity of the distribution to be sampled, so the demand on computing resources remains high and the efficiency is low.
3. Markov chain Monte Carlo (MCMC)
The MCMC method samples by constructing a Markov chain whose stationary (convergence) distribution is the probability distribution to be sampled. Each time a probability value is computed, the Markov chain advances by one node and outputs one sample, which makes MCMC the most efficient sampling method currently available.
The general flow of the Markov chain Monte Carlo method is shown in FIG. 1 and comprises:
The first step: select an initial state.
1.1 Define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i (1 ≤ i ≤ N) the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 Construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) denotes the probability of drawing sample x_i;
1.3 Choose any easy-to-sample random distribution g(s) on the state space S as the auxiliary probability distribution (e.g. the uniform distribution), where g(s_i) denotes the probability of drawing state s_i. Sampling from g(s) is simple; an independent sampling method is generally chosen, i.e. sampling from g(s) does not depend on the current state;
1.4 Generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.5 Set the current state of the Markov chain s_c = s_o;
1.6 Set the target number of samples N_s according to the user's requirements and application characteristics.
The second step: generate a candidate state.
2.1 Randomly select a state s_n (1 ≤ n ≤ N) from the auxiliary probability distribution g(s) as the candidate state;
2.2 Compute the probability value f_s(s_n) corresponding to the candidate state.
The third step: decide the subsequent state and output the sample.
3.1 Compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 Generate a random number u in the interval [0, 1];
3.3 If u < p_accept, set s_c = s_n, i.e. accept the candidate state s_n as the new current state of the Markov chain; otherwise s_c remains unchanged, i.e. the candidate state is rejected;
3.4 Generate and output the sample corresponding to state s_c.
The fourth step: judge whether sampling is finished.
If the number of samples obtained reaches the set target value, jump to the fifth step; otherwise jump to the second step.
The fifth step: end.
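The flow above is a Metropolized independence sampler. A minimal Python sketch under the same conventions (the callables f_s, g_pdf, g_sample are illustrative stand-ins for the state-space probability function and the auxiliary distribution):

```python
import numpy as np

def mcmc_independence_sampler(f_s, g_pdf, g_sample, n_samples, rng=None):
    """Metropolized independence sampler: candidates are drawn from the
    auxiliary distribution g independently of the current state, and
    accepted with probability min(1, f(s_n)g(s_c) / (f(s_c)g(s_n)))."""
    if rng is None:
        rng = np.random.default_rng()
    s_c = g_sample(rng)            # initial state (step 1.4, simplified)
    p_c = f_s(s_c)
    samples = []
    for _ in range(n_samples):
        s_n = g_sample(rng)        # step 2.1: candidate, independent of s_c
        p_n = f_s(s_n)             # step 2.2: one probability evaluation
        a = min(1.0, (p_n * g_pdf(s_c)) / (p_c * g_pdf(s_n)))
        if rng.random() < a:       # step 3.3: accept or reject
            s_c, p_c = s_n, p_n
        samples.append(s_c)        # step 3.4: output current state either way
    return samples
```

One sample is emitted per probability evaluation, but rejected steps repeat the previous state, which is the source of the autocorrelation discussed next.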
Every probability evaluation of the MCMC method yields one sample. However, the generated sample sequence has a serious drawback: because the accept/reject decision of the third step depends heavily on the current sample, long runs of consecutive repeated samples may appear in the generated sequence, giving it severe autocorrelation. To quantify the degree of autocorrelation of a sample sequence, the autocorrelation coefficients of various orders are used; in particular, the first-order autocorrelation coefficient r_1 evaluates the autocorrelation of the sequence, as in formula (1):

r_k = [ Σ_{i=1..M−k} (x_i − x̄)(x_{i+k} − x̄) ] / [ Σ_{i=1..M} (x_i − x̄)² ]    (1)

where k denotes the autocorrelation order being evaluated, x_i is the i-th sample value, and x̄ is the sample mean. Setting k = 1 evaluates the first-order autocorrelation. The value of r_1 lies between −1 and 1; the closer its absolute value is to 1, the greater the degree of autocorrelation in the sample sequence, while r_1 = 0 means the samples are mutually independent, i.e. there is no autocorrelation. The first-order autocorrelation coefficient r_1 of a sample sequence generated by a Markov chain can reach above 0.8.
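Formula (1) transcribes directly into code; a short helper (name illustrative) reused in the later sketches:

```python
import numpy as np

def autocorr(x, k=1):
    """k-th order autocorrelation coefficient of a sample sequence,
    per formula (1): sum over i <= M-k of (x_i - xbar)(x_{i+k} - xbar),
    divided by sum over i <= M of (x_i - xbar)^2."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return float(np.dot(d[:-k], d[k:]) / np.dot(d, d))
```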
The currently common method for removing autocorrelation, called "skip sampling" (thinning), is an improvement of the basic MCMC method. Its general flow is shown in FIG. 2 and comprises:
The first step: select an initial state.
1.1 Define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i (1 ≤ i ≤ N) the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 Construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) denotes the probability of drawing sample x_i;
1.3 Choose any easy-to-sample random distribution g(s) on the state space S as the auxiliary probability distribution (e.g. the uniform distribution), where g(s_i) denotes the probability of drawing state s_i. Sampling from g(s) is simple; an independent sampling method is generally chosen, i.e. sampling from g(s) does not depend on the current state;
1.4 Generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.5 Set the current state of the Markov chain s_c = s_o;
1.6 Determine the step size k of skip sampling; k is typically taken as 100;
1.7 Set the target number of samples N_s according to the user's requirements and application characteristics.
The second step: generate a candidate state.
2.1 Randomly sample a state s_n from the probability distribution g(s) as the candidate state;
2.2 Compute the probability value f_s(s_n) corresponding to the candidate state.
The third step: decide the subsequent state and generate a sample.
3.1 Compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 Generate a random number u in the interval [0, 1];
3.3 If u < p_accept, set s_c = s_n, i.e. accept the candidate state s_n as the new current state of the Markov chain; otherwise s_c remains unchanged, i.e. the candidate state is rejected;
3.4 Generate the sample corresponding to state s_c and record its generation serial number n, but do not output it yet.
The fourth step: judge whether to output the sample.
If the serial number n of the sample generated in step 3.4 is divisible by k, output the sample and jump to the fifth step; if it is not divisible by k, discard the sample and jump to the second step.
The fifth step: judge whether sampling is finished.
If the number of samples obtained reaches the set target value, jump to the sixth step; otherwise jump to the second step.
The sixth step: end.
Skip sampling can effectively solve the autocorrelation problem of the sample sequence: it reduces the r_1 value of the sequence to the order of 10^-3 to 10^-2. However, skip sampling introduces another problem. In the fourth step a sample is output only when its serial number is divisible by k, which means that of every k samples generated only one is retained while the other k−1 are discarded. So although the autocorrelation problem is solved, the sampling speed of skip sampling drops by a factor of k, and the amount of computation per output sample grows k-fold.
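A sketch of the thinning rule, reusing the conventions of the MCMC sketch above (illustrative names; only the output rule differs from plain MCMC):

```python
import numpy as np

def skip_sampling(f_s, g_pdf, g_sample, n_samples, k=100, rng=None):
    """Thinned (skip) sampling: run the chain exactly as in MCMC but keep
    only every k-th generated sample; the other k-1 are discarded."""
    if rng is None:
        rng = np.random.default_rng()
    s_c = g_sample(rng)
    p_c = f_s(s_c)
    kept, n = [], 0
    while len(kept) < n_samples:
        s_n = g_sample(rng)
        p_n = f_s(s_n)                 # one probability evaluation per step
        if rng.random() < min(1.0, (p_n * g_pdf(s_c)) / (p_c * g_pdf(s_n))):
            s_c, p_c = s_n, p_n
        n += 1
        if n % k == 0:                 # fourth step: output only every k-th
            kept.append(s_c)           # k-1 of every k samples are wasted
    return kept
```

With k = 100, producing N_s output samples costs 100·N_s probability evaluations, which is the k-fold slowdown quantified above.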
Disclosure of Invention
The invention provides a non-autocorrelation sampling method for complex probability distributions over large sample spaces, aimed at the autocorrelation problem of the MCMC method and the efficiency loss of skip sampling, so as to generate a non-autocorrelated sample sequence while achieving the same sampling efficiency as the MCMC method.
The method can be used in two modes according to the application scenario: (1) based on the MCMC method, as a complete sampling method that eliminates the autocorrelation of the sample sequence while sampling; (2) as a post-processing step for the MCMC method, eliminating the autocorrelation of a sample sequence already obtained by MCMC. Generally, the first mode is adopted when the required sample buffer size is known; the second mode is used to measure the buffer size that the distribution to be sampled requires, i.e. as the debugging process for the first mode.
The technical scheme of the invention is as follows: when samples are generated, the correlation between samples is eliminated by introducing a sample buffer; or, as a post-processing method, previously generated samples are read in sequence and the sample sequence is passed through the sample buffer to eliminate its autocorrelation. Each time a sample is generated, or read from a file, it is stored in the sample buffer; once the buffer is full, a sample is selected from the buffer uniformly at random and output. This random output adds an extra layer of randomness between samples and ultimately eliminates the autocorrelation of the sample sequence.
The invention comprises the following steps:
The first step: select an initial state and initialize the buffer, as follows:
1.1 Define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i (1 ≤ i ≤ N) the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 Construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) denotes the probability of drawing sample x_i;
1.3 Choose any easy-to-sample random distribution g(s) on the state space S (e.g. the uniform distribution) as the auxiliary probability distribution, where g(s_i) denotes the probability of drawing state s_i;
1.4 Set the initial value of the sample buffer capacity L to the value to be debugged, or to a sufficiently large value (typically 4000); L is a positive integer. Initialize the sample buffer to empty, with capacity for L samples; each position that can store a sample is called a buffer slot. Initialize the current storage position variable p = 1;
1.5 Generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.6 Set the current state of the Markov chain s_c = s_o;
1.7 Set the target number of samples N_s (greater than L) according to the user's requirements and application characteristics.
The second step: generate a candidate state, as follows:
2.1 Randomly sample a state s_n (1 ≤ n ≤ N) from g(s) as the candidate state; sampling from g(s) is simple, and an independent sampling method is generally chosen, i.e. sampling from g(s) does not depend on the current state;
2.2 Compute the probability value f_s(s_n) corresponding to the candidate state.
The third step: decide the subsequent state and generate a sample, as follows:
3.1 Compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 Generate a random number u in the interval [0, 1];
3.3 If u < p_accept, set s_c = s_n, i.e. accept the candidate state s_n as the new current state of the Markov chain; otherwise s_c remains unchanged, i.e. the candidate state is rejected;
3.4 Generate the sample corresponding to state s_c and record its generation serial number n_generate, but do not output it yet.
The fourth step: judge the usage mode:
If the required buffer size has already been measured, or the application scenario (e.g. aerospace, ship, or bridge reliability analysis) demands high sample real-time performance and leaves no time for debugging, set the mode variable mode = 1 and go to the eighth step; if this is the debugging process, i.e. the required buffer capacity is being measured, set the mode variable mode = 2 and go to the fifth step;
The fifth step: save a sample:
Store the sample generated in step 3.4 into a sample file.
The sixth step: judge whether sampling is completed:
If the generation serial number of the sample obtained in step 3.4 satisfies n_generate = N_s, go to the seventh step; if n_generate < N_s, go to the second step.
The seventh step: read a sample:
Read one sample, in order, from the sample file saved in the fifth step.
The eighth step: judge whether the sample buffer is full:
Check whether the last buffer slot of the sample buffer holds a sample. If so, go to the ninth step; if not, go to the tenth step.
The ninth step: randomly output a sample, as follows:
9.1 Generate a random positive integer r_b in [1, L];
9.2 Output the sample stored in the r_b-th buffer slot of the sample buffer, empty the r_b-th buffer slot, and record the output serial number n_output of the sample;
9.3 If mode = 1, store the sample generated in step 3.4 into the r_b-th buffer slot; if mode = 2, store the sample read in the seventh step into the r_b-th buffer slot;
9.4 Go to the eleventh step.
The tenth step: fill the sample buffer, as follows:
10.1 If mode = 1, store the sample generated in step 3.4 into the p-th sample buffer slot, where p is the current storage position variable initialized in step 1.4;
10.2 If mode = 2, store the sample read in the seventh step into the p-th sample buffer slot, where p is the current storage position variable initialized in step 1.4;
10.3 p = p + 1;
10.4 Go to the eleventh step.
The eleventh step: judge the usage mode:
If mode = 1, go to the twelfth step; if mode = 2, go to the thirteenth step.
The twelfth step: judge whether sampling is finished:
If the output serial number recorded in step 9.2 satisfies n_output = N_s − L, jump to the fourteenth step; if n_output < N_s − L, go to the second step.
The thirteenth step: judge whether all samples have been read:
If all samples in the sample file have been read, go to the fourteenth step; if unread samples remain in the file, go to the seventh step.
The fourteenth step: empty the sample buffer, as follows:
14.1 Initialize the remaining-sample count variable N_left = L;
14.2 Generate a random positive integer r_c in [1, N_left] and output the sample in the r_c-th buffer slot;
14.3 Move the sample in the N_left-th buffer slot to the r_c-th buffer slot;
14.4 N_left = N_left − 1;
14.5 If N_left = 0, go to the fifteenth step; otherwise go to 14.2.
The fifteenth step: end.
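For concreteness, a minimal Python sketch of mode 1 (synchronous sampling with an in-memory buffer; the callables f_s, g_pdf, g_sample are the same illustrative stand-ins as in the earlier sketches, and file handling is omitted):

```python
import numpy as np

def buffered_mcmc(f_s, g_pdf, g_sample, N_s, L=4000, rng=None):
    """Non-autocorrelation sampling via a sample buffer (mode 1).
    Every generated sample enters the buffer; once the buffer holds L
    samples, a uniformly random slot is output and refilled (steps 8-10),
    and at the end the buffer is drained in random order (step 14).
    No sample is ever discarded, so one probability evaluation still
    yields one output sample, as in plain MCMC."""
    if rng is None:
        rng = np.random.default_rng()
    s_c = g_sample(rng)
    p_c = f_s(s_c)
    buf, out = [], []
    for _ in range(N_s):                     # steps 2-3: ordinary MCMC transition
        s_n = g_sample(rng)
        p_n = f_s(s_n)
        if rng.random() < min(1.0, (p_n * g_pdf(s_c)) / (p_c * g_pdf(s_n))):
            s_c, p_c = s_n, p_n
        if len(buf) == L:                    # steps 8-9: buffer full
            r = int(rng.integers(L))         # step 9.1: random slot
            out.append(buf[r])               # step 9.2: output that slot
            buf[r] = s_c                     # step 9.3: refill emptied slot
        else:
            buf.append(s_c)                  # step 10: still filling
    while buf:                               # step 14: drain in random order
        r = int(rng.integers(len(buf)))
        out.append(buf[r])
        buf[r] = buf[-1]                     # move last sample into freed slot
        buf.pop()
    return out
```

Mode 2 (post-processing) is the same buffer logic with the chain transition replaced by reading the next sample from a file.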
Compared with the prior art, the method computes quickly, retains the original efficiency of MCMC, adds buffer operations whose overhead is negligible relative to computing the probability values, and eliminates sample autocorrelation. The specific reasons are as follows:
1. The steps with the largest computation and time cost are the first, second, and third steps, which are the same as in the standard MCMC method. The extra buffer operations (the eighth, ninth, and tenth steps) are very cheap and can be ignored; the extra file operations (the fifth and seventh steps) are also small, involving little computer I/O, and their relative cost is negligible. In particular, when the usage mode is synchronous execution, no file operations are needed at all. Therefore the amount of computation of the method is essentially the same as that of the standard MCMC method;
2. Compared with skip sampling, no sample is discarded in the eighth, ninth, and tenth steps of the invention, so no samples are wasted. With essentially the same amount of computation, the original sampling efficiency of MCMC is retained: all generated samples are eventually output, so on average only one probability value needs to be computed per output sample;
3. In the ninth and tenth steps of the invention, samples are output in random order: a sample buffer is introduced and indexed by random numbers in steps 9.1, 9.2, and 9.3, which inserts a layer of randomness between samples. This reduces the r_1 value of the sample sequence to the order of 10^-3 and eliminates the correlation between samples.
Buffers of different sizes remove the sequence autocorrelation to different degrees, as shown in FIG. 4: the larger the buffer, the more randomness is introduced and the better the autocorrelation is eliminated. The effect of the invention on eliminating sample sequence autocorrelation is shown in FIG. 5, which applies the invention to a quantum supremacy benchmark on a supercomputer: FIG. 5(a) is the autocorrelation curve of a sample sequence produced by the standard MCMC method, and FIG. 5(b) is the autocorrelation curve obtained with the invention. Comparing the autocorrelation of each order of the two sequences, the autocorrelation of the first 100 orders of the sequence obtained by the standard MCMC method is severe, while the autocorrelation of the sequence generated by the invention is eliminated to a negligible level at every order above 0. In addition, no sample is lost during execution, so no matter how large the sample buffer is set, the total execution time of the whole process is hardly affected; only the time to produce the first output sample increases. A buffer as large as possible can therefore be used when sufficient preparation time is available.
Drawings
FIG. 1 is a flow chart of the MCMC method described in the Background;
FIG. 2 is a flow chart of the skip sampling method described in the Background;
FIG. 3 is the overall flow chart of the invention;
FIG. 4 illustrates the effect of different configurations (different sample buffer sizes) on eliminating the first-order autocorrelation coefficient of the sample sequence;
FIG. 5 compares the effect of the invention with the background art: FIG. 5(a) is the autocorrelation curve of a sample sequence sampled by the background-art MCMC method, and FIG. 5(b) is the autocorrelation curve obtained with the invention;
FIG. 6 compares the efficiency of the invention with the background-art skip sampling method.
Detailed Description
The general flow of the invention is shown in FIG. 3 and comprises the following steps:
The first step: select an initial state and initialize the buffer.
1.1 Define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i (1 ≤ i ≤ N) the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 Construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) denotes the probability of drawing sample x_i;
1.3 Choose any easy-to-sample random distribution g(s) on the state space S as the auxiliary probability distribution, where g(s_i) denotes the probability of drawing state s_i;
1.4 Set the sample buffer capacity L to 4000, initialize the sample buffer to empty, with capacity for L samples; each position that can store a sample is called a buffer slot. Initialize the current storage position variable p = 1;
1.5 Generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.6 Set the current state of the Markov chain s_c = s_o;
1.7 Set the target number of samples N_s (greater than L) according to the user's requirements and application characteristics.
The second step: generate a candidate state.
2.1 Obtain a state s_n (1 ≤ n ≤ N) by random sampling from the probability distribution g(s) with an independent sampling method, as the candidate state;
2.2 Compute the probability value f_s(s_n) corresponding to the candidate state.
The third step: decide the subsequent state and generate a sample:
3.1 Compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 Generate a random number u in the interval [0, 1];
3.3 If u < p_accept, set s_c = s_n, i.e. accept the candidate state s_n as the new current state of the Markov chain; otherwise s_c remains unchanged, i.e. the candidate state is rejected;
3.4 Generate the sample corresponding to state s_c and record its generation serial number n_generate.
The fourth step: judge the usage mode.
If the required buffer size has already been measured, or the application scenario (e.g. aerospace, ship, or bridge reliability analysis) demands high sample real-time performance and leaves no time for debugging, set the mode variable mode = 1 and go to the eighth step; if this is the debugging process, i.e. the required buffer capacity is being measured, set the mode variable mode = 2 and go to the fifth step;
The fifth step: save a sample:
Store the sample generated in step 3.4 into a sample file.
The sixth step: judge whether sampling is completed:
If the generation serial number of the sample obtained in step 3.4 satisfies n_generate = N_s, go to the seventh step; if n_generate < N_s, go to the second step.
The seventh step: read a sample:
Read one sample, in order, from the sample file saved in the fifth step.
The eighth step: judge whether the sample buffer is full:
Check whether the last buffer slot of the sample buffer holds a sample. If so, go to the ninth step; if not, go to the tenth step.
The ninth step: randomly output a sample:
9.1 Generate a random positive integer r_b in [1, L];
9.2 Output the sample stored in the r_b-th buffer slot of the sample buffer, empty the r_b-th buffer slot, and record the output serial number n_output of the sample;
9.3 If mode = 1, store the sample generated in step 3.4 into the r_b-th buffer slot; if mode = 2, store the sample read in the seventh step into the r_b-th buffer slot;
9.4 Go to the eleventh step.
The tenth step: fill the sample buffer:
10.1 If mode = 1, store the sample generated in step 3.4 into the p-th sample buffer slot, where p is the current storage position variable initialized in step 1.4;
10.2 If mode = 2, store the sample read in the seventh step into the p-th sample buffer slot, where p is the current storage position variable initialized in step 1.4;
10.3 p = p + 1;
10.4 Go to the eleventh step.
The eleventh step: judge the usage mode:
If mode = 1, go to the twelfth step; if mode = 2, go to the thirteenth step.
The twelfth step: judge whether sampling is completed:
If the output serial number recorded in step 9.2 satisfies n_output = N_s − L, go to the fourteenth step; if n_output < N_s − L, go to the second step.
The thirteenth step: judge whether all samples have been read:
If all samples in the sample file have been read, go to the fourteenth step; if unread samples remain in the file, go to the seventh step.
The fourteenth step: empty the sample buffer:
14.1 Initialize the remaining-sample count variable N_left = L;
14.2 Generate a random positive integer r_c in [1, N_left] and output the sample in the r_c-th buffer slot;
14.3 Move the sample in the N_left-th buffer slot to the r_c-th buffer slot;
14.4 N_left = N_left − 1;
14.5 If N_left = 0, go to the fifteenth step; otherwise go to 14.2.
The fifteenth step: end.
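As an illustrative end-to-end check (a toy 1000-state target standing in for a real 10^20-sample space; all names refer to the hypothetical sketches given earlier, not to code from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                    # toy state space
p = rng.random(N)
p /= p.sum()                                # toy "complex" target distribution

f_s = lambda i: p[i]                        # state-space probability function
g_pdf = lambda i: 1.0 / N                   # uniform auxiliary distribution
g_sample = lambda r: int(r.integers(N))     # independent uniform proposal

plain = mcmc_independence_sampler(f_s, g_pdf, g_sample, 100_000, rng=rng)
buffered = buffered_mcmc(f_s, g_pdf, g_sample, 100_000, L=4000, rng=rng)

print("r_1, plain MCMC  :", autocorr(plain))     # typically clearly positive
print("r_1, with buffer :", autocorr(buffered))  # typically near zero
```

Both runs perform the same number of probability evaluations; only the output order differs, mirroring the negligible-overhead claim above.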
FIG. 4 illustrates the effect of different configurations (different sample buffer sizes) on eliminating the first-order autocorrelation coefficient of the sample sequence. In FIG. 4 the abscissa is the sample buffer size and the ordinate is the first-order autocorrelation coefficient of the sample sequence. As the sample buffer grows, the first-order autocorrelation coefficient decreases rapidly: when the buffer size reaches 1000, the coefficient drops to the order of 10^-2, and beyond a buffer size of 2000 the autocorrelation of the sample sequence is negligible.
FIG. 5 compares the autocorrelation of the sample sequences produced by the background-art MCMC method and by the invention. FIG. 5(a) is the autocorrelation curve of a sample sequence sampled by the MCMC method described in the Background; FIG. 5(b) is the autocorrelation curve obtained with the invention. The abscissa is the order and the ordinate is the autocorrelation coefficient of the sample sequence. The experimental data come from a quantum supremacy benchmark based on 30-photon 900-mode boson sampling. In this test the 30 photons may occupy any 30 of the 900 modes, so there are C(900, 30) probabilities in total, each corresponding to a possible sample, i.e. the sample space size is C(900, 30). The results were measured on the Tianhe-2 supercomputer (64 compute nodes). Low-order (below 10th-order) autocorrelation matters most in engineering applications: as shown in FIG. 5(a), the sequence obtained by the MCMC method has severe low-order autocorrelation (the first-order autocorrelation is as high as 0.97, and the 10th-order autocorrelation is still as high as 0.77); as shown in FIG. 5(b), the autocorrelation of every order of the sequence obtained by the invention is of the order 10^-3 to 10^-2 and is negligible.
FIG. 6 compares the efficiency of the invention with the background-art skip sampling method. The experimental data come from a quantum supremacy benchmark based on 20-photon 400-mode boson sampling and one based on 25-photon 625-mode boson sampling. The test environment is the Tianhe-2 supercomputer, using 4 and 32 compute nodes respectively. The sample space sizes of the two tests are C(400, 20) and C(625, 25), respectively. The comparison used the two methods to generate 5 million and 2 million samples at the two test scales, respectively. For the 20-photon 400-mode boson sampling quantum supremacy test, skip sampling ran for about 8.84 hours and the resulting sequence had a first-order autocorrelation coefficient of 0.0424, while the invention ran for about 5.31 minutes with a first-order autocorrelation coefficient of 0.0067. For the 25-photon 625-mode boson sampling quantum supremacy test, skip sampling ran for about 24.46 hours with a first-order autocorrelation coefficient of 0.1083, while the invention ran for about 14.67 minutes with a first-order autocorrelation coefficient of 0.0089. The results show that the invention eliminates autocorrelation more effectively and executes about 100 times faster than skip sampling.

Claims (5)

1. A non-autocorrelation sampling method for a complex probability distribution over a large sample space, characterized by comprising the following steps:
the first step: select an initial state and initialize the buffer, as follows:
1.1 define the state space of the Markov chain as S = {s_1, s_2, …, s_i, …, s_N}, where for each positive integer i, 1 ≤ i ≤ N, the state s_i corresponds to the element x_i of the sample space X = {x_1, x_2, …, x_i, …, x_N};
1.2 construct a sample probability function f_s(s) on the Markov chain state space S such that f_s(s_i) = f(x_i) for every integer i, where f(x_i) is the probability, given by the complex probability distribution function f, of drawing sample x_i;
1.3 choose an easy-to-sample random distribution g(s) on the state space S as the auxiliary probability distribution, where g(s_i) is the probability of drawing state s_i;
1.4 set the initial value of the sample buffer capacity L, where L is a positive integer; initialize the sample buffer to empty, with capacity for L samples, where each position that can store a sample is called a buffer slot; initialize the current storage position variable p = 1;
1.5 generate a positive integer o ∈ [1, N] from the uniform random distribution, take s_o as the initial state of the Markov chain, compute the sample probability p_o = f_s(s_o) corresponding to state s_o, and output the sample x_o corresponding to state s_o;
1.6 set the current state of the Markov chain s_c = s_o;
1.7 set the target number of samples N_s according to the user's requirements and application characteristics;
the second step: generate a candidate state, as follows:
2.1 obtain a state s_n, 1 ≤ n ≤ N, by random sampling from g(s), as the candidate state;
2.2 compute the probability value f_s(s_n) corresponding to the candidate state;
the third step: decide the subsequent state and generate a sample, as follows:
3.1 compute the acceptance probability p_accept = min(1, (f_s(s_n)·g(s_c)) / (f_s(s_c)·g(s_n)));
3.2 generate a random number u in the interval [0, 1];
3.3 if u < p_accept, set s_c = s_n; otherwise s_c remains unchanged;
3.4 generate the sample corresponding to state s_c and record its generation serial number n_generate;
the fourth step: judge the usage mode, as follows:
if the real-time requirement on the samples is high and no sufficient time is available for debugging, set the mode variable mode = 1 and go to the eighth step; if this is the debugging process, i.e. the required buffer capacity is being measured, set the mode variable mode = 2 and go to the fifth step;
the fifth step: save a sample:
store the sample generated in step 3.4 into a sample file;
the sixth step: judge whether sampling is finished:
if the generation serial number of the sample obtained in step 3.4 satisfies n_generate = N_s, go to the seventh step; if n_generate < N_s, go to the second step;
the seventh step: read a sample:
read one sample, in order, from the sample file;
the eighth step: judge whether the sample buffer is full:
check whether the last buffer slot of the sample buffer holds a sample; if so, go to the ninth step; if not, go to the tenth step;
the ninth step: randomly output a sample, as follows:
9.1 generate a random positive integer r_b in [1, L];
9.2 output the sample stored in the r_b-th buffer slot of the sample buffer, empty the r_b-th buffer slot, and record the output serial number n_output of the sample;
9.3 if mode = 1, store the sample generated in step 3.4 into the r_b-th buffer slot; if mode = 2, store the sample read in the seventh step into the r_b-th buffer slot;
9.4 go to the eleventh step;
the tenth step: fill the sample buffer, as follows:
10.1 if mode = 1, store the sample generated in step 3.4 into the p-th sample buffer slot;
10.2 if mode = 2, store the sample read in the seventh step into the p-th sample buffer slot;
10.3 p = p + 1;
10.4 go to the eleventh step;
the eleventh step: judge the usage mode:
if mode = 1, go to the twelfth step; if mode = 2, go to the thirteenth step;
the twelfth step: judge whether sampling is completed:
if the output serial number recorded in step 9.2 satisfies n_output = N_s − L, go to the fourteenth step; if n_output < N_s − L, go to the second step;
the thirteenth step: judge whether all samples have been read:
if all samples in the sample file have been read, go to the fourteenth step; if unread samples remain in the file, go to the seventh step;
the fourteenth step: empty the sample buffer, as follows:
14.1 initialize the remaining-sample count variable N_left = L;
14.2 generate a random positive integer r_c in [1, N_left] and output the sample in the r_c-th buffer slot;
14.3 move the sample in the N_left-th buffer slot to the r_c-th buffer slot;
14.4 N_left = N_left − 1;
14.5 if N_left = 0, go to the fifteenth step; otherwise go to 14.2;
the fifteenth step: end.
2. The non-autocorrelation sampling method for a complex probability distribution over a large sample space according to claim 1, wherein the easy-to-sample random distribution g(s) is the uniform distribution.
3. The non-autocorrelation sampling method for a complex probability distribution over a large sample space according to claim 1, wherein the initial value of the sample buffer capacity L in step 1.4 is set to 4000.
4. The non-autocorrelation sampling method for a complex probability distribution over a large sample space according to claim 1, wherein the target number of samples N_s in step 1.7 is greater than L.
5. The non-autocorrelation sampling method for a complex probability distribution over a large sample space according to claim 1, wherein the sampling from g(s) in step 2.1 employs an independent sampling method.
Application CN201910265051.XA, filed 2019-04-03: Non-autocorrelation sampling method for large sample space complex probability distribution. Granted as CN109977359B (Active).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910265051.XA 2019-04-03 2019-04-03 Non-autocorrelation sampling method for large sample space complex probability distribution

Publications (2)

Publication Number Publication Date
CN109977359A 2019-07-05
CN109977359B 2022-11-22

Family ID: 67082668 (one family application, CN201910265051.XA, status Active; country: CN)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant