CN110874453A

CN110874453A - Self-service capacity expansion method based on correlation coefficient criterion

Info

Publication number: CN110874453A
Application number: CN201910929226.2A
Authority: CN
Inventors: 彭维仕
Original assignee: Air Force Engineering University of PLA
Current assignee: Air Force Engineering University of PLA
Priority date: 2019-09-29
Filing date: 2019-09-29
Publication date: 2020-03-10

Abstract

The invention discloses a bootstrap (self-service) sample-expansion method based on a correlation coefficient criterion. The method comprises the following steps: S1, initializing parameters by setting a correlation coefficient threshold ρ_ε, a positive integer M, the bootstrap sample size B, and the number m of equal-width bins of the histogram function; S2, copying a new sample by randomly generating a uniformly distributed positive integer R ~ U(0, M) and calculating the remainder p = mod(R, n), where n is the size of the original sample; S3, letting the value copied in S2 be the i-th new sample, repeating S2 to obtain a set of copied samples, and calculating the mean of the copied samples; S4, repeating steps S2 and S3 to obtain the bootstrap resample, and calculating the similarity ρ(f(x); f(x)^*) between the resample and the original sample. If ρ(f(x); f(x)^*) ≥ ρ_ε, the resample is output; otherwise, steps S1-S4 are repeated until the similarity condition is satisfied. The resample obtained by the invention makes full use of the information in the given sample and is closer to the real situation.

Description

Self-service capacity expansion method based on correlation coefficient criterion

Technical Field

The invention belongs to the field of probability and mathematical statistics, and in particular relates to a bootstrap (self-service) sample-expansion method based on a correlation coefficient criterion.

Background

In engineering practice, a common way to address the small-sample problem is to enlarge the original sample with a resampling method grounded in probability theory and mathematical statistics. In 1977, Professor B. Efron of Stanford University in the United States proposed a new statistical analysis method, the bootstrap method. Before that, M. H. Quenouille had proposed the jackknife estimate of estimator bias, and J. W. Tukey had proposed the jackknife estimate of estimator variance. Later, the Bayesian bootstrap method (also called the random weighting method) appeared.

Analysis shows that the bootstrap method copies samples from the original sample mainly at random, so the bootstrap samples may deviate from the original sample, and the calculated result may in turn deviate from the true distribution. This effect is especially noticeable when the sample size is small. In particular, when the true distribution is continuous, the distribution characteristics at points other than the observed sample values cannot be obtained, which makes it difficult to estimate the parameters of the true distribution.

Disclosure of Invention

The invention aims to provide a bootstrap sample-expansion method based on a correlation coefficient criterion that overcomes the shortcomings of the prior art described in the Background.

The invention is realized as follows. A bootstrap sample-expansion method based on a correlation coefficient criterion comprises the following steps:

S1, parameter initialization: setting a correlation coefficient threshold ρ_ε, a positive integer M, the bootstrap sample size B, and the number m of equal-width bins of the histogram function;

S2, copying a new sample: randomly generating a uniformly distributed positive integer R ~ U(0, M) and calculating the remainder p = mod(R, n), where n is the size of the original sample;

S3, calculating the mean of the copied samples: letting the value copied in S2 be the i-th new sample, repeating S2 to obtain a set of copied samples, and calculating the mean of the copied samples;

S4, obtaining the bootstrap resample: repeating steps S2 and S3 to obtain the bootstrap resample, and calculating the similarity ρ(f(x); f(x)^*) between the resample and the original sample; if ρ(f(x); f(x)^*) ≥ ρ_ε, outputting the resample; otherwise, repeating steps S1-S4 until the similarity condition is satisfied.

Preferably, in step S2, the remainder is calculated as p = mod(R, n).

Preferably, in step S4, the similarity ρ(f(x); f(x)^*) is calculated from the histograms of the two samples, where h(·) denotes the histogram function with m equal-width bins (m < n), and x_k and its counterpart for the resample are, respectively, the center coordinates of the k-th bin in the corresponding histograms.

The invention overcomes the shortcomings of the prior art and provides a bootstrap sample-expansion method based on a correlation coefficient criterion. The basic principle of the method is shown in FIG. 1, and the method comprises the following steps:

S1, initializing parameters by setting a correlation coefficient threshold ρ_ε, a positive integer M, the bootstrap sample size B, and the number m of equal-width bins of the histogram function;

S2, copying a new sample by randomly generating a uniformly distributed positive integer R ~ U(0, M) and calculating the remainder p = mod(R, n);

S3, letting the value copied in S2 be the i-th new sample, repeating S2 to obtain a set of copied samples, and calculating the mean of the copied samples;

S4, repeating steps S2 and S3 to obtain the bootstrap resample, and calculating the similarity ρ(f(x); f(x)^*) between the resample and the original sample; if ρ(f(x); f(x)^*) ≥ ρ_ε, outputting the resample; otherwise, repeating steps S1-S4 until the similarity condition is satisfied.

Compared with the prior art, the invention has the following beneficial effects:

(1) the invention provides a bootstrap sample-expansion method based on a correlation coefficient (similarity) criterion, in which the validity of the resample is ensured by checking that the characteristics of the improved bootstrap sample are consistent with those of the original sample;

(2) the resample obtained by the method makes full use of the information in the given sample and is closer to the real situation;

(3) the invention is suitable for bootstrap expansion not only with small samples but also with extremely small samples.

Drawings

FIG. 1 is a flow chart of the steps of the method of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention discloses a bootstrap sample-expansion method based on a correlation coefficient criterion, which comprises the following steps:

S1, parameter initialization

To ensure that the characteristics of the bootstrap sample are consistent with those of the original sample, a correlation coefficient threshold ρ_ε is set, against which the characteristics of the two are compared. To copy new samples at random, a relatively large positive integer M (M > n) is set, where n is the size of the original sample. Because the probability density functions of the original sample and the copied samples are not easy to obtain, a histogram is often used in practice in place of the probability density function; the number m of equal-width bins of the histogram function therefore also needs to be set.
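The settings described in S1 can be collected in one place, as in the following minimal Python sketch. The class name ExpansionParams, the field names, and the illustrative values of rho_eps, M and m are assumptions made for illustration; only the constraints M > n and m < n, and the values B = 1000 and n = 10 used later in the embodiment, come from the text. Restricting rho_eps to [0, 1] is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpansionParams:
    """Hypothetical container for the S1 settings (names are illustrative)."""
    rho_eps: float  # correlation coefficient threshold
    M: int          # large positive integer used for random copying, M > n
    B: int          # bootstrap sample size
    m: int          # number of equal-width histogram bins, m < n

    def validate(self, n: int) -> None:
        """Check the settings against the original sample size n."""
        if not 0.0 <= self.rho_eps <= 1.0:
            # Assumption: a useful threshold lies in [0, 1].
            raise ValueError("rho_eps is assumed to lie in [0, 1]")
        if self.M <= n:
            raise ValueError("M must be larger than the sample size n")
        if self.B < 1:
            raise ValueError("B must be a positive integer")
        if not 1 <= self.m < n:
            raise ValueError("m must satisfy 1 <= m < n")


# Illustrative values only; B and n follow the embodiment, the rest are assumed.
params = ExpansionParams(rho_eps=0.9, M=10_000, B=1000, m=5)
params.validate(n=10)
```

Validating once against n keeps the later steps free of repeated range checks.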

S2, copying a new sample

A uniformly distributed positive integer R ~ U(0, M) is randomly generated, and the remainder p = mod(R, n) is calculated.

The new sample is then copied from the original sample according to the remainder p.
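The copy step can be sketched in Python as follows. The function name copy_sample and the example data are hypothetical. The sketch assumes that U(0, M) is read as a uniform integer in [0, M) and that the remainder p is used directly as a zero-based index into the original sample; the text does not spell out the exact index mapping.

```python
import random


def copy_sample(x, M, rng=random):
    """Copy one value from x using a uniform integer R and p = R mod n."""
    n = len(x)
    R = rng.randrange(0, M)  # uniform integer in [0, M); assumed reading of U(0, M)
    p = R % n                # remainder p = mod(R, n), so 0 <= p < n
    return x[p]              # assumption: p is a zero-based index into x


original = [2.1, 3.4, 1.7, 4.0, 2.9]  # made-up data for illustration
new_value = copy_sample(original, M=10_000)
```

When M is not a multiple of n, the selection probabilities of two indices differ by at most 1/M, which is one reason to choose M much larger than n.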

S3, calculating the mean of the copied samples

Step S2 is repeated n times to obtain a set of n copied samples, and the mean of the copied samples is calculated.
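One pass of S3 might look like the sketch below, which repeats the modulo-based copy n times and takes the arithmetic mean. The function name copied_sample_mean and the data are hypothetical, and the zero-based use of the remainder as an index is an assumption.

```python
import random
from statistics import fmean


def copied_sample_mean(x, M, rng):
    """Repeat the S2 copy n times and return the copies and their mean."""
    n = len(x)
    # Each copy draws a uniform integer in [0, M) and indexes x by the remainder.
    copies = [x[rng.randrange(0, M) % n] for _ in range(n)]
    return copies, fmean(copies)


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
original = [2.1, 3.4, 1.7, 4.0, 2.9]  # made-up data for illustration
copies, mean_value = copied_sample_mean(original, M=10_000, rng=rng)
```

Because every copy is one of the original values, the mean always lies between the smallest and largest original observation.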

S4, obtaining the bootstrap resample

Steps S2 and S3 are repeated B times to obtain the bootstrap resample. The similarity ρ(f(x); f(x)^*) between the resample and the original sample is then calculated (equation (5)).

Because the probability density functions of the original sample and the copied samples are not easy to obtain, and a histogram is commonly used in practice in place of the probability density function, equation (5) is approximated by evaluating it on the histograms of the two samples.

If ρ(f(x); f(x)^*) ≥ ρ_ε, the resample is output; otherwise, steps S1-S4 are repeated until the similarity condition is satisfied.
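The acceptance test in S4 might be sketched as below. Because the similarity formula itself is not reproduced in this text, the sketch substitutes the Pearson correlation between two m-bin histograms built on shared bin edges; this is an assumption, not necessarily the patented formula. The names histogram_similarity and expand_sample, the max_tries cap, and the choice to keep the parameters fixed between attempts are likewise hypothetical.

```python
import numpy as np


def histogram_similarity(x, x_star, m):
    """Pearson correlation between m-bin histograms of x and x_star.

    Assumption: both histograms share edges spanning the pooled range.
    """
    lo = min(x.min(), x_star.min())
    hi = max(x.max(), x_star.max())
    if lo == hi:
        return 1.0  # all values identical, so the two samples coincide
    edges = np.linspace(lo, hi, m + 1)
    h, _ = np.histogram(x, bins=edges, density=True)
    h_star, _ = np.histogram(x_star, bins=edges, density=True)
    if h.std() == 0 or h_star.std() == 0:
        return 0.0  # correlation undefined for a flat histogram; treated as no match
    return float(np.corrcoef(h, h_star)[0, 1])


def expand_sample(x, rho_eps, M, B, m, rng, max_tries=50):
    """Draw resamples until the similarity threshold is met or tries run out.

    Returns (resample, similarity, accepted). The max_tries cap is an
    addition of this sketch; the text repeats until the condition holds.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    best, best_rho = None, -np.inf
    for _ in range(max_tries):
        R = rng.integers(0, M, size=(B, n))  # S2: uniform integers in [0, M)
        x_star = x[R % n].mean(axis=1)       # S3 repeated B times: B means of n copies
        rho = histogram_similarity(x, x_star, m)
        if rho > best_rho:
            best, best_rho = x_star, rho
        if rho >= rho_eps:
            return x_star, rho, True
    return best, best_rho, False


rng = np.random.default_rng(0)
x = rng.rayleigh(scale=3.0, size=10)  # illustrative original sample
x_star, rho, accepted = expand_sample(x, rho_eps=0.9, M=10_000, B=1000, m=5, rng=rng)
```

Returning the best resample together with an accepted flag lets a caller decide how to proceed when the threshold is never reached, instead of looping indefinitely.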

Assume that the data x obey a Rayleigh distribution, where k denotes the degree of freedom of the Rayleigh distribution and satisfies k > 0, and exp(·) denotes the exponential function.

With the degree of freedom set to k = 3, 10 original samples are generated by the Monte Carlo method.

From the original sample data, the degree of freedom of the distribution function is estimated both by the bootstrap method and by the bootstrap sample-expansion method based on the correlation coefficient criterion. To illustrate the advantages of the present invention, the bootstrap sample size is set to B = 1000 and the total number of degree-of-freedom estimates to Num = 500. The average of the 500 estimated degrees of freedom is then used as the final estimate of the degree of freedom.

Averaging the 500 estimated degrees of freedom for each method gives the final estimates, and the corresponding absolute errors are denoted e_Bootstrap for the bootstrap method and e_NewBootstrap for the present method.

Clearly,

e_NewBootstrap < e_Bootstrap    (10)

Therefore, the estimate obtained by the invention has a smaller error and is closer to the true degree of freedom.
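The experimental setup for the plain bootstrap baseline can be sketched as follows. This is only a sketch under stated assumptions: the "degree of freedom" k is treated as the Rayleigh scale parameter, the estimator is taken to be the maximum-likelihood estimate of that scale, and the seed is arbitrary. None of these choices is confirmed by the text, the function names are hypothetical, and no numerical result of the patent is reproduced here.

```python
import numpy as np


def rayleigh_scale_mle(x, axis=-1):
    """Maximum-likelihood estimate of the Rayleigh scale: sqrt(mean(x^2) / 2)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2, axis=axis) / 2.0)


def bootstrap_scale_estimate(x, B, rng):
    """Plain bootstrap: average the scale estimate over B resamples of x."""
    n = x.size
    idx = rng.integers(0, n, size=(B, n))  # B resamples drawn with replacement
    return float(rayleigh_scale_mle(x[idx], axis=1).mean())


rng = np.random.default_rng(2019)  # arbitrary seed
k_true, n, B, num = 3.0, 10, 1000, 500

x = rng.rayleigh(scale=k_true, size=n)  # the 10 original samples
estimates = np.array([bootstrap_scale_estimate(x, B, rng) for _ in range(num)])
k_hat = estimates.mean()                # final estimate: mean of the 500 estimates
abs_error = abs(k_hat - k_true)         # absolute error of the baseline
```

The same loop could be run with the resampling step replaced by the similarity-checked expansion in order to compare the two absolute errors on identical original data.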

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A bootstrap sample-expansion method based on a correlation coefficient criterion, characterized by comprising the following steps:

S1, initializing parameters by setting a correlation coefficient threshold ρ_ε, a positive integer M, the bootstrap sample size B, and the number m of equal-width bins of the histogram function;

S2, copying a new sample by randomly generating a uniformly distributed positive integer R ~ U(0, M) and calculating the remainder p = mod(R, n);

S3, letting the value copied in S2 be the i-th new sample, repeating S2 to obtain a set of copied samples, and calculating the mean of the copied samples;

S4, repeating steps S2 and S3 to obtain the bootstrap resample, and calculating the similarity ρ(f(x); f(x)^*) between the resample and the original sample; if ρ(f(x); f(x)^*) ≥ ρ_ε, outputting the resample; otherwise, repeating steps S1-S4 until the similarity condition is satisfied.

2. The bootstrap sample-expansion method based on a correlation coefficient criterion according to claim 1, wherein in step S2 the remainder is calculated as p = mod(R, n), where n is the sample size.

3. The bootstrap sample-expansion method based on a correlation coefficient criterion according to claim 1, wherein in step S4 the similarity ρ(f(x); f(x)^*) is specifically calculated as: