CN110874453A - Self-service capacity expansion method based on correlation coefficient criterion - Google Patents
- Publication number
- CN110874453A (application CN201910929226.2A)
- Authority
- CN
- China
- Prior art keywords
- sample
- self
- help
- correlation coefficient
- samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/17—Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Algebra (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a self-help (Bootstrap) capacity expansion method based on a correlation coefficient criterion. The method comprises the following steps: S1, initialize parameters: set a correlation coefficient threshold $\rho_\varepsilon$, a positive integer M, the self-help sample size B, and the number of equal-width histogram bins m; S2, copy a new sample: randomly generate a positive integer $R \sim U(0, M)$ from a uniform distribution and compute the remainder $p = \mathrm{mod}(R, n)$, where n is the sample size; S3, compute the mean of the copied samples: let the i-th new sample $x_i^*$ be the original observation selected by p, repeat S2 to obtain a set of copied samples $\{x_1^*, \dots, x_n^*\}$, and compute their mean $\bar{x}^*$; S4, obtain the self-help resample: repeat S2 and S3 to obtain the self-help resample $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$ and compute the similarity $\rho(f(x); f(x^*))$ between the resample and the original sample $\{x_1, \dots, x_n\}$; if $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, output the resample; otherwise, repeat S1 to S4 until the similarity condition is satisfied. The resample obtained by the invention makes full use of the information in the given sample and is closer to the real situation.
Description
Technical Field
The invention belongs to the field of probability theory and mathematical statistics, and in particular relates to a self-service capacity expansion method based on a correlation coefficient criterion.
Background
In engineering practice, a common way to deal with small samples is to enlarge the original sample with resampling methods grounded in probability theory and mathematical statistics. In 1977, Professor B. Efron of Stanford University, USA, proposed a new statistical analysis method, the self-help method (Bootstrap method). Before this, M. H. Quenouille had proposed the jackknife estimate of an estimator's bias, and J. W. Tukey the jackknife estimate of an estimator's variance. Later, the Bayesian Bootstrap method (also called the random weighting method) appeared.
Analysis shows that the self-help method builds new samples by randomly copying observations from the original sample, so the self-help samples may deviate from the original sample and the computed results may in turn deviate from the true distribution. This effect is pronounced when the sample size is small. In particular, when the true distribution is continuous, its parameters are difficult to estimate because nothing is known about the distribution at points that were not observed.
Disclosure of Invention
The invention aims to provide a self-help capacity expansion method based on a correlation coefficient criterion that overcomes the defects of the prior art described in the Background.
The invention is realized as follows: a self-help capacity expansion method based on a correlation coefficient criterion comprises the following steps:
S1, parameter initialization: set a correlation coefficient threshold $\rho_\varepsilon$, a positive integer M, the self-help sample size B, and the number of equal-width histogram bins m;
S2, copy a new sample: randomly generate a positive integer $R \sim U(0, M)$ from a uniform distribution and compute the remainder $p = \mathrm{mod}(R, n)$, where n is the sample size;
S3, compute the mean of the copied samples: let the i-th new sample $x_i^*$ be the original observation selected by p; repeat S2 to obtain a set of copied samples $\{x_1^*, \dots, x_n^*\}$ and compute their mean $\bar{x}^*$;
S4, obtain the self-help resample: repeat S2 and S3 to obtain the self-help resample $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$ and compute the similarity $\rho(f(x); f(x^*))$ between the resample and the original sample $\{x_1, \dots, x_n\}$; if $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, output the resample; otherwise, repeat steps S1 to S4 until the similarity condition is satisfied.
Preferably, in step S2, the remainder is calculated as:
$$p = \mathrm{mod}(R, n)$$
Preferably, in step S4, the similarity $\rho(f(x); f(x^*))$ is specifically calculated as:
where h(·) denotes the histogram function with m equal-width bins (m < n), and $x_k$ and $x_k^*$ are the center coordinates of the k-th bin in the corresponding histograms.
The invention overcomes the defects of the prior art by providing a self-help capacity expansion method based on a correlation coefficient criterion, whose basic principle is shown in Figure 1. The method comprises the following steps:
S1, initialize parameters: set a correlation coefficient threshold $\rho_\varepsilon$, a positive integer M, the self-help sample size B, and the number of equal-width histogram bins m;
S2, copy a new sample: randomly generate a positive integer $R \sim U(0, M)$ from a uniform distribution and compute the remainder $p = \mathrm{mod}(R, n)$;
S3, compute the mean of the copied samples: let the i-th new sample $x_i^*$ be the original observation selected by p; repeat S2 to obtain a set of copied samples $\{x_1^*, \dots, x_n^*\}$ and compute their mean $\bar{x}^*$;
S4, obtain the self-help resample: repeat S2 and S3 to obtain the self-help resample $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$ and compute the similarity $\rho(f(x); f(x^*))$ between the resample and the original sample $\{x_1, \dots, x_n\}$; if $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, output the resample; otherwise, repeat S1 to S4 until the similarity condition is satisfied.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention discloses a self-help capacity expansion method based on a similarity criterion and a correlation coefficient criterion, and ensures the correctness of the resample by checking that the characteristics of the improved self-help sample are consistent with those of the original sample;
(2) the resample obtained by the method makes full use of the information in the given sample and is closer to the real situation;
(3) the invention is suitable for self-help expansion not only of small samples but also of extremely small samples.
Drawings
FIG. 1 is a flow chart of the steps of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention discloses a self-help capacity expansion method based on a correlation coefficient criterion, which comprises the following steps:
S1, parameter initialization
To keep the characteristics of the self-help sample consistent with those of the original sample, a correlation coefficient threshold $\rho_\varepsilon$ is set, against which the similarity between the two is judged. To copy new samples at random, a relatively large positive integer M (M > n) is set, where n is the size of the original sample. Since the probability density functions of the original and copied samples are difficult to obtain, a histogram is usually used in their place in practice, so the number of histogram bins m must also be set.
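The S1 parameters and the constraints the text places on them (M > n, m < n) can be collected in a small container. This is a minimal sketch: the class and field names are hypothetical, the values in the usage line are illustrative only, and requiring m ≥ 2 and a threshold in [-1, 1] are assumptions added here because a correlation coefficient needs at least two bins and cannot leave that range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpansionParams:
    """Hypothetical container for the S1 parameters (names are illustrative)."""
    rho_eps: float  # correlation coefficient threshold
    M: int          # upper bound of the uniform integer R ~ U(0, M)
    B: int          # self-help sample size (number of resampled means)
    m: int          # number of equal-width histogram bins

    def check(self, n: int) -> None:
        """Raise ValueError if the parameters violate the constraints for sample size n."""
        if not -1.0 <= self.rho_eps <= 1.0:
            raise ValueError("rho_eps must lie in [-1, 1]")  # assumption: correlation range
        if self.M <= n:
            raise ValueError("M must exceed the original sample size n")
        if self.B < 1:
            raise ValueError("B must be a positive integer")
        if not 2 <= self.m < n:
            raise ValueError("m must satisfy 2 <= m < n")  # m >= 2 is an assumption


params = ExpansionParams(rho_eps=0.9, M=10_000, B=1000, m=5)
params.check(n=10)  # passes silently
```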
S2, copy a new sample
Randomly generate a positive integer $R \sim U(0, M)$ from a uniform distribution and compute the remainder:
$$p = \mathrm{mod}(R, n)$$
The copied new sample $x_i^*$ is the original observation selected by the remainder p.
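Step S2 maps a uniform integer onto an index of the original sample through the remainder. The sketch below assumes R is drawn from {0, 1, ..., M} inclusive (the text does not say whether M itself is included) and uses 0-based indexing, so p selects the (p+1)-th observation. The hypothetical helper `remainder_probabilities` gives the exact distribution of p, which suggests why a large M (M > n) is requested: under this assumption the remainders are exactly uniform only when n divides M + 1, and otherwise the bias shrinks as M grows.

```python
import random


def copy_one(sample, M, rng):
    """Step S2 sketch: draw R uniformly from {0, ..., M} and return sample[R mod n]."""
    n = len(sample)
    R = rng.randint(0, M)  # randint includes both end points
    p = R % n              # remainder p = mod(R, n), always in {0, ..., n-1}
    return sample[p]


def remainder_probabilities(M, n):
    """Exact P(p = j), j = 0..n-1, when R is uniform on the M + 1 integers {0, ..., M}."""
    q, r = divmod(M + 1, n)
    # The first r remainders occur q + 1 times, the remaining n - r occur q times.
    return [(q + 1 if j < r else q) / (M + 1) for j in range(n)]


rng = random.Random(0)
print(copy_one([2.1, 3.4, 1.7], M=1000, rng=rng))
print(remainder_probabilities(11, 10))    # remainders 0 and 1 are twice as likely
print(remainder_probabilities(9999, 10))  # exactly uniform: 0.1 each
```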
S3, compute the mean of the copied samples
Repeat step S2 n times to obtain a set of copied samples $\{x_1^*, x_2^*, \dots, x_n^*\}$, and compute their mean:
$$\bar{x}^* = \frac{1}{n}\sum_{i=1}^{n} x_i^*$$
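Steps S2 and S3 together produce one resampled mean. This is a minimal sketch assuming R is uniform on {0, ..., M} inclusive and 0-based indexing of the original sample; the function name `copy_mean` is hypothetical.

```python
import random


def copy_mean(sample, M, rng):
    """Steps S2-S3 sketch: copy n values by the remainder rule, then return their mean."""
    n = len(sample)
    # Step S2 repeated n times: each copy is sample[R mod n] with R uniform on {0, ..., M}.
    copies = [sample[rng.randint(0, M) % n] for _ in range(n)]
    # Step S3: the mean of the n copied values.
    return sum(copies) / n


rng = random.Random(1)
print(copy_mean([2.1, 3.4, 1.7, 2.8], M=1000, rng=rng))
```

Because every copy is one of the original observations, the mean always lies between the smallest and largest original value.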
S4, obtain the self-help resample
Further, repeat steps S2 and S3 B times to obtain a set of self-help resamples:
$$\{\bar{x}_1^*, \bar{x}_2^*, \dots, \bar{x}_B^*\}$$
Since the probability density functions of the original and copied samples are difficult to obtain, a histogram is usually used in their place in practice, and equation (5) can be approximated as:
where h(·) denotes the histogram function with m equal-width bins (m < n), and $x_k$ and $x_k^*$ are the center coordinates of the k-th bin in the corresponding histograms.
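The exact formula for the similarity is not reproduced in this text. The sketch below ASSUMES it is the Pearson correlation coefficient between the two sets of histogram heights $h(x_k)$ and $h(x_k^*)$, $k = 1, \dots, m$, with each histogram spanning its own data range so that $x_k$ and $x_k^*$ are the bin centres of the corresponding histogram. The function names, and returning 0 when a histogram is flat (where the correlation is undefined), are hypothetical choices.

```python
import math


def histogram(values, m):
    """m equal-width bins over [min(values), max(values)]; returns (bin centres, density heights)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / m if hi > lo else 1.0  # guard against a zero-width range
    counts = [0] * m
    for v in values:
        counts[min(int((v - lo) / width), m - 1)] += 1  # v == hi goes into the last bin
    centres = [lo + (k + 0.5) * width for k in range(m)]   # x_k
    heights = [c / (len(values) * width) for c in counts]  # h(x_k), integrates to 1
    return centres, heights


def similarity(original, resample, m):
    """ASSUMED rho(f(x); f(x*)): Pearson correlation between h(x_k) and h(x_k*), k = 1..m."""
    _, h = histogram(original, m)
    _, h_star = histogram(resample, m)
    mean_h, mean_s = sum(h) / m, sum(h_star) / m
    cov = sum((a - mean_h) * (b - mean_s) for a, b in zip(h, h_star))
    norm = math.sqrt(sum((a - mean_h) ** 2 for a in h) * sum((b - mean_s) ** 2 for b in h_star))
    return cov / norm if norm > 0 else 0.0  # a flat histogram has no defined correlation


print(similarity([0, 0, 0, 1, 2], [0, 1, 2, 2, 2], m=3))  # approximately -0.5 (mirror-image histograms)
```

Because the correlation is taken over the bin index k, this assumed form compares histogram shapes only: a resample that is shifted or rescaled relative to the original can still score close to 1.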
If $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, output the resample $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$; otherwise, repeat steps S1 to S4 until the similarity condition is satisfied.
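Putting steps S1 to S4 together gives the loop below. It is a minimal sketch that ASSUMES the similarity is the Pearson correlation of the m histogram heights (each histogram over its own range) and that R is uniform on {0, ..., M} inclusive. Two further assumptions: the S1 parameters stay fixed between rounds, so repeating S1 to S4 only redraws the random numbers, and a hypothetical `max_rounds` cap stops the loop if the threshold is never reached. The data values in the usage lines are illustrative only.

```python
import math
import random


def _hist_heights(values, m):
    """Heights of an m-bin, equal-width density histogram spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / m if hi > lo else 1.0
    counts = [0] * m
    for v in values:
        counts[min(int((v - lo) / width), m - 1)] += 1  # v == hi goes into the last bin
    return [c / (len(values) * width) for c in counts]


def _pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either list is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm > 0 else 0.0


def self_help_expand(sample, rho_eps, M, B, m, max_rounds=100, seed=None):
    """Sketch of S1-S4: return (B resampled means, achieved rho, rounds used)."""
    rng = random.Random(seed)
    n = len(sample)
    h_orig = _hist_heights(sample, m)  # h(x_k) for the original sample
    for rounds in range(1, max_rounds + 1):
        means = []
        for _ in range(B):  # S4: repeat S2-S3 B times
            copies = [sample[rng.randint(0, M) % n] for _ in range(n)]  # S2, n times
            means.append(sum(copies) / n)                               # S3
        rho = _pearson(h_orig, _hist_heights(means, m))  # assumed rho(f(x); f(x*))
        if rho >= rho_eps:
            return means, rho, rounds
    raise RuntimeError(f"threshold {rho_eps} not reached in {max_rounds} rounds")


data = [1.0, 2.0, 2.5, 3.0, 3.0, 3.2, 3.5, 4.0, 4.5, 5.0]  # illustrative values only
means, rho, rounds = self_help_expand(data, rho_eps=0.5, M=10_000, B=1000, m=5, seed=42)
print(len(means), round(rho, 3), rounds)
```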
Assume that the data x obey a Rayleigh distribution, i.e.:
where k denotes the degree of freedom of the Rayleigh distribution and satisfies k > 0, and exp(·) denotes the exponential function.
With the degree of freedom set to k = 3, 10 original samples $\{x_1, \dots, x_{10}\}$ are generated by the Monte Carlo method, and the degree of freedom is estimated from these data with both the self-help (Bootstrap) method and the self-help expansion method based on the correlation coefficient criterion. To illustrate the advantages of the invention, the self-help sample size is set to B = 1000 and the total number of degree-of-freedom estimates to Num = 500; the average of the 500 estimates is taken as the final estimate.
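The equation for the Rayleigh density is not reproduced in this text, and neither is the estimator used for k. The sketch below therefore ASSUMES that k plays the role of the Rayleigh scale, $f(x) = \frac{x}{k^2}\exp\left(-\frac{x^2}{2k^2}\right)$ for $x \ge 0$, uses inverse-transform sampling as one possible Monte Carlo generator, and uses the maximum-likelihood estimate $\hat{k} = \sqrt{\sum_i x_i^2 / (2N)}$. It implements only the ordinary self-help (Bootstrap) arm of the comparison, with B and Num as in the text; all function names are hypothetical, and its output should not be read as the figures reported in this document.

```python
import math
import random


def rayleigh_sample(k, size, rng):
    """Inverse-transform sampling, ASSUMING F(x) = 1 - exp(-x^2 / (2 k^2))."""
    # 1 - rng.random() lies in (0, 1], so log() never receives 0.
    return [k * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(size)]


def rayleigh_k_mle(xs):
    """Maximum-likelihood estimate of the scale k: sqrt(sum(x^2) / (2 N))."""
    return math.sqrt(sum(x * x for x in xs) / (2 * len(xs)))


def bootstrap_k_estimate(original, B, num, rng):
    """Ordinary bootstrap arm: average of num MLEs, each on B values drawn with replacement."""
    # In rng.choices, k= is the number of draws, unrelated to the Rayleigh parameter k.
    estimates = [rayleigh_k_mle(rng.choices(original, k=B)) for _ in range(num)]
    return sum(estimates) / num


rng = random.Random(2019)
k_true = 3.0
original = rayleigh_sample(k_true, 10, rng)  # 10 Monte Carlo samples
k_hat = bootstrap_k_estimate(original, B=1000, num=500, rng=rng)
print(f"estimate {k_hat:.3f}, absolute error {abs(k_hat - k_true):.3f}")
```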
Averaging the 500 degree-of-freedom estimates gives:
The corresponding absolute errors are:
Clearly,
$$e_{\text{NewBootstrap}} < e_{\text{Bootstrap}} \tag{10}$$
Therefore, the estimate obtained by the invention has a smaller error and is closer to the true degree of freedom.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (3)
1. A self-service capacity expansion method based on a correlation coefficient criterion is characterized by comprising the following steps:
S1, initialize parameters by setting a correlation coefficient threshold $\rho_\varepsilon$, a positive integer M, the self-help sample size B, and the number of equal-width histogram bins m;
S2, copy a new sample by randomly generating a positive integer $R \sim U(0, M)$ from a uniform distribution and computing the remainder $p = \mathrm{mod}(R, n)$;
S3, let the i-th new sample $x_i^*$ be the original observation selected by p, repeat S2 to obtain a set of copied samples $\{x_1^*, \dots, x_n^*\}$, and compute their mean $\bar{x}^*$;
S4, repeat steps S2 and S3 to obtain the self-help resample $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$ and compute the similarity $\rho(f(x); f(x^*))$ between the resample and the original sample $\{x_1, \dots, x_n\}$; if $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, output the resample; otherwise, repeat steps S1 to S4 until the similarity condition is satisfied.
3. The self-help capacity expansion method based on a correlation coefficient criterion according to claim 1, wherein in step S4 the similarity $\rho(f(x); f(x^*))$ is specifically calculated as:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910929226.2A CN110874453A (en) | 2019-09-29 | 2019-09-29 | Self-service capacity expansion method based on correlation coefficient criterion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910929226.2A CN110874453A (en) | 2019-09-29 | 2019-09-29 | Self-service capacity expansion method based on correlation coefficient criterion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110874453A (en) | 2020-03-10 |
Family ID: 69718064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910929226.2A Pending CN110874453A (en) | 2019-09-29 | 2019-09-29 | Self-service capacity expansion method based on correlation coefficient criterion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110874453A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021185330A1 (en) * | 2020-03-20 | 2021-09-23 | 京东方科技集团股份有限公司 | Data enhancement method and data enhancement apparatus |
-
2019
- 2019-09-29 CN CN201910929226.2A patent/CN110874453A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Didelot et al. | Likelihood-free estimation of model evidence | |
Zhang et al. | Advances in surrogate modeling for storm surge prediction: storm selection and addressing characteristics related to climate change | |
Bollen et al. | Issues in the structural equation modeling of complex survey data | |
Francisco‐Fernandez et al. | Smoothing parameter selection methods for nonparametric regression with spatially correlated errors | |
Zhou et al. | A comparison of fractal dimension estimators based on multiple surface generation algorithms | |
Jiang et al. | Weibull failure probability estimation based on zero-failure data | |
CN112229403B (en) | Method for improving ocean gravity reconstruction accuracy based on three-dimensional correction principle of ground level | |
CN109143196A (en) | Tertile point method for parameter estimation based on K Distribution Sea Clutter amplitude model | |
CN106960420B (en) | Image reconstruction method of segmented iterative matching tracking algorithm | |
Chen et al. | Mapping topological characteristics of dynamical systems into neural networks: A reservoir computing approach | |
CN111814342A (en) | Complex equipment reliability hybrid model and construction method thereof | |
CN111709454B (en) | Multi-wind-field output clustering evaluation method based on optimal copula model | |
CN110874453A (en) | Self-service capacity expansion method based on correlation coefficient criterion | |
CN110969639B (en) | Image segmentation method based on LFMVO optimization algorithm | |
CN108460208A (en) | More performance parameter degenerative process dependence measures based on Copula entropys | |
CN111192302A (en) | Feature matching method based on motion smoothness and RANSAC algorithm | |
Yaghouti et al. | Determining optimal value of the shape parameter $ c $ in RBF for unequal distances topographical points by Cross-Validation algorithm | |
CN115859116A (en) | Marine environment field reconstruction method based on radial basis function regression interpolation method | |
Lee et al. | A comparative study of uncertainty propagation methods for black-box type functions | |
Santos et al. | A local maximum likelihood estimator for Poisson regression | |
Xinya et al. | Confirmatory factor analysis under violations of structural and distributional assumptions: A comparison of robust Maximum likelihood and Bayesian estimation methods | |
Hofierka | Interpolation of radioactivity data using regularized spline with tension | |
CN113806959B (en) | Quantitative research method for estuary design high tide level in future situation | |
Chen et al. | The clustering analysis and spatial interpolation of intense rainfall data | |
Gladius Jennifer et al. | Spatial Sampling Technique: Method to Collect Data Randomly with Geographical Indicators in Public Health Research. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200310 |