CN110874453A - Self-service capacity expansion method based on correlation coefficient criterion - Google Patents


Info

Publication number
CN110874453A
Authority
CN
China
Prior art keywords
sample
self
help
correlation coefficient
samples
Prior art date
Legal status
Pending
Application number
CN201910929226.2A
Other languages
Chinese (zh)
Inventor
彭维仕
Current Assignee
Air Force Engineering University of PLA
Original Assignee
Air Force Engineering University of PLA
Priority date
2019-09-29
Filing date
2019-09-29
Publication date
2020-03-10
Application filed by Air Force Engineering University of PLA
Priority to CN201910929226.2A
Publication of CN110874453A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F17/17 Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a bootstrap (self-help) sample-expansion method based on a correlation coefficient criterion. The method comprises the following steps: S1, initialize the parameters by setting a correlation coefficient threshold $\rho_\varepsilon$, a positive integer $M$, the bootstrap sample size $B$ and the number $m$ of equal-width bins of the histogram function; S2, copy a new sample by randomly generating a positive integer $R \sim U(0, M)$ that obeys a uniform distribution and computing the remainder $p = \operatorname{mod}(R, n)$, where $n$ is the sample size; S3, compute the mean of the copied samples: the $i$th new sample $x_i^*$ is the original observation indexed by $p$, S2 is repeated to obtain a set of copied samples $\{x_1^*, \dots, x_n^*\}$, and their mean $\bar{x}^*$ is computed; S4, obtain the bootstrap resample by repeating S2 and S3 to obtain $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$ and computing its similarity $\rho(f(x); f(x^*))$ to the original sample $\{x_1, \dots, x_n\}$. If $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, the resample $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$ is output; otherwise S1-S4 are repeated until the similarity condition is satisfied. The resample obtained by the invention makes full use of the information in the given sample and is closer to the real situation.

Description

Self-service capacity expansion method based on correlation coefficient criterion
Technical Field
The invention belongs to the field of probability and mathematical statistics, and in particular relates to a self-service (bootstrap) sample-expansion method based on a correlation coefficient criterion.
Background
In engineering practice, a common way to handle small samples is to enlarge the original sample with a resampling method grounded in probability theory and mathematical statistics. In 1977, Professor B. Efron of Stanford University, USA, proposed a new statistical analysis method, the bootstrap method. Before that, M. H. Quenouille had proposed the jackknife estimate of the bias of an estimator, and J. W. Tukey the jackknife estimate of its variance. Later, the Bayesian bootstrap method (also called the random weighting method) appeared.
Analysis shows that the bootstrap method copies samples from the original sample at random, so the bootstrap samples may deviate from the original sample and the computed results may in turn deviate from the true distribution. This effect is especially pronounced when the sample size is small. In particular, when the true distribution is continuous, a parameter estimate of the true distribution is hard to obtain, because the distribution at points other than the observed sample values is unknown.
Disclosure of Invention
The invention aims to provide a bootstrap sample-expansion method based on a correlation coefficient criterion that overcomes the shortcomings of the prior art described in the Background.
The invention is realized as follows: a bootstrap sample-expansion method based on a correlation coefficient criterion comprises the following steps:
S1, parameter initialization: set a correlation coefficient threshold $\rho_\varepsilon$, a positive integer $M$, the bootstrap sample size $B$ and the number $m$ of equal-width bins of the histogram function;
S2, copy a new sample: randomly generate a positive integer $R \sim U(0, M)$ that obeys a uniform distribution and compute the remainder $p = \operatorname{mod}(R, n)$, where $n$ is the sample size;
S3, compute the mean of the copied samples: let the $i$th new sample $x_i^*$ be the original observation indexed by $p$; repeat S2 to obtain a set of copied samples $\{x_1^*, x_2^*, \dots, x_n^*\}$ and compute their mean $\bar{x}^*$;
S4, obtain the bootstrap resample: repeat S2 and S3 to obtain the bootstrap resample $\{\bar{x}_1^*, \bar{x}_2^*, \dots, \bar{x}_B^*\}$, and compute the similarity $\rho(f(x); f(x^*))$ between the resample and the original sample $\{x_1, x_2, \dots, x_n\}$. If $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, output the resample $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$; otherwise, repeat S1-S4 until the similarity condition is satisfied.
Preferably, in step S2, the remainder is calculated as
$$p = \operatorname{mod}(R, n).$$
Preferably, in step S4, the similarity $\rho(f(x); f(x^*))$ is calculated as
[Formula image not recoverable: histogram approximation of the correlation coefficient $\rho(f(x); f(x^*))$.]
where $h(\cdot)$ denotes a histogram function with $m$ equal-width bins ($m < n$), and $x_k$ and $x_k^*$ are the center coordinates of the $k$th bin of the corresponding histograms.
The invention thereby overcomes the shortcomings of the prior art. The basic principle of the method, which comprises steps S1-S4 as set out above, is shown in Figure 1.
Compared with the prior art, the invention has the following beneficial effects:
(1) it provides a bootstrap sample-expansion method based on a similarity criterion and a correlation coefficient criterion, in which the validity of the resample is safeguarded by checking that the characteristics of the improved bootstrap sample agree with those of the original sample;
(2) the resample obtained by the method makes full use of the information in the given sample and is closer to the real situation;
(3) the invention is suitable not only for bootstrap expansion of small samples but also for bootstrap expansion of extremely small samples.
Drawings
FIG. 1 is a flow chart of the steps of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention discloses a bootstrap sample-expansion method based on a correlation coefficient criterion, which comprises the following steps:
S1, parameter initialization
To ensure that the characteristics of the bootstrap sample are consistent with those of the original sample, a correlation coefficient threshold $\rho_\varepsilon$ is set, against which the correlation between the characteristics of the two is checked. To copy new samples at random, a relatively large positive integer $M$ ($M > n$) is set, where $n$ is the size of the original sample. Because the probability density functions of the original sample and the copied sample are hard to obtain, in practice they are usually replaced by histograms, so the number $m$ of equal-width bins of the histogram function must also be set.
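The S1 parameters and the two constraints stated above ($M > n$, $m < n$) can be collected in a small container. The following Python sketch is illustrative only: the class name `ExpansionParams`, the method `check`, and the extra requirements $0 < \rho_\varepsilon \le 1$ and $m \ge 2$ are assumptions, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpansionParams:
    """Hypothetical container for the S1 parameters (names are illustrative)."""

    rho_eps: float  # correlation coefficient threshold rho_epsilon
    M: int  # upper bound of the uniform integer draw; S1 requires M > n
    B: int  # bootstrap sample size (number of resampled means)
    m: int  # number of equal-width histogram bins; S1 requires m < n

    def check(self, n: int) -> None:
        """Raise ValueError if the parameters violate the constraints for sample size n."""
        if not 0.0 < self.rho_eps <= 1.0:  # assumption: threshold in (0, 1]
            raise ValueError("rho_eps should lie in (0, 1]")
        if self.M <= n:
            raise ValueError("M must exceed the original sample size n")
        if not 2 <= self.m < n:  # assumption: at least two bins for a correlation
            raise ValueError("m must satisfy 2 <= m < n")
        if self.B < 1:
            raise ValueError("B must be a positive integer")


params = ExpansionParams(rho_eps=0.9, M=10_000, B=1000, m=5)
params.check(n=10)  # passes silently for an n = 10 sample
```

Here $B = 1000$ matches the embodiment below; $\rho_\varepsilon = 0.9$, $M = 10000$ and $m = 5$ are arbitrary illustrative choices.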
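S2, copying a new sample
Randomly generate a positive integer $R \sim U(0, M)$ that obeys a uniform distribution, and calculate the remainder:
$$p = \operatorname{mod}(R, n) = R - n\left\lfloor R/n \right\rfloor \qquad (1)$$
The copied new sample $x_i^*$ is the observation of the original sample whose index is determined by the remainder $p$ (equation (2); the formula image is not recoverable from the source).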
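A minimal Python sketch of the S2 copy step, assuming 0-based indexing (so the remainder $p \in \{0, \dots, n-1\}$ indexes the original sample directly) and reading $U(0, M)$ as integers drawn uniformly from $0, \dots, M-1$. The function name `copy_new_sample` and the example data are hypothetical.

```python
from __future__ import annotations

import numpy as np


def copy_new_sample(x: np.ndarray, M: int, rng: np.random.Generator) -> tuple[int, float]:
    """Step S2 sketch: draw R uniformly from 0..M-1, reduce it modulo n, copy x[p]."""
    n = len(x)
    R = int(rng.integers(0, M))  # Generator.integers excludes the upper bound M
    p = R % n  # remainder p = mod(R, n), always in 0..n-1
    return p, float(x[p])  # 0-based indexing: p selects the copied observation


rng = np.random.default_rng(0)
x = np.array([2.1, 3.4, 1.7, 4.0, 2.9])
p, x_star = copy_new_sample(x, M=10_000, rng=rng)
```

Because each remainder $0, \dots, n-1$ occurs either $\lfloor M/n \rfloor$ or $\lceil M/n \rceil$ times among $0, \dots, M-1$, choosing $M$ much larger than $n$ makes $p$ nearly uniform; it is exactly uniform when $n$ divides $M$.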
S3, calculating the mean of the copied samples
Repeat step S2 $n$ times to obtain a set of copied samples
$$\{x_1^*, x_2^*, \dots, x_n^*\}$$
and calculate their mean:
$$\bar{x}^* = \frac{1}{n}\sum_{i=1}^{n} x_i^* \qquad (3)$$
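Steps S2 and S3 together can be sketched as below: $n$ integers are drawn at once, reduced modulo $n$, the corresponding observations are copied, and the sample mean of the $n$ copies is returned. The sketch assumes 0-based indexing and $R$ uniform on $0, \dots, M-1$; `copy_sample_mean` is a hypothetical name.

```python
import numpy as np


def copy_sample_mean(x: np.ndarray, M: int, rng: np.random.Generator) -> float:
    """Steps S2-S3 sketch: copy n observations via R mod n and return their mean."""
    n = len(x)
    R = rng.integers(0, M, size=n)  # n independent draws, each uniform on 0..M-1
    copies = x[R % n]  # copied sample {x_1*, ..., x_n*} (0-based indices)
    return float(copies.mean())  # x-bar* = (1/n) * sum of the copies


rng = np.random.default_rng(1)
x = np.array([2.1, 3.4, 1.7, 4.0, 2.9])
x_bar_star = copy_sample_mean(x, M=10_000, rng=rng)
```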
S4, obtaining the bootstrap resample
Repeat steps S2-S3 $B$ times to obtain a set of bootstrap resamples:
$$\{\bar{x}_1^*, \bar{x}_2^*, \dots, \bar{x}_B^*\} \qquad (4)$$
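Repeating S2-S3 $B$ times can be vectorised by drawing a $B \times n$ array of integers, one row per copied sample, and averaging each row. This is a sketch under the assumptions of 0-based indexing and $R$ uniform on $0, \dots, M-1$; `bootstrap_means` is a hypothetical name.

```python
import numpy as np


def bootstrap_means(x: np.ndarray, M: int, B: int, rng: np.random.Generator) -> np.ndarray:
    """Step S4 sketch: B repetitions of S2-S3, returning {x-bar_1*, ..., x-bar_B*}."""
    n = len(x)
    R = rng.integers(0, M, size=(B, n))  # one row of n uniform draws per repetition
    copies = x[R % n]  # shape (B, n): each row is one copied sample
    return copies.mean(axis=1)  # row means form the bootstrap resample


rng = np.random.default_rng(2)
x = np.array([2.1, 3.4, 1.7, 4.0, 2.9])
resample = bootstrap_means(x, M=10_000, B=1000, rng=rng)
```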
Then calculate the similarity between the resample $\{\bar{x}_1^*, \bar{x}_2^*, \dots, \bar{x}_B^*\}$ and the original sample $\{x_1, x_2, \dots, x_n\}$:
[Equation (5) image not recoverable: the correlation coefficient $\rho(f(x); f(x^*))$ between the probability density function $f(x)$ of the original sample and $f(x^*)$ of the resample.]
Since the probability density functions of the original sample and the copied sample are hard to obtain and are usually replaced by histograms in practice, equation (5) can be approximated as:
[Equation (6) image not recoverable: the same correlation coefficient computed from the histogram values $h(x_k)$ and $h(x_k^*)$, $k = 1, \dots, m$.]
where $h(\cdot)$ denotes a histogram function with $m$ equal-width bins ($m < n$), and $x_k$ and $x_k^*$ are the center coordinates of the $k$th bin of the corresponding histograms.
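The exact form of the histogram correlation cannot be recovered from the source, so the Python sketch below assumes that $\rho$ is the Pearson correlation between the $m$ bin heights $h(x_k)$ and $h(x_k^*)$, with each histogram spanning its own data range in $m$ equal-width bins (`numpy.histogram` with `bins=m`). The function name `histogram_similarity` and the example data are hypothetical.

```python
import numpy as np


def histogram_similarity(original: np.ndarray, resample: np.ndarray, m: int) -> float:
    """Approximate rho(f(x); f(x*)) from two m-bin histograms.

    ASSUMPTION: rho is read here as the Pearson correlation of the bin heights
    h(x_k) and h(x_k*), k = 1..m; the patent's own formula is not reproduced.
    """
    h, _ = np.histogram(original, bins=m, density=True)  # m equal-width bins over the data range
    h_star, _ = np.histogram(resample, bins=m, density=True)
    dh = h - h.mean()  # centre the bin heights
    dh_star = h_star - h_star.mean()
    denom = np.sqrt(np.sum(dh**2) * np.sum(dh_star**2))
    if denom == 0.0:
        return float("nan")  # a flat histogram has no defined correlation
    return float(np.sum(dh * dh_star) / denom)


a = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0])
rho_self = histogram_similarity(a, a, m=4)
```

Because each histogram is built on its own range, the score compares the shapes of the two distributions rather than their locations, and the Pearson correlation is unaffected by the overall height scale of either histogram.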
If $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, output the resample $\{\bar{x}_1^*, \bar{x}_2^*, \dots, \bar{x}_B^*\}$; otherwise, repeat steps S1-S4 until the similarity condition is satisfied.
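Putting S1-S4 and the acceptance test together gives the end-to-end loop below. It is a sketch, not the patent's implementation: it keeps the S1 parameters fixed between rounds and only redraws the resample, it takes $\rho$ to be the Pearson correlation of the two histograms' bin heights (an assumption, since the patent's formula is not recoverable), and it adds a hypothetical `max_rounds` cap because the original loop has no stopping guarantee. The name `correlation_expansion` is also hypothetical.

```python
import numpy as np


def correlation_expansion(x, rho_eps, M, B, m, rng, max_rounds=100):
    """Hypothetical S1-S4 loop: redraw B bootstrap means until rho >= rho_eps.

    Returns (resample, rho). ASSUMPTIONS: rho is the Pearson correlation of the
    m-bin histogram heights; parameters stay fixed between rounds; max_rounds
    is an added safety cap. Raises RuntimeError if no round is accepted.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    h, _ = np.histogram(x, bins=m, density=True)  # histogram of the original sample
    dh = h - h.mean()
    for _ in range(max_rounds):
        R = rng.integers(0, M, size=(B, n))  # S2: uniform integer draws
        resample = x[R % n].mean(axis=1)  # S3-S4: B means of copied samples
        h_star, _ = np.histogram(resample, bins=m, density=True)
        dh_star = h_star - h_star.mean()
        denom = np.sqrt(np.sum(dh**2) * np.sum(dh_star**2))
        rho = float(np.sum(dh * dh_star) / denom) if denom > 0 else -np.inf
        if rho >= rho_eps:  # acceptance test rho(f(x); f(x*)) >= rho_eps
            return resample, rho
    raise RuntimeError("similarity threshold not reached within max_rounds")
```

For example, `correlation_expansion(x, rho_eps=0.8, M=10_000, B=1000, m=5, rng=np.random.default_rng())` either returns an accepted resample together with its $\rho$ or raises after 100 rounds.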
Assume that the data $x$ obey a Rayleigh distribution, i.e.:
[Equation (7) image not recoverable: the Rayleigh probability density function of $x$ with parameter $k$.]
where $k$ denotes the degree of freedom of the Rayleigh distribution and satisfies $k > 0$, and $\exp(\cdot)$ denotes the exponential function.
Taking the degree of freedom $k = 3$, 10 original samples $\{x_1, x_2, \dots, x_{10}\}$ are generated by the Monte Carlo method. From these original data, the degree of freedom of the distribution function is estimated with the standard bootstrap method and with the proposed bootstrap expansion method based on the correlation coefficient criterion, respectively. To illustrate the advantages of the invention, the bootstrap sample size is set to $B = 1000$ and the number of repeated degree-of-freedom estimates to $Num = 500$; the average of the 500 estimates is taken as the final estimate of the degree of freedom.
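The data-generation part of this experiment can be sketched with NumPy's `Generator.rayleigh`. The sketch assumes that the "degree of freedom" $k$ plays the role of the Rayleigh scale parameter $\sigma$ in $f(x) = (x/\sigma^2)\exp(-x^2/(2\sigma^2))$; the seed is arbitrary. Since the patent does not state how $k$ is estimated from a resample, the closed-form maximum-likelihood estimate $\hat{\sigma} = \sqrt{\sum_i x_i^2/(2n)}$ is shown only as an illustrative reference point, not as either method compared in the text.

```python
import numpy as np

# ASSUMPTION: the "degree of freedom" k is treated as the Rayleigh scale sigma,
# i.e. f(x) = (x / sigma**2) * exp(-x**2 / (2 * sigma**2)) for x >= 0.
k = 3.0
n = 10
rng = np.random.default_rng(2019)  # arbitrary seed, for reproducibility only
original = rng.rayleigh(scale=k, size=n)  # 10 Monte Carlo observations x_1..x_10

# Illustrative reference estimator (not specified by the patent): the closed-form
# maximum-likelihood estimate sigma_hat = sqrt(sum(x_i^2) / (2n)).
k_mle = float(np.sqrt(np.sum(original**2) / (2 * n)))
abs_error = abs(k_mle - k)  # absolute error with respect to the true k = 3
```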
Averaging the 500 degree-of-freedom estimates gives:
[Equation (8) image not recoverable: the averaged degree-of-freedom estimates obtained with the bootstrap method and with the invention.]
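The corresponding absolute errors are:
[Equation (9) image not recoverable: the absolute errors $e_{\text{Bootstrap}}$ and $e_{\text{NewBootstrap}}$ of the two averaged estimates with respect to the true value $k = 3$.]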
Clearly,
$$e_{\text{NewBootstrap}} < e_{\text{Bootstrap}} \qquad (10)$$
Therefore, the estimate obtained with the invention has a smaller error and is closer to the true degree of freedom.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (3)

1. A self-service capacity expansion method based on a correlation coefficient criterion, characterized by comprising the following steps:
S1, setting a correlation coefficient threshold $\rho_\varepsilon$, a positive integer $M$, the bootstrap sample size $B$ and the number $m$ of equal-width bins of the histogram function to initialize the parameters;
S2, randomly generating a positive integer $R \sim U(0, M)$ that obeys a uniform distribution and calculating the remainder $p = \operatorname{mod}(R, n)$ to copy a new sample;
S3, letting the $i$th new sample $x_i^*$ be the original observation indexed by the remainder $p$, repeating S2 to obtain a set of copied samples $\{x_1^*, x_2^*, \dots, x_n^*\}$, and calculating their mean $\bar{x}^* = \frac{1}{n}\sum_{i=1}^{n} x_i^*$;
S4, repeating steps S2 and S3 to obtain the bootstrap resample $\{\bar{x}_1^*, \bar{x}_2^*, \dots, \bar{x}_B^*\}$, and calculating the similarity $\rho(f(x); f(x^*))$ between the resample and the original sample $\{x_1, x_2, \dots, x_n\}$; if $\rho(f(x); f(x^*)) \ge \rho_\varepsilon$, outputting the resample $\{\bar{x}_1^*, \dots, \bar{x}_B^*\}$; otherwise, repeating steps S1-S4 until the similarity condition is satisfied.
2. The self-service capacity expansion method based on a correlation coefficient criterion according to claim 1, wherein in step S2 the remainder is calculated as
$$p = \operatorname{mod}(R, n)$$
where $n$ is the sample size.
3. The self-service capacity expansion method based on a correlation coefficient criterion according to claim 1, wherein in step S4 the similarity $\rho(f(x); f(x^*))$ is calculated as
[Formula image not recoverable: histogram approximation of the correlation coefficient $\rho(f(x); f(x^*))$.]
where $h(\cdot)$ denotes a histogram function with $m$ equal-width bins ($m < n$), and $x_k$ and $x_k^*$ are the center coordinates of the $k$th bin of the corresponding histograms.
CN201910929226.2A 2019-09-29 2019-09-29 Self-service capacity expansion method based on correlation coefficient criterion Pending CN110874453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910929226.2A CN110874453A (en) 2019-09-29 2019-09-29 Self-service capacity expansion method based on correlation coefficient criterion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910929226.2A CN110874453A (en) 2019-09-29 2019-09-29 Self-service capacity expansion method based on correlation coefficient criterion

Publications (1)

Publication Number Publication Date
CN110874453A (en) 2020-03-10

Family

ID=69718064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910929226.2A Pending CN110874453A (en) 2019-09-29 2019-09-29 Self-service capacity expansion method based on correlation coefficient criterion

Country Status (1)

Country Link
CN (1) CN110874453A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021185330A1 (en) * 2020-03-20 2021-09-23 京东方科技集团股份有限公司 Data enhancement method and data enhancement apparatus


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200310