CN115455359A - Automatic correction and distribution fitting method for small-batch error data - Google Patents

Info

Publication number
CN115455359A
CN115455359A CN202210876577.3A CN202210876577A CN115455359A CN 115455359 A CN115455359 A CN 115455359A CN 202210876577 A CN202210876577 A CN 202210876577A CN 115455359 A CN115455359 A CN 115455359A
Authority
CN
China
Prior art keywords
distribution
data
value
correction
anderson
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210876577.3A
Other languages
Chinese (zh)
Inventor
曾静文
李晓蕊
杨扬
邓晓春
郭双明
樊娜娜
陈氖华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aircraft Industrial Group Co Ltd
Original Assignee
Chengdu Aircraft Industrial Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aircraft Industrial Group Co Ltd filed Critical Chengdu Aircraft Industrial Group Co Ltd
Priority to CN202210876577.3A priority Critical patent/CN115455359A/en
Publication of CN115455359A publication Critical patent/CN115455359A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Factory Administration (AREA)

Abstract

The invention belongs to the field of aviation production and relates in particular to a method for automatic correction and distribution fitting of small-batch error data based on the Anderson-Darling test and cyclic estimation. The method comprises the following steps: reading the annual error data of the same feature of a product from the record table; removing abnormal data from the error data; constructing Anderson-Darling test statistics under four continuous distributions; constructing the p-values of the test statistics under those distributions; automatically correcting each data point by cyclic estimation; and repeating the correction with different compensation values to find the optimal compensation value. The method automatically searches for the optimal correction value, quickly and effectively compensates for data deviations caused by improper measurement or changes of operator, brings the data set closer to the true distribution of the data, and provides a basis for subsequent statistical process control and control chart construction.

Description

Automatic correction and distribution fitting method for small-batch error data
Technical Field
The invention belongs to the field of aviation production and relates in particular to a method for automatic correction and distribution fitting of small-batch error data based on the Anderson-Darling test and cyclic estimation.
Background
As the pinnacle of the industrial system, the aviation industry imposes strict control on product quality. The distribution characteristics of the error data between the measured and theoretical values of a product's external dimensions reflect the quality of the manufacturing process and are the basis for statistical process control, optimization, and production management. However, owing to improper measurement and changes of operator, the recorded values of error data often deviate from the true values, which makes inferring the statistical distribution of small-batch error data especially challenging. Accurately and automatically correcting small-batch error data and analyzing their statistical distribution characteristics is therefore of great significance.
At present, most documents, such as "Application of multivariable statistical process control in the reverse flotation production process", "Wind turbine gearbox state evaluation integrating SCADA data", "Research and practice of on-line monitoring and statistical process control of the leveling process", and "Comparative analysis of data preprocessing methods based on a typical data set", mainly disclose removing abnormal data by the quartile method before performing statistical process control, i.e., removing from the sample the data that lie beyond the upper and lower quartiles. This approach suits large samples; for a small-batch process, further reducing the sample size is detrimental to distribution fitting, and a more reasonable approach is to recover the true values of the data by data correction so that their statistical distribution characteristics can be obtained accurately. However, current data preprocessing methods are limited to normality transformation, standardization, and normalization. Normality transformations, including the Box-Cox transformation and the Johnson transformation, improve the normality and symmetry of the data; standardization and normalization make the data dimensionless through mathematical operations. These methods apply only to data that follow a normal distribution, whereas real manufacturing error data may also follow a truncated normal, gamma, or t distribution.
Disclosure of Invention
To overcome the problems in the prior art, the invention provides a method for automatic correction and distribution fitting of small-batch error data based on the Anderson-Darling test and cyclic estimation. Anderson-Darling test statistics are constructed under different continuous distributions (normal, truncated normal, gamma, and t), and the statistical distribution type of the error data is determined from the p-values of the test statistics under those distributions. Based on that distribution type, cyclic estimation is applied: the data set is randomly divided into a history set and an observation set, the correction strategy giving the highest overall distribution-fitting p-value is selected for each data point in the observation set, and different data are cyclically selected as the observation set until every data point is optimally corrected or the p-value converges, which completes the automatic correction of the data.
To achieve this, the technical scheme of the invention is as follows:
An automatic correction and distribution fitting method for small-batch error data, comprising the following steps:
Step 1: reading the annual error data of the same feature of a small-batch product from the record table;
Step 2: removing abnormal data from the error data to obtain an initial data set $D = \{x_i,\ i = 1, \dots, n\}$;
Step 3: constructing Anderson-Darling test statistics under four continuous distributions: the normal, truncated normal, gamma, and t distributions;
Step 4: according to the Anderson-Darling test statistic $A^2$, constructing the p-values of the test statistics under the four distributions and determining the statistical distribution type of the error data by comparing them; the higher the p-value, the better the goodness of fit, so the selected distribution type is $j^* = \arg\max_{j=1,2,3,4} p_j$;
Step 5: based on the obtained distribution type $j^*$, automatically correcting each data point by cyclic estimation: a compensation value $\delta$ is preset, the data set $D$ is randomly shuffled and divided into a history set $D_1$ and an observation set $D_2$, a data correction strategy is specified, and the procedure is iterated until the final corrected data are obtained;
Step 6: setting different compensation values $\delta$ and repeating step 5 to find the optimal compensation value; under that compensation value, obtaining the optimally corrected data set $D'$ and solving for the parameters of distribution $j^*$ by maximum likelihood estimation.
Further, in step 3, to measure the goodness of fit between the real data distribution and the theoretical distribution, the Anderson-Darling test statistics under the four continuous distributions (normal, truncated normal, gamma, and t) are constructed as follows:
$$A_j^2 = n \int_{-\infty}^{\infty} \frac{\left[F_D(x) - F_j(x)\right]^2}{F_j(x)\left[1 - F_j(x)\right]}\, dF_j(x), \quad j = 1, 2, 3, 4$$
where $A_j^2$ is the Anderson-Darling test statistic of the $j$-th hypothesized distribution, which measures the difference between the hypothesized distribution and the true distribution of the data; the smaller $A_j^2$ is, the closer the real data are to the hypothesized distribution; $n$ is the number of samples; and $F_D(x)$ is the distribution function of the sample;
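The normal, truncated normal, gamma, and t distributions are the four distribution types that best fit error data in the field of aviation manufacturing. $F_j(x)$ is the theoretical distribution function of the $j$-th hypothesized distribution:
$$F_1(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt$$
$$F_2(x) = \frac{F_1(x) - F_1(a)}{F_1(b) - F_1(a)}, \quad a \le x \le b$$
$$F_3(x) = \int_{0}^{x} \frac{t^{\alpha-1} e^{-t/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}\, dt, \quad x \ge 0$$
$$F_4(x) = \int_{-\infty}^{x} \frac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\!\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}} dt$$
where $\Gamma$ denotes the gamma function and $\mu, \sigma, a, b, \alpha, \beta, v$ are the parameters of the respective distributions.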
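The Anderson-Darling statistic described above is usually evaluated through its standard order-statistic form, $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right]$. The following is a minimal sketch of that computation using only the Python standard library. The function name `anderson_darling`, the clamping constant, and the synthetic sample are assumptions made for illustration and do not come from the patent.

```python
import math
from statistics import NormalDist


def anderson_darling(sample, cdf):
    """Return A^2 for `sample` against a fully specified CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    eps = 1e-12  # keep CDF values away from 0 and 1 so log() stays finite
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    # Order-statistic form; i is 0-based here, so (2i + 1) plays the role of (2i - 1).
    total = sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n


# Hypothetical sample: 20 evenly spaced quantiles of a standard normal.
sample = [NormalDist().inv_cdf((i + 0.5) / 20) for i in range(20)]
a2_matching = anderson_darling(sample, NormalDist(0.0, 1.0).cdf)
a2_shifted = anderson_darling(sample, NormalDist(2.0, 1.0).cdf)
```

A smaller value indicates a closer fit, so `a2_matching` should come out well below `a2_shifted`. Any of the four candidate distribution functions could be passed as `cdf` in the same way.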
Further, the specific method of step 4 is as follows:
According to the limiting distribution of the Anderson-Darling test statistic $A^2$, the p-values under the four distributions are constructed as follows:
[Equation images not recoverable from the source text: the limiting-distribution expressions for $A^2$ and the resulting formula for $p_j$]
where $p_j$ is the p-value of the Anderson-Darling test statistic of the $j$-th hypothesized distribution. The p-value is a probability between 0 and 1 that expresses the goodness of fit between the real data distribution and the theoretical distribution qualitatively and intuitively: the larger the p-value (equivalently, the smaller $A_j^2$), the better the fit. The statistical distribution type of the error data can therefore be determined by comparing the p-values, i.e., the selected distribution type is $j^* = \arg\max_{j=1,2,3,4} p_j$.
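The selection rule picks the distribution whose p-value is largest, i.e., it returns the index of the maximum and not the maximum p-value itself. A minimal sketch follows. The gamma value 0.326 is the one reported in the worked example of this document, while the other three p-values and the function name `select_distribution` are invented purely for illustration.

```python
def select_distribution(p_values):
    """Return the name of the candidate distribution with the largest p-value."""
    return max(p_values, key=p_values.get)


# Only the gamma entry comes from the worked example; the rest are placeholders.
p_values = {
    "normal": 0.012,
    "truncated normal": 0.054,
    "gamma": 0.326,
    "t": 0.008,
}
best = select_distribution(p_values)
```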
Further, in step 5, each data point is automatically corrected by cyclic estimation based on the obtained distribution type $j^*$.
Still further, step 5 specifically comprises the following steps:
Step 501: presetting a compensation value $\delta$; it is recommended that the compensation value be set to an integer multiple of the data recording precision;
Step 502: randomly shuffling the data set $D^{r-1}$ obtained after the $(r-1)$-th correction cycle and dividing it, according to the proportion of 8, into a history set $D_1^{r-1}$ and an observation set $D_2^{r-1}$;
Step 503: there are three data correction strategies: subtracting the compensation value ($x - \delta$), keeping the value unchanged ($x$), and adding the compensation value ($x + \delta$). Each data point $x$ in the observation set is merged with the history set into a new data set, and the p-values of the three correction strategies under distribution type $j^*$ are calculated and recorded as $p^{-}$, $p^{0}$, and $p^{+}$ respectively. The correction strategy with the highest p-value is selected to correct $x$; for example, if $p^{-} > p^{0}$ and $p^{-} > p^{+}$, then $x$ is corrected to $x - \delta$. These operations are repeated for all other data in the observation set to obtain the corrected observation set $\tilde{D}_2^{r-1}$;
Step 504: recording the data set after the $r$-th correction cycle as $D^{r} = D_1^{r-1} \cup \tilde{D}_2^{r-1}$ and its p-value as $p_r$;
Step 505: comparing $p_r$ and $p_{r-1}$; if the difference is negligible ($p_r - p_{r-1} < 0.001$), the correction ends; otherwise, let $r = r + 1$ and repeat steps 502 to 504.
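Steps 501 to 505 can be sketched as the loop below. This is an illustration under stated assumptions, not the patented implementation. The split fraction `history_fraction=0.8` is an assumption because the ratio in the source text is truncated. The score function `normal_fit_score` is a stand-in (the negative Anderson-Darling statistic against a moment-fitted normal distribution) used in place of the p-value under distribution $j^*$. The names `cyclic_correct`, `max_rounds`, and `seed` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def normal_fit_score(data):
    """Stand-in fit score (higher is better): minus A^2 for a moment-fitted normal."""
    mu, sigma = fmean(data), pstdev(data)
    if sigma == 0:
        return float("-inf")
    cdf = NormalDist(mu, sigma).cdf
    xs = sorted(data)
    n = len(xs)
    eps = 1e-12  # avoid log(0) in the tails
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    total = sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    )
    return n + total / n  # equals -A^2


def cyclic_correct(data, delta, score, history_fraction=0.8,
                   tol=1e-3, max_rounds=50, seed=0):
    """Iteratively shift observation-set points by -delta, 0 or +delta."""
    rng = random.Random(seed)
    current = list(data)
    prev = score(current)
    for _ in range(max_rounds):
        rng.shuffle(current)                           # step 502: shuffle
        cut = int(len(current) * history_fraction)
        history, observed = current[:cut], current[cut:]
        corrected = []
        for x in observed:                             # step 503: three strategies
            candidates = (x, x - delta, x + delta)     # ties keep x unchanged
            corrected.append(
                max(candidates, key=lambda c: score(history + [c]))
            )
        current = history + corrected                  # step 504: merged data set
        new = score(current)
        if new - prev < tol:                           # step 505: convergence check
            break
        prev = new
    return current, score(current)


# Hypothetical small batch of error readings.
data = [0.0, 0.001, 0.002, 0.002, 0.003, 0.004, 0.005, 0.005, 0.007, 0.010]
corrected, final_score = cyclic_correct(data, delta=0.001, score=normal_fit_score)
unchanged, _ = cyclic_correct(data, delta=0.0, score=normal_fit_score)
```

Passing the score as a parameter keeps the loop independent of the chosen distribution, so a p-value routine for the gamma or t distribution could be substituted without changing the correction logic.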
Compared with the prior art, the invention has the following advantages:
The method automatically searches for the optimal correction value and is suitable for the various small-batch production processes in the field of aviation manufacturing. It quickly and effectively compensates for data deviations caused by improper measurement or changes of operator, brings the data set closer to the true distribution of the data, and provides a basis for subsequent statistical process control and control chart construction.
Drawings
Fig. 1 is a block diagram of the automatic correction process.
Fig. 2 is a fit of the raw data under four distributions.
Fig. 3 is a graph of the p-value of the gamma distribution of the corrected data set at different compensation values.
Fig. 4 shows the gamma distribution fit of the corrected data set at the optimal compensation value (δ = 0.006).
Fig. 5 is a maximum likelihood estimation of gamma distribution parameters.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are for explaining the present invention and not for limiting the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The following description of the embodiments of the present invention is made with reference to the accompanying drawings and examples, and the present invention is not limited to the embodiments.
Example 1
As shown in FIG. 1, an automatic correction and distribution fitting method for small-batch error data comprises the following steps:
Step 1: reading the annual error data of the same feature of a small-batch product from the record table;
Step 2: removing abnormal data by using prior information about the production process, i.e., removing data that do not meet the technical requirements of the product feature (corresponding to unqualified products), to obtain an initial data set $D = \{x_i,\ i = 1, \dots, n\}$;
Step 3: according to elementary statistics, many random variables follow a normal distribution, such as measurement error, product weight, and human height. Error data in production are therefore usually assumed by default to be normally distributed and are analyzed on that basis. However, owing to improper measurement, changes of operator, and other factors, the recorded values of error data often deviate from the true values, and the recorded data do not necessarily follow a normal distribution. Verification against field data shows that error data most likely follow one of four distributions: the normal, truncated normal, gamma, or t distribution. To determine the actual distribution of the data more accurately, Anderson-Darling test statistics under these four continuous distributions are constructed;
Step 4: according to the Anderson-Darling test statistic $A^2$, constructing the p-values of the test statistics under the four distributions and determining the statistical distribution type of the error data by comparing them; the higher the p-value, the better the goodness of fit, so the selected distribution type is $j^* = \arg\max_{j=1,2,3,4} p_j$;
Step 5: based on the obtained distribution type $j^*$, automatically correcting each data point by cyclic estimation: a compensation value $\delta$ is preset, the data set $D$ is randomly shuffled and divided into a history set $D_1$ and an observation set $D_2$, a data correction strategy is specified, and the procedure is iterated until the final corrected data are obtained;
Step 6: setting different compensation values $\delta$ and repeating step 5 to find the optimal compensation value; under that compensation value, obtaining the optimally corrected data set $D'$ and solving for the parameters of distribution $j^*$ by maximum likelihood estimation.
Further, in step 3, to measure the goodness of fit between the real data distribution and the theoretical distribution, the Anderson-Darling test statistics under the four continuous distributions (normal, truncated normal, gamma, and t) are constructed as follows:
$$A_j^2 = n \int_{-\infty}^{\infty} \frac{\left[F_D(x) - F_j(x)\right]^2}{F_j(x)\left[1 - F_j(x)\right]}\, dF_j(x), \quad j = 1, 2, 3, 4$$
where $A_j^2$ is the Anderson-Darling test statistic of the $j$-th hypothesized distribution, which measures the difference between the hypothesized distribution and the true distribution of the data; the smaller $A_j^2$ is, the closer the real data are to the hypothesized distribution; $n$ is the number of samples; and $F_D(x)$ is the distribution function of the sample.
The normal, truncated normal, gamma, and t distributions are the four distribution types that best fit error data in the field of aviation manufacturing. $F_j(x)$ is the theoretical distribution function of the $j$-th hypothesized distribution:
$$F_1(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt$$
$$F_2(x) = \frac{F_1(x) - F_1(a)}{F_1(b) - F_1(a)}, \quad a \le x \le b$$
$$F_3(x) = \int_{0}^{x} \frac{t^{\alpha-1} e^{-t/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}\, dt, \quad x \ge 0$$
$$F_4(x) = \int_{-\infty}^{x} \frac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\!\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}} dt$$
where $\Gamma$ denotes the gamma function and $\mu, \sigma, a, b, \alpha, \beta, v$ are the parameters of the respective distributions.
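As an illustration of one of the four theoretical distribution functions, the truncated normal distribution function can be computed from the ordinary normal distribution function by renormalizing over the truncation interval $[a, b]$. The sketch below uses the standard definition of the truncated normal distribution. The function name and the parameter values in the usage lines are assumptions chosen for illustration.

```python
from statistics import NormalDist


def truncated_normal_cdf(x, mu, sigma, a, b):
    """CDF of a normal(mu, sigma) distribution truncated to the interval [a, b]."""
    base = NormalDist(mu, sigma)
    lower, upper = base.cdf(a), base.cdf(b)
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    # Probability mass between a and x, renormalized by the mass between a and b.
    return (base.cdf(x) - lower) / (upper - lower)


# Hypothetical parameters: a standard normal truncated to [-1, 1].
at_lower = truncated_normal_cdf(-1.0, 0.0, 1.0, -1.0, 1.0)
at_centre = truncated_normal_cdf(0.0, 0.0, 1.0, -1.0, 1.0)
at_upper = truncated_normal_cdf(1.0, 0.0, 1.0, -1.0, 1.0)
```

For error data that must be non-negative, the lower bound would be set to zero and the upper bound to the largest admissible error.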
Further, the specific method of step 4 is as follows:
According to the limiting distribution of the Anderson-Darling test statistic $A^2$, the p-values under the four distributions are constructed as follows:
[Equation images not recoverable from the source text: the limiting-distribution expressions for $A^2$ and the resulting formula for $p_j$]
where $p_j$ is the p-value of the Anderson-Darling test statistic of the $j$-th hypothesized distribution. The p-value is a probability between 0 and 1 that expresses the goodness of fit between the real data distribution and the theoretical distribution qualitatively and intuitively: the larger the p-value (equivalently, the smaller $A_j^2$), the better the fit. The statistical distribution type of the error data can therefore be determined by comparing the p-values, i.e., the selected distribution type is $j^* = \arg\max_{j=1,2,3,4} p_j$.
Further, in step 5, each data point is automatically corrected by cyclic estimation based on the obtained distribution type $j^*$. The method quickly and effectively compensates for deviations caused by improper measurement or changes of operator and brings the data set closer to the true distribution of the data. It is suitable for the various small-batch production processes in the field of aviation manufacturing and can accurately and automatically correct small-batch error data, whereas existing methods are better suited to large samples.
Still further, step 5 specifically comprises the following steps:
Step 501: presetting a compensation value $\delta$; it is recommended that the compensation value be set to an integer multiple of the data recording precision;
Step 502: randomly shuffling the data set $D^{r-1}$ obtained after the $(r-1)$-th correction cycle and dividing it, according to the proportion of 8, into a history set $D_1^{r-1}$ and an observation set $D_2^{r-1}$;
Step 503: there are three data correction strategies: subtracting the compensation value ($x - \delta$), keeping the value unchanged ($x$), and adding the compensation value ($x + \delta$). Each data point $x$ in the observation set is merged with the history set into a new data set, and the p-values of the three correction strategies under distribution type $j^*$ are calculated and recorded as $p^{-}$, $p^{0}$, and $p^{+}$ respectively. The correction strategy with the highest p-value is selected to correct $x$; for example, if $p^{-} > p^{0}$ and $p^{-} > p^{+}$, then $x$ is corrected to $x - \delta$. These operations are repeated for all other data in the observation set to obtain the corrected observation set $\tilde{D}_2^{r-1}$;
Step 504: recording the data set after the $r$-th correction cycle as $D^{r} = D_1^{r-1} \cup \tilde{D}_2^{r-1}$ and its p-value as $p_r$;
Step 505: comparing $p_r$ and $p_{r-1}$; if the difference is negligible ($p_r - p_{r-1} < 0.001$), the correction ends; otherwise, let $r = r + 1$ and repeat steps 502 to 504.
Example 2
An automatic correction and distribution fitting method for small-batch error data based on the Anderson-Darling test and cyclic estimation. Based on the error data distribution types best suited to small-batch production processes in the field of aviation manufacturing (normal, truncated normal, gamma, and t distributions), Anderson-Darling test statistics under the four distributions are constructed, and the distribution type of the data is determined by a goodness-of-fit test. Each data point is then automatically corrected by cyclic estimation, which quickly and effectively compensates for data deviations caused by improper measurement or changes of operator, brings the data set closer to the true distribution of the data, and provides a basis for subsequent statistical process control and control chart construction.
Anderson-Darling test statistics are constructed under the error data distribution types best suited to small-batch production processes in the field of aviation manufacturing (normal, truncated normal, gamma, and t distributions). Verification against field data shows that error data most likely follow one of these four distributions, whereas conventional practice blindly assumes, on the basis of elementary statistics, that error data follow a normal distribution and analyzes them accordingly. The assumption fails because, on the production floor, factors such as improper measurement and changes of operator often cause the recorded values of error data to deviate from the true values, so the recorded data do not always follow a normal distribution.
The method is suitable for small-batch production processes in the field of aviation manufacturing: the true values of the data are recovered by the cyclic-estimation data correction method, so that the statistical distribution characteristics of the data are obtained accurately. At present, most documents remove abnormal data by the quartile method, i.e., data beyond the upper and lower quartiles are removed from the sample, which suits large samples. For the various small-batch processes in the field of aviation manufacturing, further reducing the sample size is detrimental to distribution fitting, and a more reasonable approach is to recover the true values of the data by data correction so that their statistical distribution characteristics can be obtained accurately.
Each data point is automatically corrected by cyclic estimation. A compensation value $\delta$ is preset, and the data set $D$ is randomly shuffled and divided into a history set $D_1$ and an observation set $D_2$. Three data correction strategies are specified, and the procedure is iterated until the final corrected data are obtained. The method quickly and effectively compensates for deviations caused by improper measurement or changes of operator and brings the data set closer to the true distribution of the data.
The flow of the automatic correction and distribution fitting method for small-batch error data is shown in FIG. 1. The method comprises the following steps:
Step 1: reading the annual error data of the same feature of a small-batch product from the record table;
Step 2: removing abnormal data from the error data by using prior information about the production process to obtain an initial data set $D = \{x_i,\ i = 1, \dots, n\}$. For example, to avoid scrapping parts, certain machining processes allow only a surplus in size, i.e., the error data are required to be positive; in that case the very few data recorded as negative can be deleted;
Step 3: the Anderson-Darling test statistics under the four continuous distributions (normal, truncated normal, gamma, and t) are constructed as follows:
$$A_j^2 = n \int_{-\infty}^{\infty} \frac{\left[F_D(x) - F_j(x)\right]^2}{F_j(x)\left[1 - F_j(x)\right]}\, dF_j(x), \quad j = 1, 2, 3, 4$$
where $A_j^2$ is the Anderson-Darling test statistic of the $j$-th hypothesized distribution, which measures the difference between the hypothesized distribution and the true data distribution; $n$ is the number of samples; $F_D(x)$ is the distribution function of the sample; and $F_j(x)$ is the theoretical distribution function of the $j$-th hypothesized distribution, as follows:
$$F_1(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt$$
$$F_2(x) = \frac{F_1(x) - F_1(a)}{F_1(b) - F_1(a)}, \quad a \le x \le b$$
$$F_3(x) = \int_{0}^{x} \frac{t^{\alpha-1} e^{-t/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}\, dt, \quad x \ge 0$$
$$F_4(x) = \int_{-\infty}^{x} \frac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\!\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}} dt$$
where $\Gamma$ denotes the gamma function and $\mu, \sigma, a, b, \alpha, \beta, v$ are the parameters of the respective distributions.
Step 4: according to the limiting distribution of the Anderson-Darling test statistic $A^2$, the p-values under the four distributions are calculated as follows:
[Equation images not recoverable from the source text: the limiting-distribution expressions for $A^2$ and the resulting formula for $p_j$]
The higher the p-value, the better the goodness of fit, so the statistical distribution type of the error data can be determined by comparing the p-values;
Step 5: based on the distribution type, each data point is automatically corrected by cyclic estimation. The method quickly and effectively compensates for deviations caused by improper measurement or changes of operator and brings the data set closer to the true distribution of the data.
Step 5 comprises the following steps:
Step 501: presetting a compensation value $\delta$; it is recommended that the compensation value be set to an integer multiple of the data recording precision.
Step 502: randomly shuffling the data set $D^{r-1}$ obtained after the $(r-1)$-th correction cycle and dividing it, according to the proportion of 8, into a history set $D_1^{r-1}$ and an observation set $D_2^{r-1}$.
Step 503: there are three data correction strategies: subtracting the compensation value ($x - \delta$), keeping the value unchanged ($x$), and adding the compensation value ($x + \delta$). Each data point $x$ in the observation set is merged with the history set into a new data set, and the p-values of the three correction strategies under distribution type $j^*$ are calculated and recorded as $p^{-}$, $p^{0}$, and $p^{+}$ respectively. The correction strategy with the highest p-value is selected to correct $x$; for example, if $p^{-} > p^{0}$ and $p^{-} > p^{+}$, then $x$ is corrected to $x - \delta$. These operations are repeated for all other data in the observation set to obtain the corrected observation set $\tilde{D}_2^{r-1}$.
Step 504: recording the data set after the $r$-th correction cycle as $D^{r} = D_1^{r-1} \cup \tilde{D}_2^{r-1}$ and its p-value as $p_r$.
Step 505: comparing $p_r$ and $p_{r-1}$; if the difference is negligible ($p_r - p_{r-1} < 0.001$), the correction ends; otherwise, let $r = r + 1$ and repeat steps 502 to 504.
Step 6: setting different compensation values $\delta$ and repeating step 5 to find the optimal compensation value. Under that compensation value, the optimally corrected data set $D'$ is obtained and the distribution parameters are solved by maximum likelihood estimation.
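The search over compensation values in step 6 amounts to evaluating the corrected data set's p-value for each candidate $\delta$ and keeping the best one. The sketch below shows only that outer search. The callable `corrected_p_value` stands for "run step 5 with this $\delta$ and return the resulting p-value", and both the candidate grid and the toy p-value curve are assumptions made for illustration, not results from the patent.

```python
import math


def find_best_delta(deltas, corrected_p_value):
    """Return (best_delta, best_p) over the candidate compensation values."""
    results = [(corrected_p_value(d), d) for d in deltas]
    # max() returns the first maximal entry, so ties go to the earliest candidate.
    best_p, best_delta = max(results, key=lambda r: r[0])
    return best_delta, best_p


def toy_p_value(delta):
    """Placeholder for step 5: a made-up curve that peaks at delta = 0.006."""
    return 1.0 - 50.0 * abs(delta - 0.006)


candidate_deltas = [a / 1000 for a in range(1, 11)]  # hypothetical grid
best_delta, best_p = find_best_delta(candidate_deltas, toy_p_value)
```

In practice the placeholder would be replaced by a function that applies the cyclic correction for the given $\delta$ and evaluates the fit of the selected distribution on the corrected data.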
Analysis by worked example:
Measured error data for an outer-circle feature machined on a numerically controlled lathe at an aviation enterprise in a city in Sichuan Province are selected; the small-batch data set contains 60 values. Because the outer-circle error cannot be negative, the two negative values are removed first, giving the data set $D = \{x_i,\ i = 1, \dots, 58\}$. The original data are shown in Table 1 below.
TABLE 1 original data set D
-0.010 -0.004 0 0 0 0 0 0 0.001 0.001
0.001 0.001 0.001 0.002 0.002 0.002 0.002 0.002 0.002 0.002
0.002 0.003 0.003 0.003 0.003 0.004 0.004 0.004 0.004 0.004
0.005 0.005 0.005 0.005 0.005 0.005 0.006 0.007 0.007 0.009
0.010 0.010 0.010 0.010 0.010 0.010 0.011 0.012 0.012 0.013
0.015 0.015 0.015 0.015 0.016 0.020 0.020 0.020 0.020 0.030
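The removal of the two negative readings described above can be sketched as follows, using the 60 values of Table 1. The variable names and the filter rule `x >= 0` are an illustrative reading of "the error cannot be negative".

```python
# The 60 raw readings from Table 1, row by row.
raw = [
    -0.010, -0.004, 0, 0, 0, 0, 0, 0, 0.001, 0.001,
    0.001, 0.001, 0.001, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002,
    0.002, 0.003, 0.003, 0.003, 0.003, 0.004, 0.004, 0.004, 0.004, 0.004,
    0.005, 0.005, 0.005, 0.005, 0.005, 0.005, 0.006, 0.007, 0.007, 0.009,
    0.010, 0.010, 0.010, 0.010, 0.010, 0.010, 0.011, 0.012, 0.012, 0.013,
    0.015, 0.015, 0.015, 0.015, 0.016, 0.020, 0.020, 0.020, 0.020, 0.030,
]

# Prior knowledge of the process: the outer-circle error cannot be negative.
D = [x for x in raw if x >= 0]
removed = [x for x in raw if x < 0]
```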
The four continuous distributions (normal, truncated normal, gamma, and t) are fitted to $D$; the results are shown in FIG. 2. The Anderson-Darling test statistics under the four distributions are constructed and their p-values calculated, as shown in Table 2 below. The p-value of the gamma distribution, $p_3 = 0.326$, is the largest, so the data set is considered to follow the gamma distribution.
TABLE 2 P-values of the Anderson-Darling test statistics under the four distributions
[Table image not recoverable from the source text]
Based on the gamma distribution, the compensation values $\delta = 0.0001a$, $a = 1, \dots, 10$, are set as multiples of the 0.0001 precision of the error data. For each compensation value, the data are automatically corrected by cyclic estimation: the data set is randomly divided into a history set and an observation set, the correction strategy giving the highest overall distribution-fitting p-value is selected for each data point in the observation set, and different data are cyclically selected as the observation set until every data point is optimally corrected or the p-value converges, which completes the automatic correction. The p-values of the gamma distribution for the corrected data sets under the different compensation values are shown in FIG. 3. The p-value is stable for compensation values between 0.002 and 0.006 and is highest at $\delta = 0.006$, so the optimal compensation value is 0.006. The data set $D'$ corrected with this compensation value is shown in Table 3 below. In total, the add-compensation strategy was applied to 4 samples, the keep-unchanged strategy to 20 samples, and the subtract-compensation strategy to 24 samples.
Table 3: Corrected data set D' at δ = 0.006
Deleted Deleted 0 0 0 0.001 0.001 0.001 0.002 0.002
0.002 0.002 0.003 0.003 0.003 0.004 0.004 0.004 0.004 0.005
0.005 0.005 0.005 0.005 0.005 0.006 0.006 0.006 0.006 0.007
0.007 0.007 0.007 0.007 0.008 0.009 0.009 0.010 0.010 0.010
0.010 0.010 0.010 0.010 0.011 0.012 0.013 0.015 0.015 0.015
0.015 0.015 0.015 0.017 0.017 0.020 0.020 0.025 0.025 0.035
A gamma distribution was fitted to the corrected data set D'; the result is shown in FIG. 4, and the p-value increased from the original 0.326 to 0.9133. Finally, the gamma distribution parameters were estimated by maximum likelihood estimation; the result is shown in FIG. 5, i.e. $D' = \{x' \mid x' \sim \mathrm{Gamma}(1.257, 0.0063)\}$. This statistical distribution can be used for subsequent statistical process control, quality monitoring, and the like.
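The maximum likelihood step can be illustrated with a minimal standard-library sketch. It assumes the second parameter of Gamma(1.257, 0.0063) is a scale parameter (the text does not say so explicitly), and it assumes strictly positive data, so zero entries such as those in Table 3 would need separate handling. The function names `digamma` and `fit_gamma_mle` are hypothetical, and the sample is synthetic rather than the patent's data.

```python
import math
import random

def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def fit_gamma_mle(data):
    """Return (shape, scale) maximising the gamma likelihood of strictly positive data."""
    n = len(data)
    mean = sum(data) / n
    # The shape k solves ln(k) - digamma(k) = ln(mean) - mean(ln x).
    s = math.log(mean) - sum(math.log(x) for x in data) / n
    lo, hi = 1e-3, 1e3
    for _ in range(100):
        mid = math.sqrt(lo * hi)          # bisection on a log scale
        if math.log(mid) - digamma(mid) > s:
            lo = mid                      # left side decreases in k, so k must grow
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    return shape, mean / shape            # the scale follows from the sample mean

# Synthetic sample (not the patent's data): shape 1.257, scale 0.0063.
rng = random.Random(0)
sample = [rng.gammavariate(1.257, 0.0063) for _ in range(20000)]
shape, scale = fit_gamma_mle(sample)
print(shape, scale)
```

Bisection is used instead of Newton's method only to avoid implementing the trigamma function; either approach solves the same likelihood equation.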

Claims (5)

1. An automatic correction and distribution fitting method for small-batch error data, characterized by comprising the following steps:
step 1: reading, from the record table, the annual error data of the same feature of products produced in small batches;
step 2: removing abnormal data from the error data to obtain an initial data set $D = \{x_i,\ i = 1, \dots, n\}$;
step 3: constructing the Anderson-Darling test statistics under four continuous distributions: normal, truncated normal, gamma and t;
step 4: according to the limiting distribution of the Anderson-Darling test statistic $A^2$, constructing the p-values of the Anderson-Darling test statistics under the four continuous distributions, and determining the statistical distribution type of the error data by comparing the p-values, where a higher p-value indicates a better goodness of fit, i.e. the determined distribution type is $j^* = \arg\max_{j=1,2,3,4} p_j$;
step 5: based on the obtained distribution type $j^*$, automatically correcting each datum by cyclic estimation: presetting a compensation value δ, randomly shuffling the data set D and dividing it into a history set $D_1$ and an observation set $D_2$, specifying the data correction strategies, and iterating until the final corrected data are obtained;
step 6: setting different compensation values δ and repeating step 5 to find the optimal compensation value; under that compensation value, obtaining the optimally corrected data set D' and solving for the parameters of distribution $j^*$ by maximum likelihood estimation.
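Step 6 amounts to a grid search over the compensation value. The sketch below shows only that outer loop. The callables `correct` and `score` are hypothetical stand-ins for the step 5 correction and for the p-value of the fitted distribution, and the toy versions used here exist purely to make the example runnable.

```python
def search_compensation(data, deltas, correct, score):
    """Try each compensation value and keep the one whose corrected data scores highest."""
    best = None
    for delta in deltas:
        corrected = correct(data, delta)
        value = score(corrected)
        if best is None or value > best[2]:
            best = (delta, corrected, value)
    return best  # (best_delta, corrected_data, best_score)

# Hypothetical stand-ins, NOT the patent's correction or p-value computation.
def toy_correct(data, delta):
    return [x + delta for x in data]

def toy_score(data):
    return -abs(sum(data) / len(data) - 0.010)

# Grid delta = 0.0001 * a, a = 1..10, as in the worked example of the description.
deltas = [round(0.0001 * a, 4) for a in range(1, 11)]
best_delta, best_data, best_value = search_compensation(
    [0.0094, 0.0096], deltas, toy_correct, toy_score
)
print(best_delta)
```

Passing the correction and scoring routines as arguments keeps the search independent of which distribution type was selected in step 4.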
2. The automatic correction and distribution fitting method for small-batch error data according to claim 1, characterized in that in step 3, in order to measure the goodness of fit between the real data distribution and the theoretical distribution, the Anderson-Darling test statistics under the four continuous distributions (normal, truncated normal, gamma and t) are constructed as follows:
Figure FDA0003762400690000011
in the formula: $A_j^2$ denotes the Anderson-Darling test statistic of the j-th hypothesized distribution, which measures the difference between the hypothesized distribution and the true distribution of the data; the smaller $A_j^2$ is, the closer the true data distribution is to the hypothesized distribution; n is the number of samples, and $F_D(x)$ is the distribution function of the sample;
the normal, truncated normal, gamma and t distributions are the four distribution types that best fit error data in the field of aviation manufacturing, and $F_j(x)$ is the theoretical distribution function of the j-th hypothesized distribution:
Figure FDA0003762400690000021
Figure FDA0003762400690000022
Figure FDA0003762400690000023
Figure FDA0003762400690000024
in the formula: Γ denotes the gamma function, and μ, σ, a, b, α, β and ν denote the parameters of the corresponding distributions.
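The formulas of this claim are available only as images. For orientation, the textbook Anderson-Darling statistic is $A^2 = n \int \frac{[F_D(x) - F(x)]^2}{F(x)[1 - F(x)]}\, dF(x)$, usually evaluated on the sorted sample $x_{(1)} \le \dots \le x_{(n)}$ as $A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\left[\ln F(x_{(i)}) + \ln\bigl(1 - F(x_{(n+1-i)})\bigr)\right]$. Whether the image formula matches this form exactly is an assumption. The sketch below implements the computational form for an arbitrary CDF. The normal distribution is used only because the standard library provides its CDF, and `anderson_darling` is a hypothetical name.

```python
import math
from statistics import NormalDist

def anderson_darling(sample, cdf, eps=1e-12):
    """Anderson-Darling statistic A^2 of a sample against a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Clip CDF values away from 0 and 1 so that both logarithms stay finite.
    u = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    total = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

# Toy sample: 50 exact quantiles of the standard normal distribution.
sample = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
a2_good = anderson_darling(sample, NormalDist(0.0, 1.0).cdf)  # matching hypothesis
a2_bad = anderson_darling(sample, NormalDist(2.0, 1.0).cdf)   # shifted hypothesis
print(a2_good, a2_bad)
```

A smaller value indicates a closer fit, so the matching hypothesis yields a much smaller statistic than the shifted one.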
3. The automatic correction and distribution fitting method for small-batch error data according to claim 2, characterized in that step 4 is specifically as follows:
according to the limiting distribution of the Anderson-Darling test statistic $A^2$, the p-values under the four distributions are constructed from $A_j^2$, $j = 1, 2, 3, 4$, as follows:
Figure FDA0003762400690000026
$p_j$ denotes the p-value of the Anderson-Darling test statistic of the j-th hypothesized distribution. The p-value is a probability between 0 and 1 that expresses the goodness of fit between the real data distribution and the theoretical distribution qualitatively and intuitively: the larger the p-value, i.e. the smaller $A_j^2$, the better the fit. The statistical distribution type of the error data can therefore be determined by comparing the p-values, i.e. the determined distribution type is $j^* = \arg\max_{j=1,2,3,4} p_j$.
4. The automatic correction and distribution fitting method for small-batch error data according to claim 3, characterized in that in step 5, based on the obtained distribution type $j^*$, each datum is automatically corrected by cyclic estimation.
5. The automatic correction and distribution fitting method for small-batch error data according to claim 4, characterized in that step 5 specifically comprises the following steps:
step 501: presetting a compensation value δ, which is recommended to be an integer multiple of the data recording precision;
step 502: randomly shuffling the data set $D_{r-1}$ obtained after the (r-1)-th correction cycle and dividing it into 8 … as the history set
Figure FDA0003762400690000028
and the observation set
Figure FDA0003762400690000029
step 503: there are three data correction strategies: subtracting the compensation value
Figure FDA00037624006900000210
keeping the value unchanged
Figure FDA00037624006900000211
and adding the compensation value
Figure FDA0003762400690000031
for each datum x in the observation set, merging it with the history set into a new data set and calculating, under distribution type $j^*$, the p-values of the three correction strategies, denoted respectively as
Figure FDA0003762400690000032
selecting the correction strategy with the highest p-value to correct x, repeating this step for all other data in the observation set, and finally obtaining the corrected observation set
Figure FDA0003762400690000033
step 504: denoting the data set after the r-th correction cycle as
Figure FDA0003762400690000034
and its p-value as $p_r$;
step 505: comparing $p_r$ and $p_{r-1}$; if the difference is negligible ($p_r - p_{r-1} < 0.001$), the correction ends; otherwise let r = r + 1 and repeat steps 502 to 504.
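Steps 501 to 505 can be sketched as follows. This is an illustration under several assumptions, not the patented implementation. The p-value is replaced by a stand-in score (the negative Anderson-Darling statistic against an exponential distribution with a fixed, hypothetical scale); the history fraction of 0.8 is assumed because the split ratio is not legible in the text; and the history set is held fixed while each observation is evaluated. All function names are hypothetical.

```python
import math
import random

def neg_ad_exponential(data, scale=0.008):
    """Stand-in score (higher is better): minus the A-D statistic against Exp(scale)."""
    xs = sorted(data)
    n = len(xs)
    eps = 1e-12
    # Exponential CDF, clipped away from 0 and 1 to keep the logarithms finite.
    u = [min(max(1.0 - math.exp(-max(x, 0.0) / scale), eps), 1.0 - eps) for x in xs]
    total = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return n + total / n  # equals -(A^2)

def correction_cycle(data, delta, score, rng, history_fraction=0.8):
    """One cycle: shuffle, split, then pick the best of x - delta, x, x + delta."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * history_fraction)  # assumed split ratio
    history, observation = shuffled[:cut], shuffled[cut:]
    corrected = []
    for x in observation:
        candidates = (x - delta, x, x + delta)
        # Score each candidate merged with the history set (step 503).
        corrected.append(max(candidates, key=lambda c: score(history + [c])))
    return history, observation, corrected

def cyclic_correction(data, delta, score, seed=0, tol=0.001, max_cycles=50):
    """Repeat correction cycles until the score gain falls below tol (step 505)."""
    rng = random.Random(seed)
    current = list(data)
    previous = score(current)
    for _cycle in range(max_cycles):
        history, _obs, corrected = correction_cycle(current, delta, score, rng)
        current = history + corrected
        new = score(current)
        if new - previous < tol:
            break
        previous = new
    return current

# Toy data (not the patent's data set).
toy = [0.001 * k for k in (1, 2, 2, 3, 4, 5, 5, 6, 8, 10, 12, 15, 20, 30)]
result = cyclic_correction(toy, 0.001, neg_ad_exponential)
print(len(result))
```

Because each observation is judged only against the history set, the score of the whole data set is not guaranteed to rise in every cycle, which is why the sketch also caps the number of cycles.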
CN202210876577.3A 2022-07-25 2022-07-25 Automatic correction and distribution fitting method for small-batch error data Pending CN115455359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210876577.3A CN115455359A (en) 2022-07-25 2022-07-25 Automatic correction and distribution fitting method for small-batch error data

Publications (1)

Publication Number Publication Date
CN115455359A true CN115455359A (en) 2022-12-09

Family

ID=84297297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210876577.3A Pending CN115455359A (en) 2022-07-25 2022-07-25 Automatic correction and distribution fitting method for small-batch error data

Country Status (1)

Country Link
CN (1) CN115455359A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115230191A (en) * 2022-07-25 2022-10-25 成都飞机工业(集团)有限责任公司 Forming method of stealth box section part
CN116225623A (en) * 2023-05-04 2023-06-06 北京庚顿数据科技有限公司 Virtual data generating method and virtual data generator

Similar Documents

Publication Publication Date Title
CN115455359A (en) Automatic correction and distribution fitting method for small-batch error data
CN101710235B (en) Method for automatically identifying and monitoring on-line machined workpieces of numerical control machine tool
CN101863088B (en) Method for forecasting Mooney viscosity in rubber mixing process
CN112200327B (en) MES equipment maintenance early warning method and system
CN113051683A (en) Method, system, equipment and storage medium for predicting service life of numerical control machine tool cutter
CN116468160A (en) Aluminum alloy die casting quality prediction method based on production big data
CN111709181B (en) Method for predicting fault of polyester filament yarn industrial production process based on principal component analysis
CN116050644A (en) Method for predicting dam deformation extremum based on gray model
CN116748352B (en) Metal pipe bending machine processing parameter monitoring control method, system and storage medium
CN114926075B (en) Machine part production scheduling method based on man-hour prediction
CN116339262A (en) Numerical control processing production quality monitoring system based on artificial intelligence
CN111273624B (en) Transient performance prediction method for flexible discrete manufacturing system with special buffer zone
RU2295590C1 (en) Method of the statistical control over the quality of the electrode products
CN111174824B (en) Control platform that acid mist discharged
CN113919204A (en) Comprehensive importance analysis method for availability of multi-state manufacturing system
CN112329229A (en) Milling parameter optimization method suitable for surface machining of thin-walled workpiece
CN115213735B (en) System and method for monitoring cutter state in milling process
CN115509196B (en) Manufacturing process optimization method and device based on machine learning
CN113673056B (en) Method for determining cold test parameter limit value of engine
CN114967592B (en) Self-adaptive selection method for machine tool thermal error temperature sensitive point
Bendre et al. Research study of process capability
CN117151290A (en) Motor rotor machining quality prediction method based on mass transfer network
CN116805037A (en) Energy consumption prediction method and system based on data analysis
CN114757412A (en) Bad data identification method based on cluster analysis coupled neural network prediction
CN114239295A (en) Failure assessment method and system for biological material preparation process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination