CN116049157A - Quality data analysis method and system - Google Patents
Quality data analysis method and system
- Publication number
- CN116049157A (application number CN202310007166.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- analyzed
- ppm
- quality data
- indexes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention relates to a quality data analysis method and system in the technical field of data analysis, and addresses the prior-art problem that quality analysis is inaccurate because of redundant inspection characteristic indexes and abnormal data. The method comprises: acquiring quality data and inspection characteristic indexes from the production process, and removing redundant inspection characteristic indexes according to the quality data and correlation coefficients to obtain the inspection characteristic indexes to be analyzed; removing abnormal data from the quality data of each inspection characteristic index to be analyzed by statistical analysis and a variational autoencoder to obtain the data to be analyzed; and calculating each PPM value in the production process from the inspection characteristic indexes to be analyzed and the data to be analyzed, comparing each PPM value with its corresponding PPM threshold range, and performing data envelope analysis on the data to be analyzed that fall outside the PPM threshold range. Accurate quality data analysis is thereby achieved.
Description
Technical Field
The present invention relates to the field of data analysis technologies, and in particular, to a quality data analysis method and system.
Background
With the continuing rise in the level of informatization, weaponry generates large volumes of complex quality data throughout development and production, including design, process planning, inspection, production, testing and use. These data comprise structured data held in business systems, semi-structured data held in inspection tools, and unstructured data carried by paper or electronic documents. Quality management is closely coupled with product design, production and management; the occurrence of equipment quality problems and the inspection and transmission of quality data are highly discrete and heterogeneous; and quality management shares many data associations with different information systems. The data contain a great deal of redundancy, missing values and anomalies, and the presence of noise prevents the useful information in them from being mined effectively. In addition, the quality data are scattered, which makes collecting and sharing quality data resources difficult.
Most quality data reside on the computers of individual managers, developers and technicians, or in production test equipment, so acquiring and sharing quality data resources is difficult. Because data resources are hard to fuse and share, and data acquisition and analysis tools are lacking, data utilization is low and quantitative analysis is insufficient, and accurate data analysis and statistics are not available as a basis for decisions. There is an urgent need to mine and process these data, uncover the patterns behind them, and track the quality status of equipment during development, so as to help quality management, design and technical personnel make sound decisions and to support comprehensive online quality control.
A large amount of test data accumulates during equipment development, but quantitative analysis of these test data and their use for mining equipment quality problems are currently lacking. Moreover, the indexes covered by the various characteristics involved in equipment quality analysis differ widely, and the index parameters interact with one another, which makes analysis difficult. Accurate quality analysis requires selecting enough indexes to express the target characteristics while ensuring that the selected parameters do not influence one another. Discovering and analyzing test problems and abnormal data, so as to predict and give early warning of problems in equipment development and to support the improvement of quality weak points, is a key problem to be solved in making quality management and control refined and intelligent.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a quality data analysis method and system that solve the problem of inaccurate quality analysis caused by redundant inspection characteristic indexes and abnormal data.
In one aspect, an embodiment of the present invention provides a quality data analysis method, including the steps of:
acquiring quality data and inspection characteristic indexes from the production process, and removing redundant inspection characteristic indexes according to the quality data and correlation coefficients to obtain the inspection characteristic indexes to be analyzed;
removing abnormal data from the quality data of each inspection characteristic index to be analyzed by statistical analysis and a variational autoencoder to obtain the data to be analyzed;
calculating each PPM value in the production process from the inspection characteristic indexes to be analyzed and the data to be analyzed, comparing each PPM value with its corresponding PPM threshold range, and performing data envelope analysis on the data to be analyzed that fall outside the PPM threshold range.
As a further improvement of the method, before the comparison with the corresponding PPM threshold range, the method further comprises: if the amount of data to be analyzed is less than or equal to a quantity threshold, obtaining a fluctuation threshold by constructing a confidence level based on the t distribution, and evaluating whether the difference between each PPM value and an ideal PPM value is smaller than the fluctuation threshold; if so, the data to be analyzed are retained for calculating the PPM value.
As a further improvement of the method, removing redundant inspection characteristic indexes according to the correlation coefficient matrix to obtain the inspection characteristic indexes to be analyzed comprises: dividing all N inspection characteristic indexes into a plurality of pair combinations by traversal and recursion, where the first group of each pair combination contains i indexes and the second group contains the remaining N-i indexes; and, among the pair combinations that satisfy the following conditions, taking the second group with the fewest indexes (any one of them if there are several) as the inspection characteristic indexes to be analyzed. The conditions are: the basic condition that the correlation coefficient between each index in the first group and all indexes in the second group is greater than a correlation threshold; and that the basic condition no longer holds after any index is moved from the second group into the first group.
As a further improvement of the method, the correlation coefficient between each index in the first group and all indexes in the second group is obtained by forming a linear combination of the quality data corresponding to each of the two groups of indexes and maximizing the Pearson correlation coefficient between the two linear combinations.
As a further improvement of the method, removing abnormal data from the quality data of each inspection characteristic index to be analyzed by statistical analysis and a variational autoencoder to obtain the data to be analyzed comprises:
based on statistical analysis, applying z-score standardization to the quality data of each inspection characteristic index to be analyzed, and removing as abnormal data the quality data whose standardized value exceeds an abnormality threshold;
feeding the standardized quality data of each inspection characteristic index to be analyzed into a trained variational autoencoder, comparing the resulting output with the input, and removing as abnormal data the quality data whose difference exceeds a difference threshold;
taking the remaining quality data as the data to be analyzed.
As a further improvement of the method, the loss function of the variational autoencoder comprises a reconstruction term and a KL divergence regularization term, and a weight parameter is placed before the KL divergence regularization term to reduce its weight.
As a further improvement of the method, calculating each PPM value in the production process from the inspection characteristic indexes to be analyzed and the data to be analyzed comprises:
collecting the number of defects and the severity coefficient of each process to obtain the total number of defects of each process; taking, according to the inspection characteristic indexes to be analyzed and the data to be analyzed, the amount of data to be analyzed corresponding to the inspection characteristic indexes of each process as the total number of inspection characteristics of that process; and obtaining the PPM value of each process from its total number of defects and its total number of inspection characteristics;
summing the total numbers of defects and of inspection characteristics of the processes involved in each product during production to obtain the total number of defects and the total number of inspection characteristics of each product, and obtaining the PPM value of each product from these two totals;
summing the total numbers of defects and of inspection characteristics of the products belonging to each model to obtain the total number of defects and the total number of inspection characteristics of each model, and obtaining the PPM value of each model from these two totals.
As a further improvement of the method, performing data envelope analysis on the data to be analyzed that fall outside the PPM threshold range comprises: taking the data to be analyzed that fall outside the PPM threshold range as sample data; acquiring successful data and calculating a confidence interval of the successful data, the range of which indicates whether the sample data lie within the confidence interval; obtaining an envelope upper limit and an envelope lower limit of the successful data according to a preset confidence level, which indicate whether the sample data are enveloped; obtaining a qualified upper limit and a qualified lower limit according to a preset tolerance value, which indicate whether the sample data are qualified; and generating a sample data analysis result according to whether the sample data are enveloped, whether they are qualified, and whether they lie within the confidence interval.
As a further improvement of the above method, acquiring successful data and calculating a confidence interval of the successful data comprises: counting the successful data of each index separately according to the inspection characteristic index to be analyzed that corresponds to the sample data; if the number of successful data is greater than a quantity threshold, constructing the confidence interval by a Gaussian mixture model (GMM) algorithm; otherwise, constructing the confidence interval from the t distribution.
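The three judgments described above can be sketched for a single inspection characteristic index as follows. This is only an illustrative sketch: the normal-quantile envelope, the nominal value, and every function and parameter name are assumptions that do not come from the method itself, and the confidence interval is passed in precomputed because its construction (GMM or t distribution) depends on the amount of successful data.

```python
from statistics import NormalDist, mean, stdev

def envelope_limits(successful, confidence=0.95):
    """Envelope limits of the successful data.

    Assumption: the successful data are roughly normal, so the envelope is
    mean +/- k * std with k taken from the standard normal quantile.
    """
    k = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    m, s = mean(successful), stdev(successful)
    return m - k * s, m + k * s

def analyze_sample(value, successful, nominal, tolerance, conf_interval,
                   confidence=0.95):
    """Return the three flags that make up the sample data analysis result."""
    env_lo, env_hi = envelope_limits(successful, confidence)
    # Qualified limits: hypothetical nominal value +/- preset tolerance.
    ok_lo, ok_hi = nominal - tolerance, nominal + tolerance
    ci_lo, ci_hi = conf_interval
    return {
        "enveloped": env_lo <= value <= env_hi,
        "qualified": ok_lo <= value <= ok_hi,
        "in_confidence_interval": ci_lo <= value <= ci_hi,
    }
```

A sample can be qualified with respect to the tolerance while still lying outside the envelope of historical successful data, which is exactly the kind of case the combined result is meant to surface.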
In another aspect, an embodiment of the present invention provides a quality data analysis system, including: an inspection characteristic index acquisition module, configured to acquire quality data and inspection characteristic indexes from the production process, and to remove redundant inspection characteristic indexes according to the quality data and correlation coefficients to obtain the inspection characteristic indexes to be analyzed;
a to-be-analyzed data acquisition module, configured to remove abnormal data from the quality data of each inspection characteristic index to be analyzed by statistical analysis and a variational autoencoder to obtain the data to be analyzed;
and a quality data analysis module, configured to calculate each PPM value in the production process from the inspection characteristic indexes to be analyzed and the data to be analyzed, to compare each PPM value with its corresponding PPM threshold range, and to perform data envelope analysis on the data to be analyzed that fall outside the PPM threshold range.
Compared with the prior art, the invention achieves at least one of the following beneficial effects: based on the collected equipment quality data, techniques such as data correlation analysis, data anomaly analysis and small-sample data analysis are used to detect and remove redundancy in the test quality data, to evaluate the confidence of test data under small-sample conditions, and to analyze intelligently whether product data fall within the envelope range, so that hidden quality risks or weak links in the production process are discovered in advance and quality management and control become refined and intelligent.
In the invention, the above technical solutions may also be combined with one another to realize further preferred combinations. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
Fig. 1 is a flow chart of a quality data analysis method in embodiment 1 of the present invention.
Detailed Description
Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and together with the description serve to explain the principles of the invention, and are not intended to limit the scope of the invention.
Example 1
In one embodiment of the present invention, a method for analyzing quality data is disclosed, as shown in fig. 1, comprising the steps of:
S11: Acquire quality data and inspection characteristic indexes from the production process, and remove redundant inspection characteristic indexes according to the quality data and correlation coefficients to obtain the inspection characteristic indexes to be analyzed.
It should be noted that the quality data in the production process in this embodiment are the data in the product test data that affect quality. For example, for an inertial navigation gyroscope, temperature, weight, pressure and gravity data are acquired as quality data.
Quality data are acquired through structured and unstructured data acquisition and processing, followed by preliminary data cleaning and preprocessing, which comprise: detecting missing values and filling the missing values of data items by Newton interpolation; detecting and eliminating outliers using data mining and state estimation methods; and detecting and deleting duplicate values according to similarity.
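The missing-value filling step can be illustrated with a short sketch of Newton (divided-difference) interpolation. The helper names, the use of the record position as the x coordinate, and the choice of two known neighbours on each side are illustrative assumptions; the embodiment only states that Newton interpolation is used.

```python
def newton_interpolate(xs, ys, x):
    """Evaluate the Newton divided-difference polynomial through (xs, ys) at x."""
    n = len(xs)
    coef = list(ys)
    # Build divided differences in place: coef[i] becomes f[x_0, ..., x_i].
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Horner-style evaluation of the Newton form.
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

def fill_missing(values, window=2):
    """Fill None entries from up to `window` known neighbours on each side.

    Assumption: the position of a record in the list is used as its x value.
    """
    filled = list(values)
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is None:
            left = [p for p in known if p[0] < i][-window:]
            right = [p for p in known if p[0] > i][:window]
            points = left + right
            if points:
                xs, ys = zip(*points)
                filled[i] = newton_interpolate(xs, ys, i)
    return filled
```

Restricting the interpolation to a few neighbours keeps the polynomial degree low, which avoids the oscillation that high-degree interpolation through many points tends to produce.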
An inspection characteristic index is a quantified inspection characteristic. The usual Pearson bivariate correlation analysis considers only the correlation coefficient between two indexes and cannot fully reveal the intrinsic relationships among multiple inspection characteristic indexes. This embodiment therefore mines the latent correlation between a redundant inspection characteristic index and the multiple remaining indexes, and screens out the redundant indexes step by step, which improves the ability of the indexes to evaluate product quality.
Specifically, removing redundant inspection characteristic indexes according to the quality data and correlation coefficients to obtain the inspection characteristic indexes to be analyzed comprises: dividing all N inspection characteristic indexes into a plurality of pair combinations by traversal and recursion, where the first group of each pair combination contains i indexes and the second group contains the remaining N-i indexes; and, among the pair combinations that satisfy the following conditions, taking the second group with the fewest indexes (any one of them if there are several) as the inspection characteristic indexes to be analyzed. The conditions are: the basic condition that the correlation coefficient between each index in the first group and all indexes in the second group is greater than a correlation threshold; and that the basic condition no longer holds after any index is moved from the second group into the first group.
When the inspection characteristic indexes are analyzed, i may be traversed starting from 1 or starting from any number smaller than N, as long as a pair combination satisfying the above conditions is obtained.
It should be noted that the correlation coefficient between each index in the first group and all indexes in the second group is obtained by forming a linear combination of the quality data corresponding to each of the two groups of indexes and maximizing the Pearson correlation coefficient between the two linear combinations, which is expressed by the following formula:
$$\rho = \max_{w_1, w_2} \frac{w_1^{T}\Sigma_{12}\,w_2}{\sqrt{w_1^{T}\Sigma_{11}\,w_1}\;\sqrt{w_2^{T}\Sigma_{22}\,w_2}}$$
where $w_1$ and $w_2$ are the coefficient vectors of the linear combination of the first group of quality data and of the linear combination of the second group of quality data, respectively, $\Sigma_{12}$ is the covariance matrix between the first and second groups, $\Sigma_{11}$ is the covariance matrix of the first group, and $\Sigma_{22}$ is the covariance matrix of the second group.
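Maximizing the Pearson correlation between two linear combinations is the first canonical correlation of the two groups, and it can be computed from the three covariance matrices alone. The sketch below assumes NumPy is available; the function name and the small `ridge` term (added only to keep the Cholesky factorization stable) are illustrative assumptions.

```python
import numpy as np

def first_canonical_correlation(X1, X2, ridge=1e-9):
    """Largest Pearson correlation between linear combinations of X1 and X2.

    X1, X2: arrays of shape (n_samples, n_indexes); a 1-D array is one index.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    if X1.ndim == 1:
        X1 = X1[:, None]
    if X2.ndim == 1:
        X2 = X2[:, None]
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    S11 = X1.T @ X1 / (n - 1) + ridge * np.eye(X1.shape[1])
    S22 = X2.T @ X2 / (n - 1) + ridge * np.eye(X2.shape[1])
    S12 = X1.T @ X2 / (n - 1)
    # Whitening: with S11 = L1 L1^T and S22 = L2 L2^T, the maximum of the
    # correlation ratio is the top singular value of L1^{-1} S12 L2^{-T}.
    L1 = np.linalg.cholesky(S11)
    L2 = np.linalg.cholesky(S22)
    A = np.linalg.solve(L1, S12)        # L1^{-1} S12
    M = np.linalg.solve(L2, A.T).T      # A L2^{-T}
    rho = np.linalg.svd(M, compute_uv=False)[0]
    return float(min(rho, 1.0))
```

When the first group holds a single index, this value reduces to the multiple correlation coefficient of that index with the second group.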
Illustratively, taking the production test of a twist pin as an example, the correlation among 7 inspection characteristic indexes is calculated from the twist pin quality data: pin body length, bulge size, undersized pin body after closing, coaxiality, empty pin, reversed direction and loose wire, numbered 1, 2, 3, 4, 5, 6 and 7 respectively. The correlation threshold is set to 0.7; a correlation coefficient greater than 0.7 indicates a high degree of correlation, and the index concerned is treated as a redundant inspection characteristic index.
The first group is denoted by B and the second group by A. The first pass starts from i=1 and decomposes the indexes into: t1 = {A = [2,3,4,5,6,7], B = [1]}, t2 = {A = [1,3,4,5,6,7], B = [2]}, t3 = {A = [1,2,4,5,6,7], B = [3]}, t4 = {A = [1,2,3,5,6,7], B = [4]}, t5 = {A = [1,2,3,4,6,7], B = [5]}, t6 = {A = [1,2,3,4,5,7], B = [6]}, t7 = {A = [1,2,3,4,5,6], B = [7]}. The correlation coefficient of each index in B with all indexes in A is then calculated. Of these, t1, t3 and t6 satisfy the condition of being greater than 0.7, so group A in these combinations can be decomposed further; the remaining combinations do not satisfy the condition and are not decomposed further.
In the second pass, taking t1 as an example, the decomposition gives: t1_1 = {A = [3,4,5,6,7], B = [1,2]}, t1_2 = {A = [2,4,5,6,7], B = [1,3]}, t1_3 = {A = [2,3,4,6,7], B = [1,5]}, t1_4 = {A = [2,3,4,5,7], B = [1,6]}. The correlation coefficient of each index in B with all indexes in A is calculated; if t1_1 and t1_2 satisfy the condition that all coefficients are greater than 0.7, group A in t1_1 and t1_2 is decomposed further.
In the third pass, each pair combination decomposed from t1_1 and t1_2 has 4 indexes in A and 3 indexes in B, but the correlation coefficients of the indexes in B with all indexes in A do not all exceed 0.7. That is, once any index is moved from the second group into the first group, the basic condition no longer holds. This shows that at least 5 effective inspection characteristic indexes are needed, and group A of either t1_1 or t1_2 can be used as the inspection characteristic indexes to be analyzed.
Compared with the prior art, this embodiment extends the conventional correlation calculation between two inspection characteristics, screens out highly correlated inspection characteristic indexes, and obtains the smallest set of inspection characteristic indexes that satisfies the correlation condition, so that the inspection results of the redundant indexes outside the set can be predicted from the inspection results of the indexes within the set.
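The screening procedure can be sketched as a search for the smallest second group A whose indexes predict every index of the first group B. The sketch below makes two loudly stated assumptions: it replaces the traversal-and-recursion scheme with a plain exhaustive enumeration from the smallest A upward (exponential in N, so only suitable for a handful of indexes), and it evaluates each index of B separately against A using the multiple correlation coefficient. All names are hypothetical, and NumPy is assumed to be available.

```python
from itertools import combinations
import numpy as np

def multiple_correlation(y, X):
    """Correlation between y and its best linear fit from the columns of X.

    For one index against a group, this equals the maximized Pearson
    correlation between the two linear combinations.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    fit = Xc @ coef
    denom = np.linalg.norm(fit) * np.linalg.norm(yc)
    return 0.0 if denom == 0 else float(fit @ yc / denom)

def select_indexes(data, threshold=0.7):
    """Return (kept indexes A, redundant indexes B) as column numbers.

    The first A found when enumerating by increasing size is the smallest
    one meeting the basic condition, so moving any of its indexes into B
    necessarily breaks the condition.
    """
    n_idx = data.shape[1]
    for size_a in range(1, n_idx):
        for group_a in combinations(range(n_idx), size_a):
            group_b = [j for j in range(n_idx) if j not in group_a]
            cols_a = data[:, list(group_a)]
            if all(multiple_correlation(data[:, b], cols_a) > threshold
                   for b in group_b):
                return list(group_a), group_b
    return list(range(n_idx)), []  # nothing is redundant
```

Because the smallest qualifying A is returned first, the second condition of the method (the basic condition fails once any index leaves A) holds by construction rather than being checked explicitly.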
S12: Remove abnormal data from the quality data of each inspection characteristic index to be analyzed by statistical analysis and a variational autoencoder to obtain the data to be analyzed.
The quality data of this embodiment include abnormal data caused by local differences and by overall differences. Local differences are mainly caused by abrupt changes in the local inspection environment and can be found by comparing the numerical values of the quality data of a single test item. Overall differences are caused by overall changes in factors such as the process inspection environment and the inspection method, and cannot be identified from numerical abnormality in the quality data of a single test item. This embodiment therefore uses statistical analysis and a variational autoencoder, respectively, to identify quality data that are abnormal within a single test item and quality data that deviate from the overall distribution of the quality data.
Specifically, removing abnormal data from the quality data of each inspection characteristic index to be analyzed by statistical analysis and a variational autoencoder to obtain the data to be analyzed comprises:
based on statistical analysis, applying z-score standardization to the quality data of each inspection characteristic index to be analyzed, and removing as abnormal data the quality data whose standardized value exceeds an abnormality threshold;
feeding the remaining standardized quality data of each inspection characteristic index to be analyzed into a trained variational autoencoder, comparing the resulting output with the input, and removing as abnormal data the quality data whose difference exceeds a difference threshold;
taking the remaining quality data as the data to be analyzed.
The z-score standardization method measures how abnormal a quality data item is within a single inspection characteristic index by its standardized distance from the mean: the greater the standardized distance, the more abnormal the item is for that inspection characteristic.
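The statistical step can be sketched in a few lines. The function name, the default threshold of 3, and the use of the absolute standardized distance (so that unusually low values are removed as well as unusually high ones) are assumptions made for illustration.

```python
from statistics import fmean, pstdev

def remove_zscore_outliers(values, abnormal_threshold=3.0):
    """Split values into (kept, removed) by standardized distance from the mean."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing can be flagged as abnormal.
        return list(values), []
    kept, removed = [], []
    for v in values:
        z = (v - mu) / sigma
        if abs(z) > abnormal_threshold:
            removed.append(v)
        else:
            kept.append(v)
    return kept, removed
```

With very few samples the largest attainable z-score is bounded by the sample size, so a fixed threshold such as 3 may never trigger on small batches; the threshold would need to be chosen with the data volume in mind.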
The distribution of the quality data is fitted by a variational autoencoder, which learns the data distribution characteristics of the inspection characteristics and thereby identifies abnormal quality data that differ overall. The variational autoencoder comprises two network structures, an encoder E(x) and a decoder D(z). The encoder E(x) extracts features from the inspection characteristics and maps them into a structured feature space, and the decoder D(z) decodes the feature distribution in that space. Using the variational principle with KL divergence regularization, the model extracts the effective features of the quality data and outputs the data distribution of the original quality data.
Specifically, the encoder is obtained by approximating the posterior q(z|x, φ) and the decoder by the maximum likelihood p(x|z, θ), where φ and θ are the parameters of the encoder and the decoder, respectively, and a neural network is constructed to learn these parameters. The loss function of the variational autoencoder comprises a reconstruction term and a KL divergence regularization term, and a weight parameter is placed before the KL divergence regularization term to reduce its weight. That is, the variational autoencoder in this embodiment is obtained by solving the following optimization problem:
$$\min_{\phi,\theta}\; -\mathbb{E}_{q(z|x,\phi)}\big[\log p(x|z,\theta)\big] + \alpha\, D_{KL}\big(q(z|x,\phi)\,\|\,p(z)\big)$$
where α is a preset weight parameter that is tuned during training, p(z) is the prior distribution of the latent variable z, and $D_{KL}(\cdot)$ denotes the Kullback-Leibler divergence.
The KL-regularized variational autoencoder of this embodiment fits the distribution of the quality data, and the degree of abnormality of each data item is measured by the distance between the autoencoder output and the original quality data. For a given quality data item, the greater this distance, the more the item differs overall from the other quality data, and it is removed as abnormal data.
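The weighted loss and the reconstruction-based abnormality check can be sketched without any training code. The sketch assumes a diagonal Gaussian encoder with a standard normal prior (so the KL term has a closed form) and a squared-error reconstruction term; neither choice is stated in the embodiment, and all names are hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def vae_loss(x, x_hat, mu, log_var, alpha=0.1):
    """Reconstruction term plus the KL regularization term weighted by alpha."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + alpha * kl_to_standard_normal(mu, log_var)

def is_abnormal(x, x_hat, diff_threshold):
    """Flag a record whose reconstruction lies too far from the input."""
    return math.dist(x, x_hat) > diff_threshold
```

Choosing alpha below 1 lets the reconstruction term dominate, which tends to sharpen the reconstruction of normal data and so widens the gap between normal and abnormal reconstruction errors.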
S13: according to the detection characteristic index to be analyzed and the data to be analyzed, calculating each PPM value in the production process, comparing the PPM value with the corresponding PPM threshold range, and carrying out data envelope analysis on the data to be analyzed which is not in the PPM threshold range.
It should be noted that, in the production process of the equipment product, a typical model and a main product are involved, and the present embodiment calculates PPM (Parts Per Million, an abbreviation of parts per million, representing the reject ratio in each million) values from three dimensions of the process, the product and the model. According to the inspection characteristic index to be analyzed and the data to be analyzed, calculating each PPM value in the production process, wherein the method comprises the following steps:
collecting the number of defects and the severity coefficient of each process to obtain the total number of defects of each process; according to the inspection characteristic indexes to be analyzed and the data to be analyzed, taking the quantity of data to be analyzed corresponding to the inspection characteristic indexes of each process as the total number of inspection characteristics of that process; and obtaining the PPM value of each process from the total number of defects of the process and the corresponding total number of inspection characteristics, expressed by a formula
in which $P_i$ denotes the number of defects of process $i$, $K_i$ the severity coefficient of process $i$, $G_i$ the number of inspection characteristic indexes to be analyzed in process $i$, and $n_r$ the number of data to be analyzed corresponding to the $r$-th inspection characteristic index in process $i$.
According to the total number of process defects and the total number of process inspection characteristics of the processes involved in each product during production, the total number of defects and the total number of inspection characteristics of each product are obtained by summation; the PPM value of each product is obtained from the total number of defects of the product and the corresponding total number of product inspection characteristics;
according to the total number of product defects and the total number of product inspection characteristics of the products belonging to each model, the total number of defects and the total number of inspection characteristics of each model are obtained, and the PPM value of each model is obtained from the total number of defects of the model and the corresponding total number of model inspection characteristics.
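The three-level aggregation can be sketched as below. The process-level formula is not reproduced in this text, so the sketch assumes that the total number of defects of a process is $P_i K_i$ and that its PPM value is $10^6 \cdot P_i K_i / \sum_{r=1}^{G_i} n_r$. The record layout and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ProcessRecord:
    process: str
    product: str
    model: str
    defects: int                  # P_i, number of defects of the process
    severity: float               # K_i, severity coefficient of the process
    counts_per_index: List[int]   # n_r for each of the G_i inspection indexes


def ppm(total_defects: float, total_characteristics: int) -> float:
    """Defects per million inspected characteristics."""
    return total_defects * 1_000_000 / total_characteristics


def aggregate_ppm(
    records: List[ProcessRecord], key: Callable[[ProcessRecord], str]
) -> Dict[str, float]:
    """Sum defect totals and characteristic totals per group, then compute PPM."""
    totals: Dict[str, Tuple[float, int]] = {}
    for rec in records:
        d, c = totals.get(key(rec), (0.0, 0))
        # Assumption: weighted defect total of a process is P_i * K_i.
        totals[key(rec)] = (
            d + rec.defects * rec.severity,
            c + sum(rec.counts_per_index),
        )
    return {name: ppm(d, c) for name, (d, c) in totals.items()}


records = [
    ProcessRecord("welding", "P1", "M1", 2, 1.0, [500, 500]),
    ProcessRecord("coating", "P1", "M1", 1, 0.5, [1000]),
    ProcessRecord("assembly", "P2", "M1", 0, 1.0, [2000]),
]
process_ppm = aggregate_ppm(records, lambda r: r.process)
product_ppm = aggregate_ppm(records, lambda r: r.product)
model_ppm = aggregate_ppm(records, lambda r: r.model)
print(process_ppm, product_ppm, model_ppm)
```

Summing the defect and characteristic totals before dividing, rather than averaging the process-level PPM values, follows the summation wording of the text.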
Preferably, when the amount of data to be analyzed, i.e. the quality data used to calculate PPM, is small and falls short of the million-level data required by the traditional PPM calculation method, it is necessary to evaluate the difference between the PPM value calculated by normalizing the quality data to the million level and the true PPM value that would be calculated from million-level quality data. If the difference is small, the PPM value calculated by normalization is highly reliable and can directly represent the PPM value under million-level quality data. Otherwise, the normalized PPM value may deviate considerably from the PPM value under million-level quality data, and further optimization is required.
Thus, before the calculated PPM values are compared with the PPM threshold ranges, the method further comprises: if the quantity of data to be analyzed is less than or equal to a quantity threshold, obtaining a fluctuation threshold by constructing a confidence level of the t distribution, evaluating whether the difference between each PPM value and the ideal PPM value is smaller than the fluctuation threshold, and if so, retaining the data to be analyzed for calculating the PPM value. That is, a t distribution of the data to be analyzed is constructed from its mean and variance, and the confidence level of the t distribution is used to estimate the difference from the true PPM value that would be calculated from million-level quality data.
Specifically, the data approximately follow a t distribution with $n-1$ degrees of freedom, and the ideal PPM calculation result based on million-level quality data is approximately $10^6\mu$, where $\mu$ is the mean of the probability distribution.
The fluctuation interval of the PPM value under million-level quality data is obtained from the t distribution by formula (4),
which is built from the data mean, the data standard deviation $S$, the number of data items $n$, the confidence level $a$, and a threshold determined from the confidence level.
Thus, when the difference between each PPM value and the ideal PPM value is smaller than the fluctuation threshold, the PPM values calculated by normalizing the data to be analyzed to the million level can effectively estimate the true PPM value under million-level quality data collection. Otherwise, more quality data need to be collected and the PPM values of the existing equipment data need to be further optimized.
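A sketch of such an interval is given below, assuming the usual two-sided form $10^6\,(\bar{X} \pm t\,S/\sqrt{n})$, where $\bar{X}$ is the data mean and $t$ the critical value of the t distribution with $n-1$ degrees of freedom. This form is an assumption, since the formula is not reproduced in this text. The critical value is passed in by the caller (for example 3.182 for 3 degrees of freedom at a two-sided 95% level) so that the sketch needs only the standard library.

```python
import math
import statistics
from typing import Sequence, Tuple


def ppm_fluctuation_interval(
    data: Sequence[float], t_crit: float
) -> Tuple[float, float, float]:
    """Return (lower, upper, half_width) of the PPM interval.

    data   : per-batch defect rates (fractions, not PPM); hypothetical input layout
    t_crit : critical value of the t distribution with len(data) - 1 degrees of freedom
    """
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    half_width = 1e6 * t_crit * s / math.sqrt(n)
    center = 1e6 * mean
    return center - half_width, center + half_width, half_width


rates = [0.001, 0.002, 0.001, 0.002]
lower, upper, half_width = ppm_fluctuation_interval(rates, t_crit=3.182)
print(lower, upper, half_width)
```

Here `half_width` plays the role of the fluctuation threshold: a wide interval signals that more quality data should be collected before the normalized PPM value is trusted.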
When it is determined that the calculated PPM values can be used for evaluation, each PPM value is compared with its corresponding PPM threshold range, and data envelope analysis is performed on the data to be analyzed that is not within the PPM threshold range, which comprises: taking the data to be analyzed that is not within the PPM threshold range as sample data; acquiring successful data, calculating a confidence interval of the successful data, and judging from the range of the confidence interval whether the sample data lies within it; obtaining an envelope upper limit and an envelope lower limit of the successful data according to a preset confidence level, the envelope limits indicating whether the sample data is enveloped; obtaining a qualified upper limit and a qualified lower limit according to a preset tolerance value, the qualified limits indicating whether the sample data is qualified; and generating an analysis result for the sample data based on whether it is enveloped, whether it is qualified, and whether it lies within the confidence interval.
Successful data refers to product data that has been verified, in tests or in historical use, as successful or as not having failed. Calculating the confidence interval of the successful data comprises: counting the successful data of each index according to the inspection characteristic indexes to be analyzed that correspond to the sample data; if the quantity of successful data is greater than a quantity threshold, constructing the confidence interval with a Gaussian mixture model (GMM) algorithm; otherwise, constructing the confidence interval with the t distribution.
Specifically, when the confidence interval is constructed with the Gaussian mixture model (GMM) algorithm, the parameters of the Gaussian density functions are estimated by the EM algorithm, and the posterior probability distribution of the estimated parameters is calculated according to the Bayesian formula, from which the confidence interval is obtained.
The EM algorithm estimates the parameters of the Gaussian mixture density function by a set of update formulas
in which $\mu_k$, $\Sigma_k$ and $\pi_k$ are, respectively, the mean and variance of the Gaussian density function corresponding to the quality data of the $k$-th inspection characteristic index and the proportion of the $k$-th inspection characteristic index, and $n$ is the number of samples.
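The update formulas are not reproduced in this text. The sketch below uses the textbook EM updates for a one-dimensional Gaussian mixture (responsibilities in the E-step, then weighted means, variances, and proportions in the M-step) as an assumed stand-in. The function names, the initial values, and the variance floor are illustrative choices, and the Bayesian step that turns the estimates into a confidence interval is not covered.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(
    data: Sequence[float],
    mus: Sequence[float],
    variances: Sequence[float],
    weights: Sequence[float],
    iterations: int = 50,
) -> Tuple[List[float], List[float], List[float]]:
    """Estimate (mu_k, Sigma_k, pi_k) of a 1-D Gaussian mixture with EM."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    n, k = len(data), len(mus)
    for _ in range(iterations):
        # E-step: responsibility of component j for each sample.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mean, variance, and proportion of each component.
        for j in range(k):
            nk = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nk
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nk
            variances[j] = max(var, 1e-9)  # floor keeps the density well defined
            weights[j] = nk / n
    return mus, variances, weights


data = [9.8, 10.0, 10.2, 19.9, 20.0, 20.1]
mus, variances, weights = em_gmm_1d(data, [9.0, 21.0], [1.0, 1.0], [0.5, 0.5])
print(mus, variances, weights)
```

With two well-separated clusters the estimated means settle near the cluster averages and the proportions near one half each.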
When the confidence interval is constructed with the t distribution, it is obtained by removing the factor $10^6$ from formula (4), which is used above to estimate the PPM value.
Further, this embodiment generates the envelope upper limit and the envelope lower limit at the 99.73% confidence level (corresponding to 3σ) defined for equipment production data. A qualified upper limit and a qualified lower limit are obtained according to a preset tolerance value, and the interval they form is the tolerance zone. When the design tolerance, used as the upper and lower limits of the product acceptance criterion, coincides with the envelope, the influence of a single quality data item on the task is fully understood and the decision risk is extremely small. In practice, however, the tolerance zone often does not coincide with the envelope, so this embodiment considers the envelope zone, the tolerance zone, and the confidence interval together when generating the analysis result for the sample data, which supports more accurate risk analysis and assessment.
The analysis results include: qualified and enveloped, qualified but not enveloped, unqualified but enveloped, and unqualified and not enveloped, each further marked as inside or outside the confidence interval.
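The combined judgment can be sketched as follows. This assumes the 3σ envelope is taken as mean ± 3 standard deviations of the successful data and the tolerance zone as a nominal value ± a tolerance. The nominal value, the example confidence interval, and all names are hypothetical.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Bounds:
    lower: float
    upper: float

    def contains(self, x: float) -> bool:
        return self.lower <= x <= self.upper


def envelope_bounds(success_data: Sequence[float], k_sigma: float = 3.0) -> Bounds:
    """Envelope limits as mean +/- k_sigma * sample standard deviation (assumed form)."""
    mean = statistics.fmean(success_data)
    std = statistics.stdev(success_data)
    return Bounds(mean - k_sigma * std, mean + k_sigma * std)


def tolerance_bounds(nominal: float, tolerance: float) -> Bounds:
    """Qualified limits from a nominal value and a preset tolerance value."""
    return Bounds(nominal - tolerance, nominal + tolerance)


def classify(x: float, envelope: Bounds, tolerance: Bounds, confidence: Bounds) -> str:
    """Combine the three judgments into one analysis result string."""
    qualified = "qualified" if tolerance.contains(x) else "unqualified"
    enveloped = "enveloped" if envelope.contains(x) else "not enveloped"
    position = "inside" if confidence.contains(x) else "outside"
    return f"{qualified}, {enveloped}, {position} confidence interval"


success = [9.9, 10.0, 10.1, 10.0, 10.0]
env = envelope_bounds(success)
tol = tolerance_bounds(nominal=10.0, tolerance=0.5)
ci = Bounds(9.95, 10.05)  # hypothetical confidence interval of the successful data
result_a = classify(10.3, env, tol, ci)
result_b = classify(10.0, env, tol, ci)
print(result_a)
print(result_b)
```

The first sample illustrates the case the text singles out: a value that passes the tolerance check yet falls outside the envelope of historically successful data.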
Compared with the prior art, the quality data analysis method and system provided by this embodiment work on the collected equipment quality data and use techniques such as data correlation analysis, data anomaly analysis, and small-sample data analysis to detect and remove redundant inspection quality data, to evaluate the confidence of the inspection data under small-sample conditions, and to analyze intelligently whether product data falls within the envelope range. Quality risks or weak links in the production process are thereby discovered in advance, making quality management and control more refined and more intelligent.
Embodiment 2
In another embodiment of the present invention, a quality data analysis system is disclosed, which implements the quality data analysis method of Embodiment 1. For the specific implementation of each module, refer to the corresponding description in Embodiment 1. The system comprises:
the test characteristic index acquisition module is used for acquiring quality data and test characteristic indexes in the production process, and removing redundant test characteristic indexes according to the quality data and the correlation coefficient to obtain test characteristic indexes to be analyzed;
the to-be-analyzed data acquisition module is used for removing abnormal data from the quality data of each inspection characteristic index to be analyzed according to statistical analysis and the variational autoencoder to obtain the data to be analyzed;
and the quality data analysis module is used for calculating each PPM value in the production process according to the to-be-analyzed detection characteristic index and the to-be-analyzed data, comparing the PPM value with the corresponding PPM threshold range respectively, and carrying out data envelope analysis on the to-be-analyzed data which is not in the PPM threshold range.
Since the relevant parts of the quality data analysis system and the quality data analysis method of these embodiments may be cross-referenced, the repeated description is omitted here. The principle of the system embodiment is the same as that of the method embodiment, so the system embodiment also has the corresponding technical effects of the method embodiment.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by way of a computer program to instruct associated hardware, where the program may be stored on a computer readable storage medium. Wherein the computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory, etc.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.
Claims (10)
1. A quality data analysis method, comprising the steps of:
acquiring quality data and check characteristic indexes in the production process, and removing redundant check characteristic indexes according to the quality data and the correlation coefficient to obtain the check characteristic indexes to be analyzed;
removing abnormal data from the quality data of each inspection characteristic index to be analyzed according to statistical analysis and a variational autoencoder to obtain data to be analyzed;
according to the detection characteristic index to be analyzed and the data to be analyzed, calculating each PPM value in the production process, comparing the PPM value with the corresponding PPM threshold range, and carrying out data envelope analysis on the data to be analyzed which is not in the PPM threshold range.
2. The quality data analysis method according to claim 1, further comprising, before comparing with the corresponding PPM threshold ranges, respectively: if the data quantity of the data to be analyzed is smaller than or equal to the quantity threshold value, acquiring a fluctuation threshold value by constructing confidence coefficient of t distribution, evaluating whether the difference between each PPM value and an ideal PPM value is smaller than the fluctuation threshold value, and if so, retaining the data to be analyzed for calculating the PPM value.
3. The quality data analysis method according to claim 1, wherein the removing redundant inspection characteristic indexes according to the correlation coefficient matrix to obtain the inspection characteristic indexes to be analyzed includes: dividing all N inspection characteristic indexes into a plurality of paired combinations by a traversal and recursion method, wherein the first group of each paired combination has i indexes and the second group has the remaining N-i indexes; and, from the paired combinations meeting the following conditions, taking any one of the second groups with the smallest number of indexes as the inspection characteristic indexes to be analyzed: as a basic condition, the correlation coefficient between each index in the first group and all indexes in the second group is larger than a correlation threshold, and the basic condition is no longer met after any index is taken out of the second group and added to the first group.
4. A quality data analysis method according to claim 3, wherein the correlation coefficient between each index in the first group and all indexes in the second group is obtained by obtaining a linear combination of quality data corresponding to the two groups of indexes, and maximizing pearson correlation coefficients of the two groups of linear combinations.
5. The quality data analysis method according to claim 1, wherein the removing abnormal data from the quality data of each inspection characteristic index to be analyzed according to statistical analysis and the variational autoencoder to obtain the data to be analyzed comprises:
based on statistical analysis, removing the quality data which is larger than an abnormal threshold value as abnormal data after the quality data of each to-be-analyzed inspection characteristic index is subjected to z-score standardization processing;
the standardized quality data of each inspection characteristic index to be analyzed is fed into a trained variational autoencoder, the obtained output is compared with the input, and the quality data whose difference value is larger than a difference threshold is removed as abnormal data;
the remaining quality data is used as the data to be analyzed.
6. The quality data analysis method according to claim 5, wherein the loss function of the variational autoencoder includes a reconstruction term and a KL divergence regularization term, and wherein a weight parameter is added before the KL divergence regularization term to reduce the weight of the KL divergence regularization term.
7. The quality data analysis method according to claim 1, wherein the calculating of each PPM value in the production process based on the inspection characteristic index to be analyzed and the data to be analyzed includes:
collecting the defect number and the severity coefficient of each procedure, and obtaining the total number of defects of each procedure; acquiring the quantity of the data to be analyzed corresponding to the inspection characteristic indexes of each process as the total number of the inspection characteristics of each process according to the inspection characteristic indexes to be analyzed and the data to be analyzed; obtaining PPM values of all the procedures according to the total number of the defects of all the procedures and the total number of corresponding procedure checking characteristics;
according to the total number of process defects and the total number of process inspection characteristics of the processes involved in each product during production, summarizing to obtain the total number of defects of each product and the total number of inspection characteristics of each product; obtaining the PPM value of each product according to the total number of defects of the product and the corresponding total number of product inspection characteristics;
according to the total number of product defects and the total number of product inspection characteristics of products to which each model number belongs, the total number of the defects of each model number and the total number of inspection characteristics of each model number are obtained, and according to the total number of the defects of each model number and the total number of the inspection characteristics of the corresponding model number, the PPM value of each model is obtained.
8. The quality data analysis method according to claim 1, wherein the performing data envelope analysis on the data to be analyzed which is not within the PPM threshold value range includes: taking the data to be analyzed which is not in the PPM threshold value range as sample data; acquiring successful data, calculating a confidence interval of the successful data, and indicating whether the sample data is in the confidence interval according to the range of the confidence interval; acquiring an envelope upper limit and an envelope lower limit of successful data according to preset confidence coefficient, wherein the envelope upper limit and the envelope lower limit are used for representing whether the sample data is enveloped or not; acquiring a qualified upper limit and a qualified lower limit according to a preset tolerance value, wherein the qualified upper limit and the qualified lower limit are used for indicating whether sample data are qualified or not; sample data analysis results are generated based on whether the envelope, whether the envelope is acceptable, and whether the confidence interval is present.
9. The method of claim 8, wherein the acquiring success data, calculating a confidence interval for the success data, comprises: respectively counting successful data of each index according to the to-be-analyzed test characteristic index corresponding to the sample data, and if the number of the successful data is greater than a number threshold, constructing a confidence interval through a Gaussian mixture density function (GMM) algorithm; otherwise, a confidence interval is constructed through t distribution.
10. A quality data analysis system, comprising:
the test characteristic index acquisition module is used for acquiring quality data and test characteristic indexes in the production process, and removing redundant test characteristic indexes according to the quality data and the correlation coefficient to obtain test characteristic indexes to be analyzed;
the to-be-analyzed data acquisition module is used for removing abnormal data from the quality data of each inspection characteristic index to be analyzed according to statistical analysis and the variational autoencoder to obtain the data to be analyzed;
and the quality data analysis module is used for calculating each PPM value in the production process according to the to-be-analyzed detection characteristic index and the to-be-analyzed data, comparing the PPM value with the corresponding PPM threshold range respectively, and carrying out data envelope analysis on the to-be-analyzed data which is not in the PPM threshold range.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310007166.5A CN116049157A (en) | 2023-01-04 | 2023-01-04 | Quality data analysis method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116049157A true CN116049157A (en) | 2023-05-02 |
Family
ID=86128997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310007166.5A Pending CN116049157A (en) | 2023-01-04 | 2023-01-04 | Quality data analysis method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116049157A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116777292A (en) * | 2023-06-30 | 2023-09-19 | 北京京航计算通讯研究所 | Defect rate index correction method based on multi-batch small sample space product |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105303311A (en) * | 2015-10-21 | 2016-02-03 | 中国人民解放军装甲兵工程学院 | Assessment index selection method and device based on data envelopment analysis |
US20180173733A1 (en) * | 2016-12-19 | 2018-06-21 | Capital One Services, Llc | Systems and methods for providing data quality management |
CN108304350A (en) * | 2017-12-25 | 2018-07-20 | 明阳智慧能源集团股份公司 | Wind turbine index prediction based on large data sets neighbour's strategy and fault early warning method |
US20180357205A1 (en) * | 2015-11-26 | 2018-12-13 | Human Metabolome Technologies Inc. | Data analysis apparatus, method, and program |
CN109101632A (en) * | 2018-08-15 | 2018-12-28 | 中国人民解放军海军航空大学 | Product quality abnormal data retrospective analysis method based on manufacture big data |
CN110807605A (en) * | 2019-11-14 | 2020-02-18 | 北京京航计算通讯研究所 | Key inspection characteristic defect rate statistical method |
CN112149860A (en) * | 2019-06-28 | 2020-12-29 | 中国电力科学研究院有限公司 | Automatic anomaly detection method and system |
CN112258689A (en) * | 2020-10-26 | 2021-01-22 | 上海船舶研究设计院(中国船舶工业集团公司第六0四研究院) | Ship data processing method and device and ship data quality management platform |
WO2021189904A1 (en) * | 2020-10-09 | 2021-09-30 | 平安科技(深圳)有限公司 | Data anomaly detection method and apparatus, and electronic device and storage medium |
CN113609698A (en) * | 2021-08-17 | 2021-11-05 | 北京无线电测量研究所 | Process reliability analysis method and system based on process fault database |
US20210365421A1 (en) * | 2020-05-20 | 2021-11-25 | Hon Hai Precision Industry Co., Ltd. | Data analysis method, computer device and storage medium |
CN114036724A (en) * | 2021-10-19 | 2022-02-11 | 北京轩宇信息技术有限公司 | Method and device for analyzing technical index success envelope of aerospace product |
US20220328332A1 (en) * | 2021-04-13 | 2022-10-13 | Accenture Global Solutions Limited | Anomaly detection method and system for manufacturing processes |
WO2022243764A1 (en) * | 2021-05-18 | 2022-11-24 | LEONARDO S.p.A | Method and system for detecting anomalies relating to components of a transmission system of an aircraft, in particular a helicopter |
Non-Patent Citations (1)
Title |
---|
J. WANG, T. ZHANG, C. WANG AND X. SHI,: "Optimizing the Uncertainty of PPM on Small Batch of Quality Data", 2021 IEEE 6TH INTERNATIONAL CONFERENCE ON SMART CLOUD, 31 December 2021 (2021-12-31), pages 107 - 110 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116777292A (en) * | 2023-06-30 | 2023-09-19 | 北京京航计算通讯研究所 | Defect rate index correction method based on multi-batch small sample space product |
CN116777292B (en) * | 2023-06-30 | 2024-04-16 | 北京京航计算通讯研究所 | Defect rate index correction method based on multi-batch small sample space product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||