CN113537280A - Intelligent manufacturing industry big data analysis method based on feature selection - Google Patents


Info

Publication number
CN113537280A
Authority
CN
China
Legal status
Pending
Application number
CN202110559197.2A
Other languages
Chinese (zh)
Inventor
吴志生
曾敬其
李倩倩
Current Assignee
Beijing University of Chinese Medicine
Original Assignee
Beijing University of Chinese Medicine
Application filed by Beijing University of Chinese Medicine
Priority to CN202110559197.2A
Publication of CN113537280A
Current legal status: Pending

Classifications

    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06Q50/04 Manufacturing
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention provides a feature-selection-based method for analyzing intelligent manufacturing industrial big data, and belongs to the field of intelligent manufacturing. The method comprises the following steps: obtaining a representative subset of the original data by a feature selection method, and determining the membership relation between the samples in the representative subset and the samples in the industrial big data; based on this membership relation, replacing the samples in the industrial big data with the corresponding samples in the representative subset to obtain reconstructed industrial big data; and analyzing the industrial big data, on the basis of feature extraction, through the reconstructed industrial big data. The invention uses a self-organizing neural network to perform feature selection on the industrial big data, and uses the representative subset obtained by feature selection to perform factor analysis, process monitoring and intelligent decision-making on intelligent manufacturing industrial big data.

Description

Intelligent manufacturing industry big data analysis method based on feature selection
Technical Field
The invention belongs to the field of intelligent manufacturing and relates to a method for analyzing intelligent manufacturing industrial big data, in particular to such a method based on feature selection.
Background
Data-driven operation is a defining feature of intelligent manufacturing, and industrial big data analysis is at its core. Industrial big data from manufacturing processes are typically multi-source and heterogeneous: the variables come from several manufacturing units, and their distributions differ greatly. Industrial big data analysis usually relies on feature extraction, which discovers data features through the correlations among variables, for example by extracting feature variables to replace the original variables; however, compressing the variables in this way changes the original feature space of the data. Moreover, noise in industrial big data, differences in distribution among variables, and outliers within variables make data feature extraction complex. A scientifically sound and effective method for analyzing industrial big data is therefore a key technical difficulty for data-driven intelligent manufacturing.
The invention introduces feature selection into industrial big data analysis and establishes a feature-selection-based method for analyzing intelligent manufacturing industrial big data. Unlike feature extraction, feature selection derives a representative subset from the original sample set, which reduces the interference of noise with the discovery of data features without changing the original feature space of the data. In addition, the invention uses a self-organizing neural network to perform feature selection on industrial big data, and uses the resulting representative subset to perform factor analysis, process monitoring and intelligent decision-making on intelligent manufacturing industrial big data.
Disclosure of Invention
An object of the invention is to provide a feature-selection-based method for analyzing industrial big data.
Another object of the invention is to provide applications of the method in intelligent manufacturing, specifically factor analysis, process monitoring and intelligent decision-making.
To achieve the above objects, in one aspect the invention provides a feature-selection-based method for analyzing industrial big data, the method comprising the following steps:
step 1: obtaining a representative subset of the original data by a feature selection method, and determining the membership relation between the samples in the representative subset and the samples in the industrial big data;
step 2: based on this membership relation, replacing the samples in the industrial big data with the corresponding samples in the representative subset to obtain reconstructed industrial big data;
step 3: analyzing the industrial big data, on the basis of feature extraction, through the reconstructed industrial big data.
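Step 2 can be illustrated with a minimal Python sketch. It assumes the membership relation from step 1 is already available as a list of integer indices, one per original sample, each pointing into the representative subset; the function name `reconstruct` and the toy values are hypothetical.

```python
from typing import List, Sequence


def reconstruct(membership: Sequence[int],
                representatives: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace every original sample by the representative sample it belongs to."""
    return [list(representatives[j]) for j in membership]


# Toy example: 5 original samples mapped onto 2 representative samples.
reps = [[0.1, 0.2], [0.8, 0.9]]
assignment = [0, 1, 1, 0, 1]
rebuilt = reconstruct(assignment, reps)
print(rebuilt)
```

Because every original row is kept and only its values are swapped for those of its representative sample, the reconstructed data retain the original sample count, so a representative sample that absorbs many original samples carries proportionally more weight in later statistics.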
According to some embodiments of the invention, the feature selection method comprises the following steps:
step one: normalizing the variables in the industrial big data;
step two: setting the number of output-layer neurons and the distance calculation method of the self-organizing feature mapping neural network;
step three: determining the membership relation between the samples in the industrial big data and the output-layer neurons through iterative training of the self-organizing feature mapping neural network;
step four: extracting the weight vectors of all output-layer neurons to form a representative subset, each neuron constituting one sample, thereby realizing feature selection of the industrial big data.
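The text does not specify which normalization step one uses. The sketch below assumes column-wise min-max scaling to [0, 1], which matches the [0, 1] range used to initialize the network weights; `minmax_normalize` and the toy values are hypothetical.

```python
from typing import List, Sequence


def minmax_normalize(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every variable (column) of `data` to the range [0, 1]."""
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column carries no spread, so it is mapped to 0.0.
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


# Toy data: three samples (rows) of two variables (columns).
toy = [[98.0, 2.1], [99.5, 3.0], [97.0, 2.4]]
print(minmax_normalize(toy))
```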
According to some embodiments of the invention, the self-organizing feature mapping neural network is trained iteratively as follows:
(1) Initialization: define an $a \times b$ topology for the output-layer neurons of the self-organizing neural network, so that the total number of output-layer neurons is $K = a \times b$. Assign each weight connecting the $n$ input-layer nodes to an output-layer neuron a random value in the interval $[0, 1]$, and normalize the weights to obtain the output-layer weight matrix $w_{jp}$, $j = 1, 2, \ldots, K$, $p = 1, 2, \ldots, n$.
Define the initial learning rate $\eta(0)$, generally 0.2 and no greater than 0.5.
Define the initial neighborhood size $N(0)$, typically 1/2 to 1/3 of the larger of the output-layer dimensions $a$ and $b$.
(2) Sample input: normalize the training-set samples to obtain $X_{qp}$, where $q$ indexes the training samples and $p = 1, 2, \ldots, n$ indexes the variables.
(3) Winning neuron: compute the distance between the input training sample $X_q$ and each neuron weight vector $w_j$; the neuron with the smallest distance is the winning neuron $j^*$.
The distance between $X_q$ and $w_j$ may be calculated as the Euclidean distance, the Mahalanobis distance, the link distance, or a similar measure.
(4) Winner update: adjust the weight vector of the winning neuron:
$$w_{j^*}(t+1) = w_{j^*}(t) + \eta(t)\,[X_q - w_{j^*}(t)]$$
where $w_{j^*}(t)$ is the weight vector of the winning neuron $j^*$ at iteration $t$ and $\eta(t)$ is the learning rate at iteration $t$. The learning rate decreases as the number of iterations increases; a common schedule is $\eta(t) = \eta(0)(1 - t/T)$, where $T$ is the preset total number of iterations.
(5) Neighborhood update: adjust the weight vectors $w_j$ of the other neurons within the square neighborhood of radius $N(t)$ centered on the winning neuron $j^*$:
$$N(t) = \mathrm{int}[N(0)(1 - t/T)]$$
$$w_j(t+1) = w_j(t) + \eta(t, d)\,[X_q - w_j(t)]$$
$$\eta(t, d) = \eta(t)\,e^{-d}$$
where $\mathrm{int}[\cdot]$ takes the integer part, so the neighborhood radius shrinks as the number of iterations increases, and $d$ is the distance between a neuron in the neighborhood and the winning neuron; the closer a neuron is to the winning neuron, the larger its weight adjustment.
(6) Termination: once all training-set samples have been input, set $t = t + 1$ and repeat the training from step (3) until $t = T$.
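Steps (1) to (6) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses a rectangular a × b grid (Example 1 uses a hexagonal one), the Euclidean sample-to-neuron distance, the Chebyshev grid distance both to define the square neighborhood and as the distance d in η(t, d), and it skips the extra weight normalization of step (1) because uniform [0, 1] weights already match min-max-normalized inputs. `train_som` and its parameters are hypothetical names.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def euclidean(x: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance between a sample and a neuron weight vector."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def train_som(data: Sequence[Sequence[float]], a: int, b: int,
              eta0: float = 0.2, n0: Optional[int] = None,
              T: int = 100, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Train an a x b SOM; return neuron weights and each sample's neuron index."""
    rng = random.Random(seed)
    n = len(data[0])
    K = a * b
    coords = [(j // b, j % b) for j in range(K)]  # grid position of neuron j
    # (1) Initialization: weights drawn uniformly from [0, 1].
    w = [[rng.random() for _ in range(n)] for _ in range(K)]
    if n0 is None:
        n0 = max(1, max(a, b) // 2)  # 1/2 of the larger grid dimension
    for t in range(T):
        eta_t = eta0 * (1 - t / T)        # eta(t) = eta(0)(1 - t/T)
        radius = int(n0 * (1 - t / T))    # N(t) = int[N(0)(1 - t/T)]
        for x in data:                    # (2) present every sample once per iteration
            # (3) Winning neuron: smallest distance to x.
            j_star = min(range(K), key=lambda j: euclidean(x, w[j]))
            r0, c0 = coords[j_star]
            for j, (r, c) in enumerate(coords):
                d = max(abs(r - r0), abs(c - c0))  # grid distance; square neighborhood
                if d <= radius:
                    # (4)/(5) eta(t, d) = eta(t) * exp(-d); d = 0 is the winner itself.
                    step = eta_t * math.exp(-d)
                    w[j] = [wi + step * (xi - wi) for wi, xi in zip(w[j], x)]
    # Final membership: each sample belongs to its nearest neuron.
    membership = [min(range(K), key=lambda j: euclidean(x, w[j])) for x in data]
    return w, membership


toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
weights, membership = train_som(toy, a=2, b=2, T=50, seed=1)
print(membership)
```

With a 6 × 6 grid (`a = b = 6`), the 36 returned weight vectors would play the role of the representative subset, and `membership` gives the sample-to-neuron relation used in step 2.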
According to some embodiments of the invention, the initial learning rate of the self-organizing feature mapping neural network is 0.2 to 0.5, and the initial neighborhood size is 1/2 to 1/3 of the number of neurons in the output layer.
According to some embodiments of the invention, the output layer of the self-organizing feature mapping neural network has more than 3 × 3 neurons.
According to some embodiments of the invention, the distance calculation method of the self-organizing feature mapping neural network includes the Euclidean distance and the link distance.
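The choice of distance can change which neuron wins. A minimal sketch, assuming the Euclidean distance and the Chebyshev distance (which Example 1 equates with the link distance); the helper names are hypothetical, and the Mahalanobis distance, which Example 2 advises against, is omitted.

```python
import math
from typing import Callable, Sequence

Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(x: Sequence[float], w: Sequence[float]) -> float:
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def chebyshev(x: Sequence[float], w: Sequence[float]) -> float:
    # Largest absolute difference over all variables.
    return max(abs(xi - wi) for xi, wi in zip(x, w))


def winner(x: Sequence[float], weights: Sequence[Sequence[float]], dist: Distance) -> int:
    """Index of the neuron whose weight vector is closest to x under `dist`."""
    return min(range(len(weights)), key=lambda j: dist(x, weights[j]))


x = [0.0, 0.0]
weights = [[0.6, 0.6], [0.0, 0.8]]
print(winner(x, weights, euclidean))  # 1
print(winner(x, weights, chebyshev))  # 0
```

Under the Euclidean distance, neuron 0 lies about 0.85 away and neuron 1 lies 0.8 away; under the Chebyshev distance, the values are 0.6 and 0.8, so the winner flips.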
According to some embodiments of the present invention, the consistency between the representative subset and the feature space of the original data is evaluated by the correlation between the loading matrices of the first and second principal components, which characterizes the consistency of the feature-extraction space.
According to some embodiments of the present invention, the correlation coefficient of the loading matrices is calculated as follows:
$$r = \frac{\sum_{i}\sum_{k}(A_{ik} - \bar{A})(B_{ik} - \bar{B})}{\sqrt{\left(\sum_{i}\sum_{k}(A_{ik} - \bar{A})^2\right)\left(\sum_{i}\sum_{k}(B_{ik} - \bar{B})^2\right)}}$$
where $\bar{A}$ is the mean of matrix $A$ and $\bar{B}$ is the mean of matrix $B$. The closer the correlation coefficient $r$ is to ±1, the stronger the correlation between the two matrices.
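The formula is the matrix form of Pearson's correlation coefficient: both matrices are flattened in the same order and correlated element by element. A minimal sketch, assuming the two loading matrices have identical shape; `matrix_corr` is a hypothetical name.

```python
import math
from typing import Sequence


def matrix_corr(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> float:
    """Correlation coefficient r between two equally shaped matrices."""
    a = [v for row in A for v in row]
    b = [v for row in B for v in row]
    if len(a) != len(b):
        raise ValueError("matrices must have the same shape")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    if den == 0:
        raise ValueError("a constant matrix has no defined correlation")
    return num / den


A = [[1.0, 2.0], [3.0, 4.0]]
print(matrix_corr(A, [[2.0, 4.0], [6.0, 8.0]]))      # 1.0
print(matrix_corr(A, [[-2.0, -4.0], [-6.0, -8.0]]))  # -1.0
```

Negating every entry of B negates r. Because principal components are defined only up to sign, a value near -1 can still indicate matching structure.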
In summary, the invention obtains a representative subset of the industrial big data by a feature selection method and analyzes the industrial big data through this representative subset.
In another aspect, the invention further provides applications of the method in intelligent manufacturing industrial big data analysis, specifically factor analysis, process monitoring and intelligent decision-making.
According to some embodiments of the present invention, intelligent manufacturing factor analysis is performed by analyzing the correlation coefficients between factors in the reconstructed industrial big data.
According to some embodiments of the present invention, intelligent manufacturing process monitoring is performed by taking the confidence interval of a variable in the reconstructed industrial big data as the standard control limit of that variable and monitoring the industrial big data process with the process capability index.
According to some embodiments of the invention, the process capability index ($C_P$) is calculated as follows:
when the technical standard specifies both control limits, $C_P = (T_u - T_l)/(6\sigma)$;
when the technical standard specifies only a lower control limit, $C_{PL} = (\mu - T_l)/(3\sigma)$;
when the technical standard specifies only an upper control limit, $C_{PU} = (T_u - \mu)/(3\sigma)$,
where $T_u$ is the upper control limit of the technical standard, $T_l$ is the lower control limit of the technical standard, and $\mu$ is the mean of the manufacturing-process samples. In addition, $\sigma$ is the overall standard deviation of the sample distribution, which, following American Society for Testing and Materials (ASTM) practice, can be estimated from the sample standard deviation $S$. For example, when the number of samples is 25, $\sigma = 1.0105\,S$.
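A minimal sketch of the three indices, assuming σ is estimated as S/c4(n), where c4 is the standard bias-correction constant for the sample standard deviation; for n = 25, 1/c4 ≈ 1.0105, matching the factor quoted above. The function names are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence


def c4(n: int) -> float:
    """Bias-correction constant for the sample standard deviation (n >= 2)."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def sigma_hat(samples: Sequence[float]) -> float:
    """Estimate the overall standard deviation as S / c4(n)."""
    return statistics.stdev(samples) / c4(len(samples))


def capability(samples: Sequence[float], lower: Optional[float] = None,
               upper: Optional[float] = None) -> float:
    mu = statistics.fmean(samples)
    sigma = sigma_hat(samples)
    if lower is not None and upper is not None:
        return (upper - lower) / (6 * sigma)   # C_P, both limits
    if lower is not None:
        return (mu - lower) / (3 * sigma)      # C_PL, lower limit only
    if upper is not None:
        return (upper - mu) / (3 * sigma)      # C_PU, upper limit only
    raise ValueError("at least one control limit is required")


print(round(1 / c4(25), 4))  # 1.0105
```

For a process centered between its limits, the two-sided $C_P$ equals both one-sided indices, which gives a quick sanity check.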
According to some embodiments of the invention, the process capability index is graded as follows:
$C_P < 0.67$: process capability is seriously insufficient;
$0.67 < C_P < 1.00$: process capability is insufficient;
$1.00 < C_P < 1.33$: process capability is acceptable;
$1.33 < C_P < 1.67$: process capability is sufficient;
$C_P > 1.67$: process capability is excessive.
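The grading can be expressed as a simple lookup. This is a minimal sketch. The grading uses strict inequalities, so assigning a value that falls exactly on a boundary to the upper band is an assumption. The band labels are paraphrased, and the 1.00 to 1.33 band is labeled "acceptable", as in common five-level gradings. `capability_grade` is hypothetical.

```python
def capability_grade(cp: float) -> str:
    """Map a process capability index to a five-level grade."""
    if cp < 0.67:
        return "seriously insufficient"
    if cp < 1.00:
        return "insufficient"
    if cp < 1.33:
        return "acceptable"
    if cp < 1.67:
        return "sufficient"
    return "excessive"


print(capability_grade(0.9))  # insufficient
```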
According to some embodiments of the invention, intelligent manufacturing decisions are made by independent-samples t-tests on variables in the reconstructed industrial big data.
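The text does not state whether equal variances are assumed. The sketch below assumes the classic Student's two-sample t-test with pooled variance and returns only the statistic and its degrees of freedom. A p-value requires the t distribution, for example via SciPy's `scipy.stats.ttest_ind(x, y, equal_var=True)` if SciPy is available. `independent_t` and the toy values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def independent_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) +
                  (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    t = (statistics.fmean(x) - statistics.fmean(y)) / se
    return t, nx + ny - 2


t, df = independent_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```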
In conclusion, the invention provides a feature-selection-based method for analyzing intelligent manufacturing industrial big data. The method has the following advantages:
the invention introduces feature selection into industrial big data analysis and establishes a feature-selection-based method for analyzing intelligent manufacturing industrial big data. Unlike feature extraction, feature selection derives a representative subset from the original sample set, which reduces the interference of noise with the discovery of data features without changing the original feature space of the data. The invention uses a self-organizing neural network to perform feature selection on industrial big data, and uses the resulting representative subset to perform factor analysis, process monitoring and intelligent decision-making on intelligent manufacturing industrial big data.
Drawings
Fig. 1 Industrial big data of the coated tablet manufacturing process: a. E-R model; b. yields versus production time.
Fig. 2 Correlation analysis of the variables in the coated tablet industrial manufacturing big data: a. correlation coefficients between variables; b. grey relational degrees between variables; c. distribution of yield data.
Fig. 3 SOM feature selection results for the coated tablet industrial manufacturing big data: a. assignment of samples to neurons; b. link distances between neurons; c. distribution of neuron weights.
Fig. 4 Consistency between the representative subset and the feature space of the raw data, and influencing factors: a. PCA result of the raw data; b. PCA result of the representative subset; c. influence of SOM neural network model parameters on the correlation coefficient of the loading matrices.
Fig. 5 Product yield factor analysis based on feature selection of the industrial big data: a. mean and standard deviation of product yield versus production time; b. correlation coefficients of product yield; c. linear regression of product yield.
Fig. 6 Dissolution process monitoring based on feature selection of the industrial big data: a. paired t-test results of dissolution for the raw data and the representative subset; b. process capability index of granule and tablet dissolution versus production time.
Fig. 7 Intelligent decision on raw material manufacturers based on feature selection of the industrial big data: a. distribution of raw material manufacturers over production time; b. differences in product yield and dissolution between manufacturers.
Detailed Description
The following detailed description illustrates the embodiments and their advantageous effects and is not intended to limit the scope of the present disclosure.
Example 1: Feature selection for coated tablet industrial big data
(1) Description of the coated tablet industrial big data
The data comprise 907 batches of industrial big data for a coated tablet, collected from November 2013 to June 2018. The coated tablet is made from L, S, H and A, and the variables of the industrial big data comprise 16 quality attributes of the coated tablet manufacturing process, covering four manufacturing units: granulation, tableting, coating and packaging. The E-R model of the coated tablet industrial big data is shown in Fig. 1A. Each batch has an output of 5 million; 2 batches are produced per day and 10 to 30 batches per month, and the production months vary from year to year. Notably, the change in yield over production time (Fig. 1B) shows that the finished-product yield of the coated tablet declined continuously: from January 2018 to June 2018, although the granule yield and the coating yield increased significantly, the decline in product yield was still not resolved. Therefore, the key cause of the decline in product yield needs to be identified through industrial big data analysis in order to improve the production efficiency of the coated tablet and reduce costs.
(2) Feature extraction of industrial big data
The industrial big data of the coated tablet manufacturing process are multi-source and heterogeneous: the variables come from several manufacturing units, and their distributions differ greatly. For example, the RSD of the finished-product yield ranges from 0.63% to 1.53%, the material balance from 0.60% to 0.78%, granule dissolution from 2.08% to 2.27%, and tablet dissolution from 5.67% to 6.22%. In addition, the granule moisture content and the coating weight gain vary widely, with RSDs of 10.17% and 15.01% respectively; the statistics of the variables in the industrial big data are shown in Table 1. Variable-compression methods such as principal component analysis (PCA), locality preserving projections (LPP) and Laplacian eigenmaps (LE) should therefore be used with caution for feature extraction from industrial big data, because the variables cannot simply be given equal weight.
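The RSD values above express the standard deviation as a percentage of the mean. A minimal sketch, assuming the sample (n - 1) standard deviation; `rsd_percent` and the toy values are hypothetical.

```python
import statistics
from typing import Sequence


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation: 100 * sample SD / mean."""
    return 100 * statistics.stdev(values) / statistics.fmean(values)


print(rsd_percent([9.0, 10.0, 11.0]))  # 10.0
```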
Unlike data collected in a laboratory, industrial big data are collected over a long period with many sources of interference, and data noise can mask the features of the variables and the degree of correlation among them. Taking the yield data as an example, the correlation with product yield (Fig. 2A) ranks tableting yield > granule yield > coating yield (0.72 > 0.08 > 0.02); the correlation coefficients between variables are low, so the factor analysis results are unreliable.
In terms of the relationships among samples, the samples of industrial big data are dynamically associated in time series, so the grey relational degree can be used instead of the correlation coefficient to evaluate the degree of association between variables (Fig. 2B). However, when the grey relational degree is used, the distribution characteristics and outliers of the variables must be taken into account. The granule yield and the tablet yield clearly do not follow a normal distribution and contain many outliers (Fig. 2C), which affects the calculated grey relational degree. In summary, the results reveal the multi-source heterogeneous nature of the industrial big data from the coated tablet manufacturing process, and show that data noise, differences in distribution among variables, and outliers within variables make data feature extraction complex.
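The text does not give its grey relational formula. The sketch below assumes Deng's classic grey relational grade. Each sequence is divided by its mean, and absolute differences to the reference sequence (for instance, product yield) are taken. Each point's coefficient is (Δmin + ρΔmax)/(Δ + ρΔmax), using the distinguishing coefficient ρ = 0.5, and the coefficients are then averaged. `grey_relational_grades` and the toy values are hypothetical.

```python
from typing import List, Sequence


def grey_relational_grades(reference: Sequence[float],
                           comparisons: Sequence[Sequence[float]],
                           rho: float = 0.5) -> List[float]:
    """Deng's grey relational grade of each comparison sequence to the reference."""
    def mean_normalize(seq: Sequence[float]) -> List[float]:
        m = sum(seq) / len(seq)
        return [v / m for v in seq]

    x0 = mean_normalize(reference)
    deltas = [[abs(a - b) for a, b in zip(x0, mean_normalize(s))] for s in comparisons]
    d_min = min(min(d) for d in deltas)   # global minimum difference
    d_max = max(max(d) for d in deltas)   # global maximum difference
    if d_max == 0:
        return [1.0] * len(comparisons)   # every sequence matches the reference
    grades = []
    for d in deltas:
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


ref = [1.0, 2.0, 3.0]
grades = grey_relational_grades(ref, [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]])
print(grades)
```

Because Δmax is taken over all sequences and points, a single extreme value enlarges it and shifts every coefficient. This shows how outliers can distort the grey relational degree.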
Table 1 Statistics of the variables in the industrial big data of the coated tablet manufacturing process
(3) Feature selection for industrial big data
The invention performs feature selection on the industrial big data with the SOM algorithm. The output layer is a 6 × 6 hexagonal topology, the initial learning rate η(0) is 0.02, the initial neighborhood size N(0) is 1/2 of the larger output-layer dimension, the number of iterations T is 1000, and the distance between a sample and a neuron is computed as the link distance, i.e., the Chebyshev distance. Under the minimum-distance principle, the 907 samples are assigned to 36 neurons (Fig. 3A); each neuron contains 16 weight variables, corresponding to the 16 variables of the samples. The sample set formed by the 36 neurons is a representative subset of the industrial big data and contains the characteristic information of the original data. The differences between adjacent neurons can be characterized by the link distance (Fig. 3B); because of the neighborhood weight update strategy, adjacent neurons have similar weights (Fig. 3C), which allows abnormal samples to be identified quickly. For example, the product yield of the sample in the upper-right position is much lower than that of the other samples, and the tableting yield shows a similar pattern.
Example 2: effect of model parameters on feature selection results
The feature-selection-based industrial big data analysis method provided by the invention requires that the representative subset be consistent with the feature space of the original data, both in the types of variables in the feature space and in the relationships among the variables. Here, PCA is applied separately to the raw data and to the representative subset, and the relationships among the variables are characterized by the loading matrices of the first and second principal components (Fig. 4A and 4B); the similarity of the two loading matrices is 0.9955. The results show that the representative subset is consistent with the feature space of the original data.
It should be noted that the representative subset is extracted by the SOM neural network model, so improper model parameters may lead to insufficient extraction of the features of the raw data. The number of output-layer neurons determines the number of samples in the representative subset, and the distance calculation method determines how the samples in the representative subset are selected. The study found that with a 3 × 3 output layer the correlation coefficient of the loading matrices is markedly lower, and that with the Mahalanobis distance it is markedly lower than with the other two distance calculation methods (Fig. 4C). Therefore, when the SOM neural network model is used for feature selection, the output layer should have more than 3 × 3 neurons and the Mahalanobis distance should be avoided.
Example 3: application of feature selection in factor analysis
Feature selection on the industrial big data yields a representative subset of 36 samples, which reduces the complexity and noise of the data and greatly improves the reliability of factor analysis. Taking product yield as an example, the mean and standard deviation of product yield over each 30 consecutive batches, plotted against production time (Fig. 5A), show that the product yield of the coated tablet declined continuously, with a cliff-like drop between January 2018 and June 2018. First, correlation coefficient analysis of the variables (Fig. 5B) shows that the correlation with product yield ranks tableting yield > granule yield > coating yield (0.88 > 0.56 > 0.27), so the tableting yield is far more strongly correlated with product yield than the yields of the other manufacturing units. In addition, product yield and tableting yield are positively correlated (Fig. 5C), and the regression fit is good (P < 0.001). In summary, feature selection improves the reliability of factor correlation and regression analysis, and the results reveal that the decline in tableting yield is the key problem behind the decline in product yield in the coated tablet manufacturing process.
Example 4: application of feature selection in process monitoring
To further explore the advantages of feature selection in industrial big data analysis, an application of feature selection to process monitoring was developed. The process capability index ($C_P$) evaluates quality stability through the ratio of the technical-standard control limits to the total standard deviation σ of the samples, and is widely used to monitor industrial manufacturing processes. However, for factors that lack technical-standard control limits in the manufacturing process, a method for monitoring quality stability still has to be established. Here, we propose using the confidence interval of the factor in the representative subset as the standard control limit of that factor, so that quality stability can be monitored on the basis of $C_P$. It should be noted that although the samples in the representative subset retain the characteristics of the original data, the standard control limit is calculated by weighting, taking into account the frequency with which each representative sample occurs in the original data.
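The weighting can be sketched as follows, assuming each representative sample is weighted by the number of original samples assigned to it, which is equivalent to computing statistics on the reconstructed data. The text does not reproduce the exact confidence-interval formula, so the interval form mean ± z·SD with z = 1.96 is a hypothetical placeholder; the function names are also hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_sd(values: Sequence[float], counts: Sequence[int]) -> Tuple[float, float]:
    """Frequency-weighted mean and sample SD (same as expanding each value by its count)."""
    total = sum(counts)
    mean = sum(c * v for v, c in zip(values, counts)) / total
    var = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / (total - 1)
    return mean, math.sqrt(var)


def control_limits(values: Sequence[float], counts: Sequence[int],
                   z: float = 1.96) -> Tuple[float, float]:
    """Hypothetical limits: weighted mean +/- z * weighted SD."""
    mean, sd = weighted_mean_sd(values, counts)
    return mean - z * sd, mean + z * sd


# Toy subset: 3 representative values absorbing 2, 1 and 3 original samples.
lo, hi = control_limits([1.0, 2.0, 4.0], [2, 1, 3])
print(lo, hi)
```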
Taking the dissolution of granules and plain tablets as an example, the dissolution values become smaller after feature selection, but the difference is not significant (paired t-test, P > 0.1), indicating that feature selection does not change the distribution characteristics of the dissolution data; the paired t-test results for dissolution are shown in Fig. 6A. Because higher dissolution is better for both granules and tablets, only the lower dissolution limit is controlled, and the manufacturing-process $C_P$ is calculated over cycles of 25 batches (Fig. 6B). The results show that the process capability of granule dissolution is insufficient ($C_P$ < 1.00) and its quality control method needs further improvement; the $C_P$ of dissolution for granule H shows an overall upward trend and can serve as an entry point for improving the quality control of granule dissolution. The quality control method for tablet dissolution also needs improvement; the dissolution of tablet A had sufficient process capability from May 2017 to December 2017 ($C_P$ > 1.00), which can serve as an entry point for improving the quality control of plain-tablet dissolution.
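The per-cycle calculation can be sketched as below. It makes four assumptions: the windows are consecutive and non-overlapping, each holding 25 batches; σ is estimated as S/c4(25); only the lower-limit index $C_{PL}$ is computed; and an incomplete final window is dropped. `cpl_per_window` is a hypothetical name.

```python
import math
import statistics
from typing import List, Sequence


def c4(n: int) -> float:
    """Bias-correction constant for the sample standard deviation."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def cpl_per_window(values: Sequence[float], lower: float, window: int = 25) -> List[float]:
    """Lower-limit capability index C_PL for each full, non-overlapping window."""
    results = []
    for start in range(0, len(values) - window + 1, window):
        block = values[start:start + window]
        mu = statistics.fmean(block)
        sigma = statistics.stdev(block) / c4(window)
        results.append((mu - lower) / (3 * sigma))
    return results


# Toy series: two identical 25-batch cycles plus 3 leftover batches (dropped).
cycle = [float(i % 5) for i in range(25)]
cycles = cpl_per_window(cycle * 2 + [1.0] * 3, lower=-1.0)
print(cycles)
```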
Example 5: application of feature selection in intelligent decision making
The core of industrial big data analysis is data-driven intelligent decision-making in the manufacturing process. Feature selection reduces data complexity without changing the feature space of the data, thereby improving the reliability of intelligent decisions in the manufacturing process. Take the decision on coated tablet raw material manufacturers as an example. Raw material L has four manufacturers, of which L1 and L2 are the two main ones, and raw materials S and H each have two manufacturers; the distribution of raw material manufacturers over production time is shown in Fig. 7A.
The differences in product yield and dissolution between manufacturers were analyzed by independent-samples t-tests (Fig. 7B). The results show that the product yield with L2 is significantly higher than with L1 (P < 0.01), and the product yield with L3 is also significantly higher than with L1 (P < 0.01), although L3 is used less often, so raw materials from L2 and L3 favor a higher product yield. However, the granule dissolution with L3 is significantly lower than with L1 (P < 0.01), so L2 is recommended as the manufacturer of raw material L. In addition, the product yield with H2 is significantly higher than with H1 (P < 0.01) and the granule dissolution with H2 is higher (P < 0.01); although plain-tablet dissolution does not differ significantly, H2 is recommended as the manufacturer of raw material H to improve product yield.

Claims (10)

1. An industrial big data analysis method based on feature selection, characterized in that a representative subset of the industrial big data is obtained by a feature selection method and the industrial big data analysis is performed through the representative subset, the method comprising the following steps:
step 1: obtaining a representative subset of the original data by a feature selection method, and determining the membership relation between the samples in the representative subset and the samples in the industrial big data;
step 2: based on the membership between the samples in the representative subset and the samples in the industrial big data, replacing the samples in the industrial big data with the samples in the representative subset to obtain reconstructed industrial big data;
step 3: realizing the analysis of the industrial big data, based on the extracted features, through the reconstructed industrial big data.
2. The method of claim 1, wherein the feature selection method in step 1 comprises the following steps:
step (i): normalizing the variables in the industrial big data;
step (ii): setting the number of output-layer neurons and the distance calculation method of a self-organizing feature mapping neural network;
step (iii): determining the membership relation between the samples in the industrial big data and the output-layer neurons through iterative training of the self-organizing feature mapping neural network;
step (iv): extracting the weight vectors of all output-layer neurons to form the representative subset, each neuron corresponding to one sample, thereby realizing feature selection of the industrial big data.
3. The method of claim 2, wherein in step (ii) the self-organizing feature mapping neural network has an initial learning rate of 0.2 to 0.5 and an initial neighborhood size of 1/2 to 1/3 of the number of output-layer neurons.
4. The method of claim 2, wherein in step (ii) the number of output-layer neurons of the self-organizing feature mapping neural network is greater than 3 × 3.
5. The method of claim 2, wherein in step (ii) the distance calculation method of the self-organizing feature mapping neural network comprises Euclidean distance and connection distance.
6. Use of the method of claim 1 or 2 in intelligent manufacturing industrial big data analysis, specifically including factor analysis, process monitoring, and intelligent decision-making.
7. The use of claim 6, wherein the use in factor analysis is realized by analyzing the correlation coefficients between factors in the reconstructed industrial big data.
8. The use of claim 6, wherein, in process monitoring, the control standard of a variable is set through the confidence interval of the variable in the reconstructed industrial big data, and process monitoring of the industrial big data is realized using the process capability index.
9. The use of claim 6, wherein the use in intelligent decision-making is realized by independent-samples t-test analysis of variables in the reconstructed industrial big data.
10. The use of claim 6, wherein the industrial big data comprise pharmaceutical manufacturing process big data containing more than ten thousand data points.
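The procedure in claims 1 to 5 can be sketched in Python with NumPy. This is a minimal, hypothetical implementation, not the applicant's code: the min-max normalization, the 4 × 4 grid, the Gaussian neighbourhood, the linear decay schedules, the random initialization, and all function names are assumptions. Only the parameter ranges (initial learning rate 0.2 to 0.5, initial neighbourhood 1/2 to 1/3 of the neuron count, more than 3 × 3 neurons, Euclidean distance) come from the claims; interpreting the neighbourhood size as a Gaussian radius in grid units is also an assumption, and the connection-distance option of claim 5 is not reproduced.

```python
import numpy as np


def min_max_normalize(X):
    """Step (i): scale each variable to [0, 1] (min-max is an assumption)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (X - lo) / span, lo, span


def train_som(Xn, rows=4, cols=4, epochs=20, lr0=0.3, radius_frac=1 / 3, seed=0):
    """Steps (ii)-(iii): online SOM training on a rows x cols grid.

    lr0 and radius_frac sit inside the claimed ranges; the Gaussian
    neighbourhood and linear decay schedules are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_units = rows * cols
    # Grid position of every output neuron, used for neighbourhood distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the weight vectors with randomly chosen samples.
    W = Xn[rng.integers(0, len(Xn), size=n_units)].copy()
    sigma0 = radius_frac * n_units
    total, step = epochs * len(Xn), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xn)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            x = Xn[i]
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # Euclidean best-matching unit
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)    # squared grid distance to the BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (x - W)
            step += 1
    return W


def membership(Xn, W):
    """Index of the nearest (Euclidean) output neuron for every sample."""
    d = np.linalg.norm(Xn[:, None, :] - W[None, :, :], axis=2)
    return np.argmin(d, axis=1)


def representative_reconstruction(X, rows=4, cols=4, **som_kwargs):
    """Claims 1-2: representative subset, membership, and reconstructed data."""
    Xn, lo, span = min_max_normalize(np.asarray(X, dtype=float))
    W = train_som(Xn, rows, cols, **som_kwargs)
    subset = W * span + lo          # step (iv): weight vectors mapped back to original units
    member = membership(Xn, W)      # claim 1, step 1: sample -> neuron membership
    reconstructed = subset[member]  # claim 1, step 2: replace each sample by its neuron
    return subset, member, reconstructed


# Usage with hypothetical random data: 200 samples, 3 variables, 4 x 4 grid.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
subset, member, X_rec = representative_reconstruction(X)
print(subset.shape, X_rec.shape)  # (16, 3) (200, 3)
```

In this sketch the downstream analyses named in claims 7 to 9 (correlation coefficients, process capability indices, t-tests) would run on `X_rec` in place of `X`. Every neuron contributes one row to `subset`, including neurons that win no samples, which follows the wording of step (iv).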
CN202110559197.2A 2021-05-21 2021-05-21 Intelligent manufacturing industry big data analysis method based on feature selection Pending CN113537280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110559197.2A CN113537280A (en) 2021-05-21 2021-05-21 Intelligent manufacturing industry big data analysis method based on feature selection

Publications (1)

Publication Number Publication Date
CN113537280A true CN113537280A (en) 2021-10-22

Family

ID=78124352

Country Status (1)

Country Link
CN (1) CN113537280A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination