CN107729943B - Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof - Google Patents
- Publication number
- CN107729943B (application CN201710992778.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- value
- felm
- missing
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/2321: Non-hierarchical clustering techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
Abstract
The invention relates to a missing-data fuzzy clustering algorithm based on estimates optimized by an information feedback extreme learning machine (FELM), and to its application. The basic steps are: 1) use mutual information to select the data attributes with higher correlation, and select the complete samples of the incomplete data set, restricted to those attributes, as training samples for the FELM network; 2) initialize the input weights ω and bias values b of the FELM network; 3) pre-fill each missing attribute according to a nearest-neighbor rule, and adjust the pre-filled value according to the error obtained by training the FELM network on the training samples until a reasonable value is found, yielding a recovered complete data set; 4) initialize the parameters of the fuzzy C-means algorithm: the number of clusters c, the fuzzy coefficient m, the threshold ε and the membership partition matrix $U^{(0)}$; 5) obtain the final clustering result by iteratively optimizing the membership partition matrix U and the cluster centers V of the fuzzy C-means algorithm. By fully exploiting the correlation between data samples and attributes, together with the distribution information of both complete and incomplete samples, the method obtains more reasonable attribute estimates, so the clustering result for the incomplete data set is more accurate.
Description
Technical Field
The invention relates to a missing-data fuzzy clustering algorithm based on estimates optimized by an information feedback extreme learning machine, and to its application; it belongs to the field of industrial informatization technology.
Background
Steel is an indispensable material for construction and for the realization of modernization in China, and the steel industry is a foundation of national development. Since its establishment it has maintained steady, rapid development for more than sixty years, and an industrial strip steel technology system has been built up. China is currently at an important stage of industrial development and its demand for steel remains huge, so the steel industry faces a very large market. How to innovate on and retrofit existing strip steel production lines so that they produce high-quality, high-benefit, high-grade steel with low carbon emissions is therefore of practical significance. At the present stage, informatization is a strategic measure for modernization as a whole: the steel industry needs to combine information technology with further innovation and retrofitting, integrate advanced information technology into the steel rolling process, and achieve coordinated development of industry and informatization. It is therefore very important to perform cluster analysis on strip steel data and to use the analysis results to drive innovation in industrial production.
In recent years, cluster analysis has been adapted to many different types of data and has been widely applied and developed in many research fields. It is valuable to use mathematical methods to determine the relationships between strip steel data samples from the properties of the data and a similarity or dissimilarity measure, to cluster the samples accordingly, and to use the results to adjust the production line. In actual production, however, many factors, such as failure of the data acquisition equipment, failure of the storage medium, failure of the transmission medium, human omission, or limitations of the detection instruments, leave the collected data set incomplete, and traditional clustering methods cannot be applied directly to an incomplete data set. It is therefore very important to choose a suitable way of handling incomplete data, both for analyzing the final results and for future industrial planning.
Disclosure of Invention
To solve these problems, the invention provides a missing-data fuzzy clustering algorithm based on estimates optimized by an information feedback extreme learning machine (FELM), applies it to the analysis of strip steel data, and uses the analysis results to support improvements in industrial production.
The invention is realized by the following technical scheme. The missing-data fuzzy clustering algorithm with FELM-optimized estimates is characterized by comprising the following steps:
1) Use mutual information, formula (1), to select the data attributes with higher correlation, and select the complete samples of the incomplete data set, restricted to those attributes, as training samples for the FELM network;
$$I(X;Y) = \iint \mu_{XY}(x,y)\,\log\frac{\mu_{XY}(x,y)}{\mu_X(x)\,\mu_Y(y)}\,dx\,dy \qquad (1)$$
where $\mu_X(x)$ is the marginal probability density function of the variable X, $\mu_Y(y)$ is the marginal probability density function of the variable Y, and $\mu_{XY}(x,y)$ is the joint probability density function of the two variables;
2) Determine the parameters of the FELM network: initialize the input weights ω and bias values b with random numbers drawn from the interval [-1, 1], and determine the number of hidden-layer nodes of the extreme learning machine;
3) Pre-fill each missing attribute according to a nearest-neighbor rule, then adjust the pre-filled value by the error retrieval method, using the error obtained by training the FELM network on the training samples, until a reasonable value is found; this yields a recovered complete data set;
4) Initialize the parameters of the fuzzy C-means algorithm: the number of clusters c, the fuzzy coefficient m, the threshold ε and the membership partition matrix $U^{(0)}$;
5) Cluster the recovered complete data set with fuzzy C-means. At iteration t = l, compute the cluster center matrix $V^{(l)}$ from the membership partition matrix $U^{(l-1)}$ according to formula (2), then update $U^{(l)}$ from $V^{(l)}$ according to formula (3):
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m}\, x_k}{\sum_{k=1}^{n} u_{ik}^{m}} \qquad (2)$$
$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert_2}{\lVert x_k - v_j \rVert_2} \right)^{\frac{2}{m-1}} \right]^{-1} \qquad (3)$$
For the given threshold ε, if $\lVert U^{(l)} - U^{(l-1)} \rVert < \varepsilon$ the algorithm terminates; otherwise set l = l + 1 and continue to update the membership partition matrix and the cluster centers iteratively.
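Step 1) ranks attributes by their mutual information. The following is a minimal sketch of that idea, assuming a simple histogram estimate of the densities in formula (1); the source does not say how the densities are estimated, and the names `mutual_information` and `bins` as well as the synthetic data are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats; the binning is an assumption."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                 # joint probability per cell
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of Y, shape (1, bins)
    mask = p_xy > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

# Synthetic illustration: one attribute tracks the target, the other does not.
rng = np.random.default_rng(0)
target = rng.normal(size=500)
related = target + 0.1 * rng.normal(size=500)
unrelated = rng.normal(size=500)
mi_related = mutual_information(related, target)
mi_unrelated = mutual_information(unrelated, target)
```

Attributes with a larger estimate would be kept as inputs when predicting the attribute that is missing; the number of attributes to keep is a tuning choice that the source leaves open.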
In step 3), the recovered complete data set is obtained as follows:
1) Pre-fill the missing attribute according to a nearest-neighbor rule: select the k samples closest to the incomplete sample, compute the mean of their values at the position of the missing attribute, and use this mean as the pre-filled value of the incomplete sample.
The distance between two samples $x_a$ and $x_b$ is computed over the attributes observed in both, formula (4), where $x_{ia}$ and $x_{ib}$ are the i-th attributes of $x_a$ and $x_b$ respectively, and $I_i$ satisfies the condition shown in formula (5):
$$I_i = \begin{cases} 1, & \text{if both } x_{ia} \text{ and } x_{ib} \text{ are observed} \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
2) Calculate the output matrix H of the hidden layer of the FELM network using formulas (6)-(8);
$$\sum_{i=1}^{L} \beta_i\, g(\omega_i \cdot x_j + b_i) = t_j, \quad j = 1, \dots, N \qquad (6)$$
where $g(\omega_i \cdot x_j + b_i)$ is the output of the i-th hidden node; $\omega_i \cdot x_j$ is the inner product of $\omega_i$ and $x_j$; $\omega_i$ is the input weight linking the input layer and the i-th hidden node; $\beta_i$ is the output weight linking the i-th hidden node and the output layer; $b_i$ is the bias of the i-th hidden node; L is the number of hidden nodes and N the number of training samples.
$$H\beta = T \qquad (7)$$
where H is the hidden-layer output matrix, whose (j, i) entry is $g(\omega_i \cdot x_j + b_i)$, β is the output weight, and T is the desired output.
3) Calculate the output weight of the FELM network from the obtained output matrix H and the expected output according to formula (9);
$$\hat{\beta} = H^{\dagger} T \qquad (9)$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of H, and $\hat{\beta}$ is the minimum-norm least-squares solution, which is unique.
4) Obtain the error between the predicted output and the real output and feed it back. Let the value predicted by the extreme learning machine be Y and the actual value be y; the error $e_0$ is
$$e_0 = Y - y \qquad (10)$$
5) Compare the obtained error with the error obtained on the training samples. If the iteration stop requirement is met, fill the missing attribute; otherwise take the error, readjust the pre-filled value, and return to step 1).
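The nearest-neighbor pre-filling in sub-step 1) can be sketched as follows. Formula (4) is not legible in the source, so this sketch assumes a commonly used partial distance, $D_{ab} = \sqrt{\tfrac{s}{\sum_i I_i} \sum_i (x_{ia} - x_{ib})^2 I_i}$ with s the number of attributes; the function name `knn_prefill` and the toy data are hypothetical.

```python
import numpy as np

def knn_prefill(data, k=3):
    """Fill each NaN with the mean of that attribute over the k nearest rows."""
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    present = ~np.isnan(data)
    s = data.shape[1]
    for a in np.flatnonzero(~present.all(axis=1)):       # rows with missing values
        for p in np.flatnonzero(~present[a]):            # each missing attribute
            # Candidate neighbours must have attribute p observed.
            cands = [b for b in range(len(data)) if b != a and present[b, p]]
            if not cands:
                continue
            dists = []
            for b in cands:
                shared = present[a] & present[b]         # I_i = 1 where both observed
                if not shared.any():
                    dists.append(np.inf)
                    continue
                diff = data[a, shared] - data[b, shared]
                # Assumed partial distance, rescaled by s / (number of shared attributes).
                dists.append(np.sqrt(s / shared.sum() * np.sum(diff ** 2)))
            nearest = np.array(cands)[np.argsort(dists)[:k]]
            filled[a, p] = data[nearest, p].mean()
    return filled

samples = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [9.0, 9.0, 9.0],
    [1.0, np.nan, 3.0],   # second attribute is missing
])
filled = knn_prefill(samples, k=3)
```

In this toy set the three rows near (1, 2, 3) are the nearest neighbors of the incomplete row, so the far-away row does not influence the pre-filled value.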
The error retrieval method proceeds as follows:
Assume that the initial estimate of the missing attribute obtained with the k-nearest-neighbor rule is $E_k$, and that the FELM network gives a mean error over the training samples. If the FELM prediction for the sample containing the missing attribute is Y and its real value is y, the error is $e_0 = Y - y$. A value e is then calculated […], and the fill value of the missing attribute is adjusted:
1) if e < 0, readjust the fill value of the missing attribute to $E_{new} = E_k + \rho e$, i.e. increase the value with a certain probability, and use it as input for FELM learning again, where $\rho \in [0, 1]$ is chosen by a random function;
2) if […], readjust the fill value of the missing attribute to $E_{new} = E_k - \rho e$ and use it as input for FELM learning again;
3) if […], the value predicted by the FELM network is close to the true value and is acceptable, so this value is used to fill the missing attribute of the incomplete data set.
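The feedback loop can be illustrated with a short sketch. The comparison conditions are not legible in the source, so the rule below is an assumption: the fill value is accepted once $|e_0|$ is no larger than the mean training error, and otherwise it is moved by a random fraction ρ of the error in the direction that reduces it (which presumes the prediction rises with the fill value). The names `error_retrieval`, `predict` and `mean_err` are hypothetical, and the linear `predict` stands in for a trained FELM network.

```python
import numpy as np

def error_retrieval(predict, y_true, fill, mean_err, rng, max_iter=200):
    """Adjust a pre-filled value until the prediction error is acceptable."""
    for _ in range(max_iter):
        e0 = predict(fill) - y_true      # e0 = Y - y, formula (10)
        if abs(e0) <= mean_err:          # assumed acceptance test
            break
        rho = rng.uniform(0.0, 1.0)      # rho drawn at random from [0, 1)
        # Assumed direction: raise the fill value if the prediction is too low,
        # lower it if the prediction is too high.
        fill = fill - rho * e0
    return fill

rng = np.random.default_rng(0)
predict = lambda v: 2.0 * v              # toy stand-in for the trained network
adjusted = error_retrieval(predict, y_true=1.0, fill=0.9, mean_err=0.05, rng=rng)
```

With this toy predictor the adjusted value settles near 0.5, where the prediction matches the real value to within the tolerance.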
The application of the missing-data fuzzy clustering algorithm with FELM-optimized estimates to strip steel data clustering statistics comprises the following process:
1) Collect experimental data: take the data collected from the strip steel over a certain period of time as the data samples;
2) Extract the following attributes from the collected data samples: the rolling force of the rolling stand, the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the rolling current, the rolling speed and the SONY value;
3) Use the attribute values obtained in step 2) as the training data set;
4) Normalize the data set. Because the attributes differ in order of magnitude, all values in the data set are mapped to the interval [0, 1] to eliminate these differences;
5) Select and optimize the training samples. Use mutual information to select the data attributes with higher correlation, and select the complete samples of the incomplete data set, restricted to those attributes, as training samples for the FELM network;
6) Determine the parameters of the FELM network. Initialize the input weights ω and bias values b with random numbers drawn from the interval [-1, 1], and determine the number of hidden-layer nodes of the extreme learning machine;
7) Estimate the missing attributes. Pre-fill each missing attribute according to a nearest-neighbor rule, and adjust the pre-filled value by the error retrieval method, using the error obtained on the training samples, until a reasonable value is found;
8) Perform cluster analysis on the recovered complete data set using the FCM algorithm.
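Step 4) of this process maps every attribute to [0, 1]. A minimal sketch, assuming column-wise min-max scaling that ignores missing entries (represented here as NaN); the source does not specify the exact formula, and the numbers are made up for illustration.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each attribute (column) to [0, 1]; NaN entries stay NaN."""
    data = np.asarray(data, dtype=float)
    lo = np.nanmin(data, axis=0)
    hi = np.nanmax(data, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero for constant columns
    return (data - lo) / span

# Two hypothetical attributes with very different magnitudes.
raw = np.array([
    [1200.0, 850.0],
    [1500.0, np.nan],
    [1800.0, 900.0],
])
normalized = min_max_normalize(raw)
```

Keeping the column minima and ranges would allow filled values to be mapped back to their original units afterwards.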
The beneficial effects of the invention are as follows. Traditional solutions consider either only the associations between data samples or only the associations between attributes. This method combines both: it uses the FELM network to obtain an optimized estimate of each missing value, and then performs fuzzy cluster analysis on the completed data set. Mutual information is used to compute the correlation between sample attributes, which provides a theoretical basis for selecting the training samples. A nearest-neighbor rule based on local distance selects several nearest neighbors of each incomplete sample and produces a pre-filled value for each missing value, which the FELM network then refines iteratively. Several errors (differences between the actual output and the expected output) are obtained from the training sample set, and their mean serves as the adjustment criterion. Against this criterion, the error retrieval method repeatedly increases or decreases the estimate. Repeating these steps yields an optimized estimate of each missing value, so that the incomplete data set is completed reasonably and efficiently.
Drawings
Fig. 1 is a topological structure diagram of a feedback type extreme learning machine.
Fig. 2 is a flow chart of the algorithm of the present invention.
Fig. 3 is a signal acquisition diagram of strip rolling data.
Fig. 4 is a graph of the change between the number of iterations of the strip rolling data set and the objective function.
Detailed Description
The invention is based on the following theory:
1. Information feedback extreme learning machine
The extreme learning machine (ELM) is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs), proposed by Guang-Bin Huang in 2004. In an ELM, the input weights connecting the input layer and the hidden layer and the biases of the hidden layer are chosen randomly, and the output weights connecting the hidden layer and the output layer are determined analytically by a generalized inverse. ELM abandons gradient descent and instead solves for the network by least squares, an approach that has been very successful. However, a conventional ELM cannot feed the predicted output back into the network; it relies only on the input information during learning. The conventional ELM is therefore improved using the idea of Kalman filtering, giving the feedback extreme learning machine, which estimates and fills the missing attributes of an incomplete data set more effectively.
The core idea of the feedback extreme learning machine is to use the error between the predicted output and the actual output to adjust the fill value of a missing attribute, so that the fill value becomes more reasonable and clustering becomes more effective. Fig. 1 shows the feedback extreme learning machine model.
As shown in Fig. 1, the FELM network consists of an input layer, a hidden layer and an output layer, and each circle represents a node. Data are processed and computed by the nodes of the hidden and output layers; the number of hidden-layer nodes is determined experimentally.
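The basic ELM training step described above (random input weights and biases, output weights from the generalized inverse) can be sketched as follows. This is a plain ELM, not the feedback variant, and it assumes a sigmoid activation, which the source does not name; `train_elm`, `predict_elm` and the toy regression target are hypothetical.

```python
import numpy as np

def hidden_output(X, omega, b):
    """Hidden-layer output matrix H with a sigmoid activation (assumption)."""
    return 1.0 / (1.0 + np.exp(-(X @ omega + b)))

def train_elm(X, T, n_hidden, rng):
    """Basic ELM: random input weights and biases, least-squares output weights."""
    omega = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights in [-1, 1]
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                    # hidden biases in [-1, 1]
    H = hidden_output(X, omega, b)
    beta = np.linalg.pinv(H) @ T          # Moore-Penrose solution of H beta = T
    return omega, b, beta

def predict_elm(X, omega, b, beta):
    return hidden_output(X, omega, b) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
T = X.sum(axis=1, keepdims=True)          # toy target, for illustration only
omega, b, beta = train_elm(X, T, n_hidden=20, rng=rng)
errors = predict_elm(X, omega, b, beta) - T        # e0 = Y - y for each sample
mean_abs_error = float(np.mean(np.abs(errors)))    # mean training error
```

The mean training error computed at the end is the kind of quantity the feedback step compares against when deciding whether a fill value is acceptable.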
2. Fuzzy C-means (FCM) clustering algorithm
The fuzzy C-means clustering algorithm (Bezdek, 1981) partitions the points of the feature space $X = (x_1, x_2, \dots, x_n)$ into c classes ($1 < c \le n$) with cluster centers $V = \{v_1, v_2, \dots, v_c\}$, where $v_i \in R^s$ is the center of the i-th class and $u_{ik}$ denotes the degree of membership of the data point $x_k \in R^s$ in the i-th class. The memberships $u_{ik}$ satisfy the following conditions:
$$u_{ik} \in [0, 1], \quad i = 1, 2, \dots, c; \; k = 1, 2, \dots, n \qquad (11)$$
$$\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1, 2, \dots, n \qquad (12)$$
The objective function is defined as follows:
$$J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m}\, \lVert x_k - v_i \rVert_2^2 \qquad (14)$$
where $x_k = [x_{1k}, x_{2k}, \dots, x_{sk}]^T$ is the k-th data sample and $x_{jk}$ is its j-th attribute value; $v_i$ is the i-th cluster center; m (m > 1) is the weighting exponent, which controls the degree of fuzziness of the membership matrix; $\lVert \cdot \rVert_2$ denotes the Euclidean distance.
The update formulas for the cluster centers and the memberships are as follows:
$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m}\, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}$$
$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert_2}{\lVert x_k - v_j \rVert_2} \right)^{\frac{2}{m-1}} \right]^{-1}$$
Under the constraint of formula (12), U and V are updated alternately to minimize formula (14).
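The alternating updates of U and V can be sketched in a few lines. This is a minimal illustration of standard fuzzy C-means, assuming a random initial partition matrix and a maximum-absolute-change stopping test (the source does not state which norm is compared with ε); the function name `fcm` and the two-blob test data are hypothetical.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means: alternate centre and membership updates until U stabilises."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                                   # each column sums to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)     # centre update
        # d[i, k] = Euclidean distance between x_k and v_i
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                            # guard against zero distance
        U_new = d ** (-2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0)                       # membership update
        converged = np.max(np.abs(U_new - U)) < eps      # assumed stopping test
        U = U_new
        if converged:
            break
    return U, V

rng = np.random.default_rng(1)
X_demo = np.vstack([
    rng.normal(0.0, 0.1, size=(30, 2)),   # blob around (0, 0)
    rng.normal(3.0, 0.1, size=(30, 2)),   # blob around (3, 3)
])
U, V = fcm(X_demo, c=2)
labels = U.argmax(axis=0)                 # hard labels from the largest membership
```

Dividing by the column sums in the membership update is algebraically the same as the bracketed sum of distance ratios in the update formula, and it keeps the sum-to-one constraint satisfied at every iteration.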
Secondly, the implementation process of the invention:
1) Use mutual information, formula (1), to select the data attributes with higher correlation, and select the complete samples of the incomplete data set, restricted to those attributes, as training samples for the FELM network;
$$I(X;Y) = \iint \mu_{XY}(x,y)\,\log\frac{\mu_{XY}(x,y)}{\mu_X(x)\,\mu_Y(y)}\,dx\,dy \qquad (1)$$
where $\mu_X(x)$ is the marginal probability density function of the variable X, $\mu_Y(y)$ is the marginal probability density function of the variable Y, and $\mu_{XY}(x,y)$ is the joint probability density function of the two variables.
2) Determine the parameters of the FELM network. Initialize the input weights ω and bias values b with random numbers drawn from the interval [-1, 1], and determine the number of hidden-layer nodes of the extreme learning machine;
3) Pre-fill each missing attribute according to a nearest-neighbor rule, and adjust the pre-filled value according to the error obtained by training the FELM network on the training samples until a reasonable value is found, yielding a recovered complete data set;
4) Initialize the parameters of the fuzzy C-means algorithm: the number of clusters c, the fuzzy coefficient m, the threshold ε and the membership partition matrix $U^{(0)}$;
5) Cluster the recovered complete data set with fuzzy C-means. At iteration t = l, compute $V^{(l)}$ from $U^{(l-1)}$ according to formula (2), then update $U^{(l)}$ from $V^{(l)}$ according to formula (3). If $\lVert U^{(l)} - U^{(l-1)} \rVert < \varepsilon$, the algorithm terminates; otherwise set l = l + 1 and continue to update the membership partition matrix and the cluster centers iteratively.
Error retrieval algorithm: assume that the initial estimate of the missing attribute obtained with the k-nearest-neighbor rule is $E_k$, and that the ELM gives a mean error over the training samples. If the ELM prediction for the sample containing the missing attribute is Y and its real value is y, the error is $e_0 = Y - y$. A value e is then calculated […], and the fill value of the missing attribute is adjusted:
(1) if e < 0, readjust the fill value of the missing attribute to $E_{new} = E_k + \rho e$, i.e. increase the value with a certain probability, and use it as input for ELM learning again, where $\rho \in [0, 1]$ is chosen by a random function;
(2) if […], readjust the fill value of the missing attribute to $E_{new} = E_k - \rho e$ and use it as input for ELM learning again;
(3) if […], the value predicted by the ELM is close to the true value and is acceptable, so this value is used to fill the missing attribute of the incomplete data set.
Thirdly, the missing-data fuzzy clustering algorithm with FELM-optimized estimates is used to analyze strip steel data, and the analysis results are used to support improvements in industrial production. The specific steps are as follows:
1. Collecting experimental data: the strip steel data were collected at a steel mill in China during a certain period of one day, and the data set contains 983 samples. The following attributes are extracted from the collected samples: the rolling force of the rolling stand, the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the rolling current, the rolling speed and the SONY value. These attributes are related, to differing degrees, to the predicted strip exit thickness. The attribute values are used as inputs to the FELM network. Fig. 3 is a signal acquisition plot of the data (the vertical axis shows parameter values and the horizontal axis shows acquisition time).
2. Analysis of experimental results: the experimental data were processed manually to generate rolling data sets with randomly missing values, and a training sample set was then selected for each missing attribute. To demonstrate the effectiveness of the incomplete-data fuzzy clustering algorithm with FELM-optimized estimates, its results were compared with those of classical processing algorithms: mean estimation, zero filling, k-nearest-neighbor estimation, and the MBP-FCM algorithm. The estimation deviations under different algorithms and different missing ratios were compared using three indexes between the true and estimated values: mean absolute deviation (ABS), mean deviation (Bias), and root-mean-square error (RMSE). The smaller these values, the more accurate the estimates. Tables 1 and 2 show that the proposed algorithm estimates more accurately than the four comparison algorithms and that its estimates are closer to the original data. As the missing ratio, and hence the number of missing values, increases, the deviation of the filled values also increases. Fig. 4 plots the objective function of the FELM-FCM algorithm against the number of iterations on the strip steel data set under four missing ratios. It shows that the objective function value fluctuates noticeably in the initial stage and, after several iterations of optimization, settles into a stable converged state.
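The three indexes can be computed as in the sketch below. The source names the indexes but does not give their formulas, so the usual definitions are assumed, with the deviation taken as estimate minus true value; the function name `estimation_deviation` and the numbers are hypothetical.

```python
import numpy as np

def estimation_deviation(true_vals, est_vals):
    """ABS, Bias and RMSE between true and estimated values (assumed definitions)."""
    true_vals = np.asarray(true_vals, dtype=float)
    est_vals = np.asarray(est_vals, dtype=float)
    diff = est_vals - true_vals                      # sign convention is an assumption
    return {
        "ABS": float(np.mean(np.abs(diff))),         # mean absolute deviation
        "Bias": float(np.mean(diff)),                # mean signed deviation
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),  # root-mean-square error
    }

scores = estimation_deviation([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```

Bias can be close to zero even when individual estimates are poor, because positive and negative deviations cancel, which is why it is reported together with ABS and RMSE.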
TABLE 1 comparison of missing strip data set estimate deviations under different algorithms
TABLE 2 comparison of the estimated deviations of missing strip data sets for different miss ratios
Claims (3)
1. The application of the missing-data fuzzy clustering algorithm with information feedback extreme learning machine (FELM) optimized estimates to strip steel data clustering statistics, characterized by comprising the following process:
1) collecting experimental data: taking the data collected from the strip steel over a certain period of time as the data samples;
2) extracting the following attributes from the collected data samples: the rolling force of the rolling stand, the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the rolling current, the rolling speed and the SONY value;
3) using the attribute values obtained in step 2) as the training data set;
4) normalizing the data set; because the attributes differ in order of magnitude, all values in the data set are mapped to the interval [0, 1] to eliminate these differences;
5) selecting and optimizing the training samples; using mutual information to select the data attributes with higher correlation, and selecting the complete samples of the incomplete data set, restricted to those attributes, as training samples for the FELM network;
6) determining the parameters of the FELM network; initializing the input weights ω and bias values b with random numbers drawn from the interval [-1, 1], and determining the number of hidden-layer nodes of the extreme learning machine;
7) estimating the missing attributes; pre-filling each missing attribute according to a nearest-neighbor rule, and adjusting the pre-filled value by the error retrieval method, using the error obtained on the training samples, until a reasonable value is found;
8) performing cluster analysis on the recovered complete data set using the FCM algorithm;
wherein the missing-data fuzzy clustering algorithm with FELM-optimized estimates comprises the following steps:
2.1) using mutual information, formula (1), to select the data attributes with higher correlation, and selecting the complete samples of the incomplete data set, restricted to those attributes, as training samples for the FELM network;
$$I(X;Y) = \iint \mu_{XY}(x,y)\,\log\frac{\mu_{XY}(x,y)}{\mu_X(x)\,\mu_Y(y)}\,dx\,dy \qquad (1)$$
where $\mu_X(x)$ is the marginal probability density function of the variable X, $\mu_Y(y)$ is the marginal probability density function of the variable Y, and $\mu_{XY}(x,y)$ is the joint probability density function of the two variables;
2.2) determining the parameters of the FELM network: initializing the input weights ω and bias values b with random numbers drawn from the interval [-1, 1], and determining the number of hidden-layer nodes of the extreme learning machine;
2.3) pre-filling each missing attribute according to a nearest-neighbor rule, and adjusting the pre-filled value by the error retrieval method, using the error obtained by training the FELM network on the training samples, until a reasonable value is found, thereby obtaining a recovered complete data set;
2.4) initializing the parameters of the fuzzy C-means algorithm: the number of clusters c, the fuzzy coefficient m, the threshold ε and the membership partition matrix $U^{(0)}$;
2.5) clustering the recovered complete data set with fuzzy C-means: at iteration t = l, computing the cluster center matrix $V^{(l)}$ from the membership partition matrix $U^{(l-1)}$ according to formula (2), and updating $U^{(l)}$ from $V^{(l)}$ according to formula (3); for the given threshold ε, if $\lVert U^{(l)} - U^{(l-1)} \rVert < \varepsilon$ the algorithm terminates; otherwise setting l = l + 1 and continuing to update the membership partition matrix and the cluster centers iteratively.
2. The application of the missing-data fuzzy clustering algorithm with FELM-optimized estimates to strip steel data clustering statistics as claimed in claim 1, characterized in that, in step 2.3), the recovered complete data set is obtained as follows:
2.3.1) pre-filling the missing attribute according to a nearest-neighbor rule: selecting the k samples closest to the incomplete sample, computing the mean of their values at the position of the missing attribute, and using this mean as the pre-filled value of the incomplete sample;
the distance between two samples $x_a$ and $x_b$ being computed over the attributes observed in both, formula (4), where $x_{pa}$ and $x_{pb}$ are the p-th attributes of $x_a$ and $x_b$ respectively, and $I_p$ satisfies the condition shown in formula (5):
$$I_p = \begin{cases} 1, & \text{if both } x_{pa} \text{ and } x_{pb} \text{ are observed} \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
2.3.2) calculating an output matrix of a hidden layer of the FELM network, and calculating an output matrix H of the hidden layer by using formulas (6) to (8);
wherein $g(w_i \cdot x_j + b_i)$ denotes the output of the i-th hidden-layer node; $w_i \cdot x_j$ is the inner product of $w_i$ and $x_j$; $w_i$ is the input weight linking the input layer and the hidden layer; $\beta_i$ is the output weight linking the hidden layer and the output layer; and $b_i$ is the bias of the i-th hidden-layer node;
Hβ=T (7)
wherein H is the output matrix of the hidden-layer nodes, β is the output weight, and T is the desired output;
2.3.3) calculating the output weight of the FELM network from the obtained output matrix H and the expected output value according to formula (9);
wherein $H^{\dagger}$ is the Moore-Penrose generalized inverse of H, and the norm of the resulting output weight is minimal and unique;
2.3.4) obtaining the error between the network output value and the real output value and feeding the error back, wherein the predicted value output by the extreme learning machine is Y, the real value is y, and the error is $e_0$;
$e_0 = Y - y$ (10)
2.3.5) comparing the obtained error with the error obtained from the training samples; if the iteration stop requirement is met, filling in the missing attribute; otherwise, feeding the error back, readjusting the pre-filled value, and returning to step 2.3.1).
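Steps 2.3.1) to 2.3.4) can be illustrated with a short Python sketch. It is a hedged approximation rather than the claimed FELM: the network below is a plain extreme learning machine with a sigmoid activation (the feedback mechanism is not modelled), the distance in `knn_prefill` is an assumed partial-distance form in which only attributes observed in both samples contribute, and all function names and parameters are hypothetical.

```python
import numpy as np

def knn_prefill(X, k=3):
    """Fill NaN entries with the mean of the k nearest rows (partial distance)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    mask = ~np.isnan(X)                      # True where an attribute is observed
    for a in np.where(~mask.all(axis=1))[0]:
        dists = []
        for b in range(len(X)):
            shared = mask[a] & mask[b]       # I_p = 1 where both samples are observed
            # Neighbours must share attributes and hold the values missing in row a
            if b == a or not shared.any() or not mask[b][~mask[a]].all():
                continue
            sq = np.sum((X[a, shared] - X[b, shared]) ** 2)
            # Assumed scaling: total attribute count over shared attribute count
            dists.append((np.sqrt(sq * X.shape[1] / shared.sum()), b))
        nearest = [b for _, b in sorted(dists)[:k]]
        if nearest:                          # leave NaN if no usable neighbour exists
            filled[a, ~mask[a]] = X[nearest][:, ~mask[a]].mean(axis=0)
    return filled

def elm_fit(X, T, n_hidden=20, seed=0):
    """Train a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1, 1, n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T                    # solve H beta = T via pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network output Y for the rows of X."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

With these helpers, the error of step 2.3.4) for a sample is `elm_predict(x, W, b, beta) - y`, and the mean of that quantity over the complete training samples gives a reference error for the comparison in step 2.3.5).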
3. The application of the missing data fuzzy clustering algorithm for optimizing the estimated value of the information feedback extreme learning machine in strip steel data clustering statistics as claimed in claim 2, characterized in that the error retrieval method proceeds as follows:
assume that the initial estimate of the missing attribute obtained with the k-nearest-neighbor rule is $E_k$, and that the mean error of the training samples obtained with the FELM network is $\bar{e}$. If the output value obtained by FELM learning and prediction on the data containing the missing attribute is Y and the real value of the data is y, the error is $e_0 = Y - y$; the quantity e is then calculated and the fill value of the missing attribute is adjusted:
4.1) if $e < 0$, readjusting the fill value of the missing attribute to $E_{new} = E_k + \rho e$, i.e., increasing the value with a certain probability, and then using it as input for FELM learning, where $\rho \in [0, 1]$ is selected by a random function;
4.2) if $e > 0$, readjusting the fill value of the missing attribute to $E_{new} = E_k - \rho e$, and then using it as input for FELM learning;
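A single adjustment step of the error retrieval method can be written as a small Python function. This is only a sketch: the way e is computed is not recoverable from the text, so e is passed in as an argument; the condition for rule 4.2 is assumed to be e > 0; and the name `adjust_fill` and its parameters are hypothetical.

```python
import random

def adjust_fill(E_k, e, rho=None, rng=random):
    """Return the readjusted fill value E_new for one feedback step."""
    if rho is None:
        rho = rng.random()      # random step factor drawn from [0, 1)
    if e < 0:
        return E_k + rho * e    # rule 4.1 as written
    if e > 0:
        return E_k - rho * e    # rule 4.2 (condition e > 0 is an assumption)
    return E_k                  # e == 0: keep the current fill value
```

Taken literally, rule 4.1 adds a negative quantity, so the sign convention chosen for e determines the direction of the adjustment; the sketch follows the two rules exactly as stated and leaves that choice to the caller.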
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710992778.9A CN107729943B (en) | 2017-10-23 | 2017-10-23 | Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107729943A CN107729943A (en) | 2018-02-23 |
CN107729943B CN107729943B (en) | 2021-11-30 |
Family
ID=61212371
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710992778.9A Active CN107729943B (en) | 2017-10-23 | 2017-10-23 | Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107729943B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019218263A1 (en) * | 2018-05-16 | 2019-11-21 | 深圳大学 | Extreme learning machine-based extreme ts fuzzy inference method and system |
CN109102021A (en) * | 2018-08-10 | 2018-12-28 | 聚时科技(上海)有限公司 | The mutual polishing multicore k- mean cluster machine learning method of core under deletion condition |
CN109214429B (en) * | 2018-08-14 | 2021-07-27 | 聚时科技(上海)有限公司 | Local deletion multi-view clustering machine learning method based on matrix-guided regularization |
CN109195110B (en) * | 2018-08-23 | 2020-12-15 | 南京邮电大学 | Indoor positioning method based on hierarchical clustering technology and online extreme learning machine |
CN109783481A (en) * | 2018-12-19 | 2019-05-21 | 新华三大数据技术有限公司 | Data processing method and device |
CN109948715B (en) * | 2019-03-22 | 2021-07-02 | 杭州电子科技大学 | Water quality monitoring data missing value filling method |
CN110110447B (en) * | 2019-05-09 | 2023-04-18 | 辽宁大学 | Method for predicting thickness of strip steel of mixed frog leaping feedback extreme learning machine |
CN110378744A (en) * | 2019-07-25 | 2019-10-25 | 中国民航大学 | Civil aviaton's frequent flight passenger value category method and system towards incomplete data system |
CN112101457B (en) * | 2020-09-15 | 2023-11-17 | 湖南科技大学 | PMSM demagnetizing fault diagnosis method based on torque signal fuzzy intelligent learning |
CN114531696A (en) * | 2020-11-23 | 2022-05-24 | 维沃移动通信有限公司 | Method and device for processing partial input missing of AI (Artificial Intelligence) network |
CN112687349A (en) * | 2020-12-25 | 2021-04-20 | 广东海洋大学 | Construction method of model for reducing octane number loss |
CN115423005B (en) * | 2022-08-22 | 2023-10-31 | 江苏大学 | Big data reconstruction method and device for combine harvester |
CN116976230A (en) * | 2023-09-25 | 2023-10-31 | 中国海洋大学 | Chlorophyll remote sensing data reconstruction method based on numerical simulation and deep learning |
CN117435870B (en) * | 2023-12-21 | 2024-03-29 | 国网天津市电力公司营销服务中心 | Load data real-time filling method, system, equipment and medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102750286A (en) * | 2011-04-21 | 2012-10-24 | 常州蓝城信息科技有限公司 | Novel decision tree classifier method for processing missing data |
CN104751229A (en) * | 2015-04-13 | 2015-07-01 | 辽宁大学 | Bearing fault diagnosis method capable of recovering missing data of back propagation neural network estimation values |
JP2015184853A (en) * | 2014-03-24 | 2015-10-22 | Kddi株式会社 | Missing data complementing device, missing data complementing method, and program |
CN106127262A (en) * | 2016-06-29 | 2016-11-16 | 海南大学 | The clustering method of one attribute missing data collection |
CN106407464A (en) * | 2016-10-12 | 2017-02-15 | 南京航空航天大学 | KNN-based improved missing data filling algorithm |
CN106971205A (en) * | 2017-04-06 | 2017-07-21 | 哈尔滨理工大学 | A kind of embedded dynamic feature selection method based on k nearest neighbor Mutual Information Estimation |
EP3214874A1 (en) * | 2016-03-01 | 2017-09-06 | Gigaset Communications GmbH | Sensoric system energy limiter with selective attention neural network |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488884B (en) * | 2013-09-12 | 2016-07-13 | 北京航空航天大学 | Degraded data based on wavelet neural network lacks interpolating method |
CN106156260B (en) * | 2015-04-28 | 2020-01-21 | 阿里巴巴集团控股有限公司 | Method and device for repairing missing data |
CN107274016A (en) * | 2017-06-13 | 2017-10-20 | 辽宁大学 | The strip exit thickness Forecasting Methodology of the random symmetrical extreme learning machine of algorithm optimization that leapfrogs |
Non-Patent Citations (3)
Title |
---|
A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data; Li Zhang et al.; Pattern Analysis and Applications; 20151231; Vol. 18; 377–384 * |
Interval kernel Fuzzy C-Means clustering of incomplete data; Tianhao Li et al.; Neurocomputing; 20170531; Vol. 237; 316–331 * |
A missing data filling method based on extreme learning machine; 杨毅 et al.; 《计算机应用与软件》 (Computer Applications and Software); 20161031; Vol. 33 (No. 10); 243-246 * |
Also Published As
Publication number | Publication date |
---|---|
CN107729943A (en) | 2018-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107729943B (en) | Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof | |
CN108829763B (en) | Deep neural network-based attribute prediction method for film evaluation website users | |
Guo et al. | Research on CBR system based on data mining | |
CN104298778B (en) | A kind of Forecasting Methodology and system of the steel rolling product quality based on correlation rule tree | |
CN111461921B (en) | Load modeling typical user database updating method based on machine learning | |
CN107301328B (en) | Cancer subtype accurate discovery and evolution analysis method based on data flow clustering | |
Xun et al. | Incremental frequent itemsets mining based on frequent pattern tree and multi-scale | |
CN111949892B (en) | Multi-relation perception temporal interaction network prediction method | |
Ruparel et al. | Learning from small data set to build classification model: A survey | |
CN110334847A (en) | Based on the wind power prediction method for improving K-means cluster and support vector machines | |
Jie et al. | Review on the research of K-means clustering algorithm in big data | |
CN114745725B (en) | Resource allocation management system based on edge computing industrial Internet of things | |
Kumar et al. | Comparative analysis of SOM neural network with K-means clustering algorithm | |
CN110110447B (en) | Method for predicting thickness of strip steel of mixed frog leaping feedback extreme learning machine | |
CN111882114A (en) | Short-term traffic flow prediction model construction method and prediction method | |
CN111104601A (en) | Antagonistic multi-feedback-level paired personalized ranking method | |
CN113128124B (en) | Multi-grade C-Mn steel mechanical property prediction method based on improved neural network | |
Haq et al. | Classification of electricity load profile data and the prediction of load demand variability | |
CN110221540A (en) | Continuous-stirring reactor system control method based on Hammerstein model | |
Liu et al. | Wheel hub customization with an interactive artificial immune algorithm | |
Yu | Collaborative filtering recommendation algorithm based on both user and item | |
CN112711912A (en) | Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm | |
Gao et al. | Personalized context-aware collaborative filtering based on neural network and slope one | |
Zhou et al. | Online recommendation based on incremental-input self-organizing map | |
CN115688613A (en) | Carbonate reservoir permeability prediction method based on multi-target mayflies algorithm optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20231226. Address after: 905, Building G, Huangjin Times Square, No. 9999 Jingshi Road, Lixia District, Jinan City, Shandong Province, 250000. Patentee after: Zhongchangxing (Shandong) Information Technology Co.,Ltd. Address before: 110136 58 Shenbei New Area Road South, Shenyang, Liaoning. Patentee before: LIAONING University |