CN107729943B - Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof - Google Patents

Info

Publication number
CN107729943B
Authority
CN
China
Prior art keywords
data
value
felm
missing
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710992778.9A
Other languages
Chinese (zh)
Other versions
CN107729943A (en)
Inventor
张利
刘洋
高欣
潘辉
王军
赵中洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongchangxing Shandong Information Technology Co ltd
Original Assignee
Liaoning University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning University
Priority to CN201710992778.9A
Publication of CN107729943A
Application granted
Publication of CN107729943B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The invention relates to a missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine (FELM), and to its application. The basic steps are: 1) using mutual information to compute and select the data attributes with higher correlation, and, according to those attributes, selecting the complete data within the incomplete data set as training samples for the FELM network; 2) initializing the input weights ω and the bias values b of the FELM network; 3) pre-filling the missing attributes according to a nearest neighbor rule, and adjusting the pre-filled values according to the error obtained by training the FELM network on the training samples until a reasonable value is found for filling, thereby obtaining a recovered complete data set; 4) initializing the parameters of the fuzzy C-means algorithm: the number of clusters c, the fuzzy coefficient m, the threshold ε, and the membership partition matrix $U^{(0)}$; 5) obtaining the final clustering result by iteratively optimizing the membership partition matrix U and the cluster centers V of the fuzzy C-means algorithm. By making full use of the correlation between data samples and attributes, and of the distribution information of both the complete and the incomplete data samples, the method obtains more reasonable attribute estimates, so that the clustering result for the incomplete data set is more accurate.

Description

Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof
Technical Field
The invention relates to a missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine, and to its application, and belongs to the field of industrial informatization technology.
Background
Steel is an indispensable material for China's construction and modernization, and the steel industry is a foundation of national development. Since its establishment the industry has developed steadily and rapidly for more than sixty years, and an industrial strip steel technology system has been built up. China is currently at an important stage of industrial development, and its demand for steel remains huge, so the steel industry faces a very large market. How to innovatively reform existing strip steel production lines so that they produce high-quality, high-benefit, high-grade, low-carbon steel is therefore of practical significance. At the present stage, informatization is a strategic measure that bears on modernization as a whole. The steel industry needs to combine fully with information technology in order to innovate and upgrade further, integrate advanced information technology into the steel rolling process, and achieve the coordinated development of industry and informatization. It is therefore very important to perform cluster analysis on strip steel data and to use the analysis results to strengthen innovation in industrial production.
In recent years, cluster analysis has been adapted to many different types of data and has been widely applied and developed in many research fields. Using mathematical methods to determine the relationships between strip steel data samples from the properties of the data themselves and from some measure of similarity or difference, clustering the samples on that basis, and using the analysis results to adjust the production line is a meaningful undertaking. In real production, however, many factors intervene, such as failure of the data acquisition equipment, failure of the storage medium, failure of the transmission medium, human omissions, or limitations of the detection instruments, so that the collected data set is incomplete. Traditional clustering methods cannot be applied directly to incomplete data sets. It is therefore very important to choose a suitable way of handling incomplete data, both for analyzing the final results and for future industrial planning.
Disclosure of Invention
In order to solve the above problems, the invention provides a missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine, applies it to the analysis of strip steel data, and uses the analysis results to strengthen the reform of industrial production.
The invention is realized by the following technical scheme: the missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine is characterized by comprising the following steps:
1) using mutual information to compute and select the data attributes with higher correlation, and, according to those attributes, selecting the complete data within the incomplete data set as training samples for the FELM network;
$$I(X;Y)=\iint \mu_{XY}(x,y)\,\log\frac{\mu_{XY}(x,y)}{\mu_X(x)\,\mu_Y(y)}\,dx\,dy \qquad (1)$$
wherein $\mu_X(x)$ denotes the marginal probability density function of the variable X; $\mu_Y(y)$ denotes the marginal probability density function of the variable Y; and $\mu_{XY}(x,y)$ denotes the joint probability density function of the two variables;
2) determining the parameters of the FELM network: initializing the input weights ω and the bias values b; the initial values of ω and b are drawn at random from the interval [-1, 1] to initialize the network, and the number of hidden layer nodes of the extreme learning machine is determined;
3) pre-filling the missing attribute according to a nearest neighbor rule, and adjusting the pre-filling value by adopting an error retrieval method according to an error obtained by training the FELM network by a training sample until a reasonable numerical value is found and filled, thereby obtaining a recovered complete data set;
4) initializing the parameters of the fuzzy C-means algorithm: the number of clusters c, the fuzzy coefficient m, the threshold ε, and the membership partition matrix $U^{(0)}$;
5) clustering the recovered complete data set with the fuzzy C-means algorithm: at iteration t = l, computing the cluster center matrix $V^{(l)}$ from formula (2) and the membership partition matrix $U^{(l-1)}$, then updating $U^{(l)}$ from formula (3) and $V^{(l)}$; for the given threshold ε, if
$$\left\| U^{(l)}-U^{(l-1)} \right\| < \varepsilon$$
the algorithm terminates; otherwise set l = l + 1 and continue to update the membership partition matrix and the cluster centers iteratively.
$$v_i^{(l)}=\frac{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}x_k}{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}},\quad i=1,2,\dots,c \qquad (2)$$
$$u_{ik}^{(l)}=\left[\sum_{j=1}^{c}\left(\frac{\left\|x_k-v_i^{(l)}\right\|_2}{\left\|x_k-v_j^{(l)}\right\|_2}\right)^{\frac{2}{m-1}}\right]^{-1},\quad i=1,2,\dots,c;\ k=1,2,\dots,n \qquad (3)$$
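Step 1) ranks attributes by mutual information, which in practice has to be estimated from samples. The following is a minimal sketch of one common way to do that, a plug-in estimate from a two-dimensional histogram. The function name `mutual_information`, the bin count and the histogram estimator itself are illustrative assumptions and are not taken from the patent.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in estimate of I(X;Y) in nats from a 2-D histogram (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()               # empirical joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal of Y, shape (1, bins)
    mask = p_xy > 0                          # empty cells contribute 0 to the sum
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))
```

Attributes whose estimated mutual information with the attribute to be predicted is highest would then be kept as network inputs. How many to keep is a choice the text leaves open.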
In step 3), the missing attributes are pre-filled according to the nearest neighbor rule, and the pre-filled values are adjusted by the error retrieval method according to the error obtained by training the FELM network on the training samples, until a reasonable value is found and filled in. The recovered complete data set is obtained as follows:
1) pre-filling the missing attributes according to the nearest neighbor rule: the k data samples nearest to the incomplete data sample are selected, the average of their values at the positions corresponding to the missing data is computed, and this average is taken as the pre-filled value of the incomplete data.
Figure BDA0001441786190000034
wherein $x_{ia}$ and $x_{ib}$ are the i-th attributes of $x_a$ and $x_b$ respectively, and $I_i$ satisfies the condition shown in formula (5):
Figure BDA0001441786190000041
2) computing the output matrix H of the hidden layer of the FELM network using formulas (6) to (8);
$$\sum_{i=1}^{L}\beta_i\,g(\omega_i\cdot x_j+b_i)=o_j,\quad j=1,2,\dots,N \qquad (6)$$
wherein $g(\omega_i\cdot x_j+b_i)$ is the output of the i-th hidden layer node; $\omega_i\cdot x_j$ is the inner product of $\omega_i$ and $x_j$; $\omega_i$ is the input weight linking the input layer to the i-th hidden layer node; $\beta_i$ is the output weight linking the i-th hidden layer node to the output layer; $b_i$ is the bias value of the i-th hidden layer node; L is the number of hidden layer nodes and N is the number of samples.
$$H\beta=T \qquad (7)$$
where H is the output matrix of the hidden layer nodes, β is the output weight, and T is the desired output.
$$H=\begin{bmatrix} g(\omega_1\cdot x_1+b_1) & \cdots & g(\omega_L\cdot x_1+b_L)\\ \vdots & \ddots & \vdots\\ g(\omega_1\cdot x_N+b_1) & \cdots & g(\omega_L\cdot x_N+b_L)\end{bmatrix}_{N\times L} \qquad (8)$$
3) computing the output weight of the FELM network from the obtained output matrix H and the desired output according to formula (9);
$$\hat{\beta}=H^{\dagger}T \qquad (9)$$
wherein $H^{\dagger}$ is the Moore-Penrose generalized inverse of H, and the norm of $\hat{\beta}$ is the smallest and the solution is unique.
4) obtaining the error between the predicted output value and the actual output value and feeding it back: with Y the predicted value output by the extreme learning machine and y the actual value, the error $e_0$ is
$$e_0=Y-y \qquad (10)$$
5) comparing the obtained error with the error obtained on the training samples: if the iteration stop requirement is met, the missing attribute is filled; otherwise the error is fed back, the pre-filled value is readjusted, and the procedure returns to step 1).
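The hidden layer output, pseudo-inverse solution and prediction error described in these steps can be sketched as follows. This is a minimal illustration of a plain ELM, not the patent's feedback loop: the sigmoid activation, the helper names `train_elm` and `predict_elm`, the synthetic data and the choice of 20 hidden nodes are all assumptions made for the example.

```python
import numpy as np

def hidden_output(X, omega, b):
    """Hidden layer output matrix H (sigmoid activation is an assumption)."""
    return 1.0 / (1.0 + np.exp(-(X @ omega + b)))

def train_elm(X, T, n_hidden, rng):
    """Plain ELM: random omega and b in [-1, 1], output weights via pseudo-inverse."""
    omega = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                    # hidden biases
    H = hidden_output(X, omega, b)
    beta = np.linalg.pinv(H) @ T          # minimum-norm least-squares solution of H beta = T
    return omega, b, beta

def predict_elm(X, omega, b, beta):
    return hidden_output(X, omega, b) @ beta

# Synthetic data standing in for normalized attributes (hypothetical)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2]
omega, b, beta = train_elm(X, y, n_hidden=20, rng=rng)
e0 = predict_elm(X, omega, b, beta) - y   # error: predicted value minus actual value
```

Because the output weights come from a single pseudo-inverse, no iterative training is needed. In the scheme described above, only the pre-filled input value is adjusted between passes.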
The error retrieval method comprises the following specific processes:
Assume that the initial estimate of the missing attribute obtained with the k-nearest neighbor rule is $E_k$, and that the mean error of the training samples obtained with the FELM network is
Figure BDA0001441786190000051
If the output value obtained by FELM learning and prediction on the data containing the missing attribute is Y, and the actual value of the data is y, the error $e_0 = Y - y$ is obtained, and the following quantity e is computed:
Figure BDA0001441786190000052
The fill value of the missing attribute is then adjusted:
1) if e < 0, the fill value of the missing attribute is readjusted to $E_{new} = E_k + \rho e$, i.e. the value is increased with a certain probability, and is then used as input for FELM learning, where ρ ∈ [0, 1] is selected at random by a random function;
2) if
Figure BDA0001441786190000053
the fill value of the missing attribute is readjusted to $E_{new} = E_k - \rho e$ and is then used as input for FELM learning;
3) if
Figure BDA0001441786190000054
the value predicted by the FELM network is close to the actual value and is acceptable, so this value is used to fill the missing attribute of the incomplete data set.
The application of the missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine to the clustering statistics of strip steel data comprises the following steps:
1) collecting experimental data: strip steel data acquired over a certain period of time are collected as data samples;
2) extracting the following attributes from the collected data samples: the rolling force of the rolling frame, the size of the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the rolling current, the rolling speed and the SONY value;
3) taking the attribute values acquired in step 2) as the training data set;
4) normalizing the data set: because the attributes differ in order of magnitude, all values in the data set are converted to corresponding values in the interval [0, 1] to eliminate the differences between attributes;
5) selecting and optimizing the training samples: mutual information is used to compute and select the data attributes with higher correlation, and, according to those attributes, the complete data within the incomplete data set are selected as training samples for the FELM network;
6) determining the parameters of the FELM network: the input weights ω and the bias values b are initialized with random numbers drawn from the interval [-1, 1], and the number of hidden layer nodes of the extreme learning machine is determined;
7) estimating the missing attributes: the missing attributes are pre-filled according to the nearest neighbor rule, and the pre-filled values are adjusted by the error retrieval method according to the error obtained on the training samples until a reasonable value is found for filling;
8) performing cluster analysis on the recovered complete data set with the FCM algorithm.
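The normalization in step 4) can be sketched as a per-attribute min-max scaling. The text only states that values are mapped into [0, 1]; the min-max formula, the NaN coding of missing values and the function name `min_max_normalize` are assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each attribute (column) to [0, 1]; NaN marks a missing value (assumed coding)."""
    X = np.asarray(X, dtype=float)
    lo = np.nanmin(X, axis=0)                # per-attribute minimum, ignoring NaN
    hi = np.nanmax(X, axis=0)                # per-attribute maximum, ignoring NaN
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant columns
    return (X - lo) / span                   # NaN entries stay NaN

X = np.array([[1.0, 10.0],
              [3.0, np.nan],
              [5.0, 30.0]])
Z = min_max_normalize(X)
```

Missing entries pass through unchanged, so the later pre-filling step still sees them as missing.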
The beneficial effects of the invention are as follows. Traditional solutions either consider only the associations between data samples or rely only on the associations between attributes. The method combines both kinds of relation (between data samples and between attributes), uses the FELM network to produce an optimized estimate of the missing data values, and then performs fuzzy cluster analysis on the completed data set. Mutual information is used to compute the correlation between sample attributes, which provides a theoretical basis for selecting the training samples. A nearest neighbor rule based on the local distance selects several nearest neighbors of each incomplete data sample and prepares a pre-filled value for each missing value, which the FELM network then uses iteratively. Several errors (the differences between the actual output and the desired output) are obtained from the training sample set, and their mean is taken as the adjustment criterion. Against this criterion, the error retrieval method repeatedly increases or decreases the estimate to optimize it. Repeating these steps yields the optimal estimate of each missing value, so that the incomplete data set is completed reasonably and efficiently.
Drawings
Fig. 1 is a topological structure diagram of a feedback type extreme learning machine.
Fig. 2 is a flow chart of the algorithm of the present invention.
Fig. 3 is a signal acquisition diagram of strip rolling data.
Fig. 4 is a graph of the change between the number of iterations of the strip rolling data set and the objective function.
Detailed Description
The invention is based on the following theory:
1. Information feedback extreme learning machine
The extreme learning machine (ELM) is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs), proposed by Guang-Bin Huang in 2004. In the ELM, the input weights connecting the input layer to the hidden layer and the bias values of the hidden layer are chosen at random, and the output weights connecting the hidden layer to the output layer are determined analytically by the generalized inverse method. The ELM abandons gradient descent and instead uses the idea of the least squares method to solve for the optimal network, and it has achieved great success. However, the conventional ELM cannot feed the predicted output value back to the network structure; during learning it computes only from the input information. The invention therefore improves the conventional ELM using the idea of Kalman filtering, obtaining a feedback extreme learning machine (FELM) that better estimates, predicts and fills the missing attributes of an incomplete data set.
The core idea of the FELM is to use the error between the predicted output and the actual output to adjust the filling of the missing attributes, so that the fill values are more reasonable and the clustering is more effective. Fig. 1 shows the FELM model.
As shown in Fig. 1, the FELM network consists of an input layer, a hidden layer and an output layer. Each circle represents a node. The data are processed and computed by the nodes of the hidden layer and the output layer, and the number of hidden layer nodes is determined experimentally.
2. Fuzzy C-means (FCM) clustering algorithm
The fuzzy C-means clustering algorithm (Bezdek, 1981) partitions the feature points of the feature space $X=\{x_1,x_2,\dots,x_n\}$ into c classes ($1<c\le n$) with cluster centers $V=\{v_1,v_2,\dots,v_c\}$. The center of the i-th class is $v_i\in R^s$, and the membership of any data point $x_k\in R^s$ in the i-th class is denoted $u_{ik}$. The memberships $u_{ik}$ satisfy the following conditions:
$$u_{ik}\in[0,1],\quad i=1,2,\dots,c;\ k=1,2,\dots,n \qquad (11)$$
$$\sum_{i=1}^{c}u_{ik}=1,\quad k=1,2,\dots,n \qquad (12)$$
$$0<\sum_{k=1}^{n}u_{ik}<n,\quad i=1,2,\dots,c \qquad (13)$$
The objective function is defined as follows:
$$J(U,V)=\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{m}\,\left\|x_k-v_i\right\|_2^{2} \qquad (14)$$
wherein $x_k=[x_{1k},x_{2k},\dots,x_{sk}]^T$ is the k-th data sample and $x_{jk}$ is the j-th attribute value of $x_k$; $v_i$ is the i-th cluster center; m (m > 1) is the exponential weight that controls the degree of fuzziness of the membership matrix; and $\|\cdot\|_2$ denotes the Euclidean distance.
The update formulas for the cluster centers and the memberships are as follows:
$$v_i=\frac{\sum_{k=1}^{n}u_{ik}^{m}x_k}{\sum_{k=1}^{n}u_{ik}^{m}},\quad i=1,2,\dots,c \qquad (15)$$
$$u_{ik}=\left[\sum_{j=1}^{c}\left(\frac{\left\|x_k-v_i\right\|_2}{\left\|x_k-v_j\right\|_2}\right)^{\frac{2}{m-1}}\right]^{-1},\quad i=1,2,\dots,c;\ k=1,2,\dots,n \qquad (16)$$
Under the constraint of formula (12), U and V are updated alternately to minimize formula (14).
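The alternating optimization of U and V described above can be sketched in a few lines of NumPy. This is a minimal sketch of standard fuzzy C-means on complete data: the function name `fcm`, the random initialization, the iteration cap and the max-absolute-change stopping test are assumptions for illustration, since the stopping criterion itself is only available as an image in the source.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard fuzzy C-means; returns memberships U (c x n) and centers V (c x s)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)                  # each sample's memberships sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)     # cluster-center update
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # squared distances, (c, n)
        d2 = np.maximum(d2, 1e-12)                     # guard against a zero distance
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)   # membership update
        converged = np.max(np.abs(U_new - U)) < eps    # assumed stopping test
        U = U_new
        if converged:
            break
    return U, V
```

A hard assignment, if one is needed, can be read off as `U.argmax(axis=0)`.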
Secondly, the implementation process of the invention:
1) using mutual information to compute and select the data attributes with higher correlation, and, according to those attributes, selecting the complete data within the incomplete data set as training samples for the FELM network;
$$I(X;Y)=\iint \mu_{XY}(x,y)\,\log\frac{\mu_{XY}(x,y)}{\mu_X(x)\,\mu_Y(y)}\,dx\,dy \qquad (1)$$
wherein $\mu_X(x)$ denotes the marginal probability density function of the variable X; $\mu_Y(y)$ denotes the marginal probability density function of the variable Y; and $\mu_{XY}(x,y)$ denotes the joint probability density function of the two variables.
2) determining the parameters of the FELM network: initializing the input weights ω and the bias values b; the initial values of ω and b are drawn at random from the interval [-1, 1] to initialize the network, and the number of hidden layer nodes of the extreme learning machine is determined;
3) pre-filling the missing attribute according to a nearest neighbor rule, and adjusting a pre-filling value according to an error obtained by training the FELM network by the training sample until a reasonable numerical value is found for filling, so as to obtain a recovered complete data set;
4) initializing the parameters of the fuzzy C-means algorithm: the number of clusters c, the fuzzy coefficient m, the threshold ε, and the membership partition matrix $U^{(0)}$;
5) clustering the recovered complete data set with the fuzzy C-means algorithm: at iteration t = l, computing $V^{(l)}$ from formula (2) and $U^{(l-1)}$, then updating $U^{(l)}$ from formula (3) and $V^{(l)}$; if
$$\left\| U^{(l)}-U^{(l-1)} \right\| < \varepsilon$$
the algorithm terminates; otherwise set l = l + 1 and continue to update the membership partition matrix and the cluster centers iteratively.
$$v_i^{(l)}=\frac{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}x_k}{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}},\quad i=1,2,\dots,c \qquad (2)$$
$$u_{ik}^{(l)}=\left[\sum_{j=1}^{c}\left(\frac{\left\|x_k-v_i^{(l)}\right\|_2}{\left\|x_k-v_j^{(l)}\right\|_2}\right)^{\frac{2}{m-1}}\right]^{-1},\quad i=1,2,\dots,c;\ k=1,2,\dots,n \qquad (3)$$
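Step 3) starts from a nearest neighbor pre-fill of each missing attribute. The following is a minimal sketch of that pre-fill only, without the FELM feedback adjustment. The distance formula is available only as an image in the source, so the partial distance used here (mean squared difference over the attributes present in both samples) is an assumed form, as are the NaN coding of missing values and the function name `knn_prefill`.

```python
import numpy as np

def knn_prefill(X, k=3):
    """Fill NaNs with the mean of the k nearest rows under a partial distance (assumed form)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for a in np.flatnonzero(np.isnan(X).any(axis=1)):
        miss = np.isnan(X[a])
        dists = np.full(X.shape[0], np.inf)
        for b in range(X.shape[0]):
            if b == a or np.isnan(X[b, miss]).any():
                continue                                  # neighbour must have the needed attributes
            shared = ~np.isnan(X[a]) & ~np.isnan(X[b])    # attributes present in both samples
            if shared.any():
                dists[b] = np.mean((X[a, shared] - X[b, shared]) ** 2)
        nearest = np.argsort(dists)[:k]
        nearest = nearest[np.isfinite(dists[nearest])]    # drop rows that were not comparable
        if nearest.size:
            filled[a, miss] = X[nearest][:, miss].mean(axis=0)
    return filled

X = np.array([[0.0, 0.0, 1.0],
              [0.1, 0.0, 1.2],
              [5.0, 5.0, 9.0],
              [0.0, 0.1, np.nan]])
F = knn_prefill(X, k=2)
```

In the scheme described above, the value produced here is only the starting point that the FELM error feedback then adjusts.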
Error retrieval algorithm: assume that the initial estimate of the missing attribute obtained with the k-nearest neighbor rule is $E_k$, and that the mean error of the training samples obtained with the ELM is
Figure BDA0001441786190000104
If the output value obtained by ELM learning and prediction on the data containing the missing attribute is Y, and the actual value of the data is y, the error $e_0 = Y - y$ is obtained, and the following quantity e is computed:
Figure BDA0001441786190000105
The fill value of the missing attribute is then adjusted:
(1) if e < 0, the fill value of the missing attribute is readjusted to $E_{new} = E_k + \rho e$, i.e. the value is increased with a certain probability, and is then used as input for ELM learning, where ρ ∈ [0, 1] is selected at random by a random function;
(2) if
Figure BDA0001441786190000106
the fill value of the missing attribute is readjusted to $E_{new} = E_k - \rho e$ and is then used as input for ELM learning;
(3) if
Figure BDA0001441786190000107
the value predicted by the ELM is close to the actual value and is acceptable, so this value is used to fill the missing attribute of the incomplete data set.
Thirdly, the missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine is used to analyze strip steel data, and the analysis results are used to strengthen the reform of industrial production. The specific steps are as follows:
1. Collecting experimental data: the strip steel data were collected from a steel mill in China over a certain period of time, and the data set contains 983 data samples. The following attributes are extracted from the collected data samples: the rolling force of the rolling frame, the size of the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the rolling current, the rolling speed and the SONY value. These attributes are related, to differing degrees, to the predicted strip steel outlet thickness. The attribute values are used as inputs to the FELM network. Fig. 3 is the signal acquisition plot of the data (the vertical axis shows the parameter values and the horizontal axis the acquisition time).
2. Analysis of experimental results: the experimental data are processed manually to generate rolling data sets with randomly missing values, and a training sample set is then selected for each missing attribute. To demonstrate the effectiveness of the incomplete data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine, its experimental results are compared with those of classical processing methods: the mean estimation method, the zero filling method, the k-nearest-neighbor estimation method and the MBP-FCM algorithm. The estimation deviations under the different algorithms and different missing ratios are compared using three indices computed between the true and estimated values: the mean absolute deviation (ABS), the mean deviation (Bias), and the root mean square error (RMSE). The smaller these values, the more accurate the estimates. As Tables 1 and 2 show, the proposed algorithm estimates more accurately than the four comparison algorithms, and its estimates are closer to the original data. Across the different missing ratios, the deviation of the filled values grows as the number of missing values increases. Fig. 4 plots the objective function of the FELM-FCM algorithm against the number of iterations on the strip steel data set at four missing ratios. It shows that the objective function value fluctuates noticeably in the initial stage and, after several iterations of optimization, settles into a stable converged state.
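The three deviation indices named above can be computed as in the sketch below. The text names the indices without giving formulas, so the usual definitions are assumed here, including the sign convention for Bias (estimate minus true value). The function name `estimation_deviation` and the sample numbers are hypothetical.

```python
import numpy as np

def estimation_deviation(true_vals, est_vals):
    """ABS, Bias and RMSE between true and estimated values (standard definitions assumed)."""
    t = np.asarray(true_vals, dtype=float)
    e = np.asarray(est_vals, dtype=float)
    diff = e - t                                    # estimate minus true (assumed sign)
    return {
        "ABS": float(np.mean(np.abs(diff))),        # mean absolute deviation
        "Bias": float(np.mean(diff)),               # mean signed deviation
        "RMSE": float(np.sqrt(np.mean(diff ** 2))), # root mean square error
    }

metrics = estimation_deviation([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
```

Only the entries that were artificially removed would be passed in, so that the indices measure the quality of the filled values rather than of the whole data set.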
TABLE 1 comparison of missing strip data set estimate deviations under different algorithms
Figure BDA0001441786190000121
TABLE 2 comparison of the estimated deviations of missing strip data sets for different miss ratios
Figure BDA0001441786190000122

Claims (3)

1. The application of the missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine to the clustering statistics of strip steel data, characterized by comprising the following steps:
1) collecting experimental data: strip steel data acquired over a certain period of time are collected as data samples;
2) extracting the following attributes from the collected data samples: the rolling force of the rolling frame, the size of the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the rolling current, the rolling speed and the SONY value;
3) taking the attribute values acquired in step 2) as the training data set;
4) normalizing the data set: because the attributes differ in order of magnitude, all values in the data set are converted to corresponding values in the interval [0, 1] to eliminate the differences between attributes;
5) selecting and optimizing the training samples: mutual information is used to compute and select the data attributes with higher correlation, and, according to those attributes, the complete data within the incomplete data set are selected as training samples for the FELM network;
6) determining the parameters of the FELM network: the input weights ω and the bias values b are initialized with random numbers drawn from the interval [-1, 1], and the number of hidden layer nodes of the extreme learning machine is determined;
7) estimating the missing attributes: the missing attributes are pre-filled according to the nearest neighbor rule, and the pre-filled values are adjusted by the error retrieval method according to the error obtained on the training samples until a reasonable value is found for filling;
8) performing cluster analysis on the recovered complete data set with the FCM algorithm;
wherein the missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine comprises the following steps:
2.1) using mutual information to compute and select the data attributes with higher correlation, and, according to those attributes, selecting the complete data within the incomplete data set as training samples for the FELM network;
$$I(X;Y)=\iint \mu_{XY}(x,y)\,\log\frac{\mu_{XY}(x,y)}{\mu_X(x)\,\mu_Y(y)}\,dx\,dy \qquad (1)$$
wherein $\mu_X(x)$ denotes the marginal probability density function of the variable X; $\mu_Y(y)$ denotes the marginal probability density function of the variable Y; and $\mu_{XY}(x,y)$ denotes the joint probability density function of the two variables;
2.2) determining the parameters of the FELM network: initializing the input weights ω and the bias values b; the initial values of ω and b are drawn at random from the interval [-1, 1] to initialize the network, and the number of hidden layer nodes of the extreme learning machine is determined;
2.3) pre-filling the missing attribute according to a nearest neighbor rule, and adjusting the pre-filling value by adopting an error retrieval method according to the error obtained by training the FELM network by the training sample until a reasonable numerical value is found and filled, thereby obtaining a recovered complete data set;
2.4) initializing the parameters of the fuzzy C-means algorithm: the number of clusters c, the fuzzy coefficient m, the threshold ε, and the membership partition matrix $U^{(0)}$;
2.5) clustering the recovered complete data set with the fuzzy C-means algorithm: at iteration t = l, computing the cluster center matrix $V^{(l)}$ from formula (2) and the membership partition matrix $U^{(l-1)}$, then updating $U^{(l)}$ from formula (3) and $V^{(l)}$; for the given threshold ε, if
$$\left\| U^{(l)}-U^{(l-1)} \right\| < \varepsilon$$
the algorithm terminates; otherwise set l = l + 1 and continue to update the membership partition matrix and the cluster centers iteratively;
$$v_i^{(l)}=\frac{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}x_k}{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}},\quad i=1,2,\dots,c \qquad (2)$$
$$u_{ik}^{(l)}=\left[\sum_{j=1}^{c}\left(\frac{\left\|x_k-v_i^{(l)}\right\|_2}{\left\|x_k-v_j^{(l)}\right\|_2}\right)^{\frac{2}{m-1}}\right]^{-1},\quad i=1,2,\dots,c;\ k=1,2,\dots,n \qquad (3)$$
2. The application of the missing data fuzzy clustering algorithm based on optimized estimation by an information feedback extreme learning machine to the clustering statistics of strip steel data as claimed in claim 1, wherein, in step 2.3), the missing attributes are pre-filled according to the nearest neighbor rule and the pre-filled values are adjusted by the error retrieval method according to the error obtained by training the FELM network on the training samples, until a reasonable value is found and filled in, and the recovered complete data set is obtained as follows:
2.3.1) pre-filling the missing attributes according to the nearest neighbor rule: the k data samples nearest to the incomplete data sample are selected, the average of their values at the positions corresponding to the missing data is computed, and this average is taken as the pre-filled value of the incomplete data;
Figure FDA0003251778320000033
wherein $x_{pa}$ and $x_{pb}$ are the p-th attributes of $x_a$ and $x_b$ respectively, and $I_p$ satisfies the condition shown in formula (5):
Figure FDA0003251778320000034
2.3.2) calculating an output matrix of a hidden layer of the FELM network, and calculating an output matrix H of the hidden layer by using formulas (6) to (8);
Figure FDA0003251778320000035
where
Figure FDA0003251778320000036
is the output of the i-th hidden-layer node;
Figure FDA0003251778320000037
is the inner product of
Figure FDA0003251778320000038
and x_j;
Figure FDA0003251778320000039
is the input weight connecting the input layer to the hidden layer; β_i is the output weight connecting the hidden layer to the output layer; b_i is the bias of the i-th hidden-layer node;
Hβ=T (7)
where H is the hidden-layer output matrix, β is the output weight, and T is the desired output;
Figure FDA0003251778320000041
2.3.3) calculating the output weight of the FELM network from the obtained output matrix H and the expected output value according to formula (9);
Figure FDA0003251778320000042
where
Figure FDA0003251778320000043
is the Moore-Penrose generalized inverse of H, and
Figure FDA0003251778320000044
is minimal and unique;
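Steps 2.3.2) and 2.3.3) follow the usual extreme learning machine recipe, which can be sketched as below. Formulas (6), (8), and (9) are not reproduced in this text, so the sigmoid activation, the uniform random initialization, and the function names are assumptions; only the relation Hβ = T and the use of the Moore-Penrose generalized inverse come from the claim, and the feedback part of FELM is not shown here.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden-layer output matrix H with an assumed sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W.T + b)))

def elm_fit(X, T, n_hidden=20, seed=0):
    """Random input weights and biases, output weights by pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T      # minimum-norm least-squares solution of H beta = T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return hidden_output(X, W, b) @ beta
```

Because only `beta` is solved for, training reduces to one pseudoinverse computation rather than iterative gradient descent.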
2.3.4) obtaining the error between the predicted output value and the true output value, and feeding back the error, wherein the predicted value output by the extreme learning machine is assumed to be Y, the true value is y, and the error is e_0:
e_0 = Y - y (10)
2.3.5) comparing the obtained error with the error obtained from the training samples; if the iteration stop requirement is met, filling the missing attribute; otherwise, receiving the fed-back error, readjusting the pre-filled value, and returning to step 2.3.1).
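The control flow of steps 2.3.4) and 2.3.5) can be sketched as a loop. This is only an outline under stated assumptions: the stop requirement is taken to be |e_0| no larger than the mean training error, and the adjustment rule is passed in as a hypothetical `adjust` callback, because the claim does not spell out either in the text. The names `feedback_fill`, `predict`, and `max_iter` are likewise illustrative.

```python
def feedback_fill(prefill, predict, y_true, mean_train_error, adjust, max_iter=100):
    """Adjust a pre-filled value until the prediction error is acceptable.

    prefill          : initial fill value from the nearest neighbor rule
    predict          : callable mapping a fill value to the network output Y
    y_true           : true output value y for the sample
    mean_train_error : mean error of the training samples
    adjust           : callable (fill, e0) -> new fill (assumed update rule)
    """
    fill = prefill
    e0 = predict(fill) - y_true                  # formula (10): e_0 = Y - y
    for _ in range(max_iter):
        if abs(e0) <= abs(mean_train_error):     # assumed stop requirement
            break
        fill = adjust(fill, e0)                  # readjust the pre-filled value
        e0 = predict(fill) - y_true
    return fill, e0
```

With a toy predictor such as `predict = lambda v: 2.0 * v` and `adjust = lambda f, e: f - 0.25 * e`, the loop drives the fill value toward the point where the prediction matches `y_true`.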
3. The application of the missing data fuzzy clustering algorithm for optimizing the estimated value of the information feedback extreme learning machine in strip steel data clustering statistics as claimed in claim 2, wherein the specific process of the error retrieval method is as follows:
assuming that the initial estimate of the missing attribute obtained by the k-nearest neighbor rule is E_k, and that the mean error of the training samples obtained using the FELM network is
Figure FDA0003251778320000054
If the output value obtained by FELM learning and prediction on the data containing the missing attribute is Y, and the true value of the data is y, then the error e_0 = Y - y is obtained; calculating
Figure FDA0003251778320000051
Adjusting the fill value of the missing attribute:
4.1) if E < 0, then the fill value of the missing attribute is readjusted to E_new = E_k + ρe, i.e., the value is increased with a certain probability, and it is then used as input to FELM learning, where ρ ∈ [0, 1] is selected at random by a random function;
4.2) if
Figure FDA0003251778320000052
then the fill value of the missing attribute is readjusted to E_new = E_k - ρe, which is then used as input to FELM learning;
4.3) if
Figure FDA0003251778320000053
then the value predicted by the FELM network is close to the true value and is acceptable, so the predicted value is used to fill the missing attribute of the incomplete data set.
CN201710992778.9A 2017-10-23 2017-10-23 Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof Active CN107729943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710992778.9A CN107729943B (en) 2017-10-23 2017-10-23 Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710992778.9A CN107729943B (en) 2017-10-23 2017-10-23 Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof

Publications (2)

Publication Number Publication Date
CN107729943A CN107729943A (en) 2018-02-23
CN107729943B true CN107729943B (en) 2021-11-30

Family

ID=61212371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710992778.9A Active CN107729943B (en) 2017-10-23 2017-10-23 Missing data fuzzy clustering algorithm for optimizing estimated value of information feedback extreme learning machine and application thereof

Country Status (1)

Country Link
CN (1) CN107729943B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019218263A1 (en) * 2018-05-16 2019-11-21 深圳大学 Extreme learning machine-based extreme ts fuzzy inference method and system
CN109102021A (en) * 2018-08-10 2018-12-28 聚时科技(上海)有限公司 The mutual polishing multicore k- mean cluster machine learning method of core under deletion condition
CN109214429B (en) * 2018-08-14 2021-07-27 聚时科技(上海)有限公司 Local deletion multi-view clustering machine learning method based on matrix-guided regularization
CN109195110B (en) * 2018-08-23 2020-12-15 南京邮电大学 Indoor positioning method based on hierarchical clustering technology and online extreme learning machine
CN109783481A (en) * 2018-12-19 2019-05-21 新华三大数据技术有限公司 Data processing method and device
CN109948715B (en) * 2019-03-22 2021-07-02 杭州电子科技大学 Water quality monitoring data missing value filling method
CN110110447B (en) * 2019-05-09 2023-04-18 辽宁大学 Method for predicting thickness of strip steel of mixed frog leaping feedback extreme learning machine
CN110378744A (en) * 2019-07-25 2019-10-25 中国民航大学 Civil aviaton's frequent flight passenger value category method and system towards incomplete data system
CN112101457B (en) * 2020-09-15 2023-11-17 湖南科技大学 PMSM demagnetizing fault diagnosis method based on torque signal fuzzy intelligent learning
CN114531696A (en) * 2020-11-23 2022-05-24 维沃移动通信有限公司 Method and device for processing partial input missing of AI (Artificial Intelligence) network
CN112687349A (en) * 2020-12-25 2021-04-20 广东海洋大学 Construction method of model for reducing octane number loss
CN115423005B (en) * 2022-08-22 2023-10-31 江苏大学 Big data reconstruction method and device for combine harvester
CN116976230A (en) * 2023-09-25 2023-10-31 中国海洋大学 Chlorophyll remote sensing data reconstruction method based on numerical simulation and deep learning
CN117435870B (en) * 2023-12-21 2024-03-29 国网天津市电力公司营销服务中心 Load data real-time filling method, system, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750286A (en) * 2011-04-21 2012-10-24 常州蓝城信息科技有限公司 Novel decision tree classifier method for processing missing data
CN104751229A (en) * 2015-04-13 2015-07-01 辽宁大学 Bearing fault diagnosis method capable of recovering missing data of back propagation neural network estimation values
JP2015184853A (en) * 2014-03-24 2015-10-22 Kddi株式会社 Missing data complementing device, missing data complementing method, and program
CN106127262A (en) * 2016-06-29 2016-11-16 海南大学 The clustering method of one attribute missing data collection
CN106407464A (en) * 2016-10-12 2017-02-15 南京航空航天大学 KNN-based improved missing data filling algorithm
CN106971205A (en) * 2017-04-06 2017-07-21 哈尔滨理工大学 A kind of embedded dynamic feature selection method based on k nearest neighbor Mutual Information Estimation
EP3214874A1 (en) * 2016-03-01 2017-09-06 Gigaset Communications GmbH Sensoric system energy limiter with selective attention neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488884B (en) * 2013-09-12 2016-07-13 北京航空航天大学 Degraded data based on wavelet neural network lacks interpolating method
CN106156260B (en) * 2015-04-28 2020-01-21 阿里巴巴集团控股有限公司 Method and device for repairing missing data
CN107274016A (en) * 2017-06-13 2017-10-20 辽宁大学 The strip exit thickness Forecasting Methodology of the random symmetrical extreme learning machine of algorithm optimization that leapfrogs

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A hybrid clustering algorithm based on missing attribute interval estimation for incomplete data; Li Zhang et al.; Pattern Analysis and Applications; 2015-12-31; Vol. 18; 377–384 *
Interval kernel Fuzzy C-Means clustering of incomplete data; Tianhao Li et al.; Neurocomputing; 2017-05-31; Vol. 237; 316–331 *
A missing data filling method based on extreme learning machine; Yang Yi et al.; Computer Applications and Software; 2016-10-31; Vol. 33 (No. 10); 243-246 *

Also Published As

Publication number Publication date
CN107729943A (en) 2018-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231226

Address after: 905, Building G, Huangjin Times Square, No. 9999 Jingshi Road, Lixia District, Jinan City, Shandong Province, 250000

Patentee after: Zhongchangxing (Shandong) Information Technology Co.,Ltd.

Address before: 110136 58 Shenbei New Area Road South, Shenyang, Liaoning.

Patentee before: LIAONING University
