CN107729943A

CN107729943A - Missing-data fuzzy clustering algorithm with information-feedback extreme learning machine optimized estimation, and its application

Info

Publication number: CN107729943A
Application number: CN201710992778.9A
Authority: CN
Inventors: 张利; 刘洋; 高欣; 潘辉; 王军; 赵中洲
Original assignee: Liaoning University
Current assignee: Zhongchangxing Shandong Information Technology Co ltd
Priority date: 2017-10-23
Filing date: 2017-10-23
Publication date: 2018-02-23
Anticipated expiration: 2037-10-23
Also published as: CN107729943B

Abstract

The invention relates to a missing-data fuzzy clustering algorithm with optimized estimation by an information-feedback extreme learning machine (FELM), and to its application. The basic steps are: 1) use mutual information to select the attributes with the highest correlation and, based on these attributes, select the complete samples in the incomplete data set as training samples for the FELM network; 2) initialize the input weights ω and biases b of the FELM network; 3) pre-fill the missing attributes using the nearest-neighbour method, and adjust the pre-fill values according to the error obtained by training the FELM network on the training samples until reasonable fill values are found, yielding the recovered complete data set; 4) initialize the parameters of the fuzzy C-means (FCM) algorithm: the number of clusters c, the fuzzifier m, the threshold ε, and the membership partition matrix U^(0); 5) obtain the final clustering result by iteratively optimizing the membership partition matrix U and the cluster centres V of the FCM algorithm. The method makes full use of the correlations among data samples and among attributes, and of the distribution of complete and incomplete samples, to obtain more reasonable attribute estimates, so that the clustering result for the incomplete data set is more accurate.

Description

Missing-data fuzzy clustering algorithm with information-feedback extreme learning machine optimized estimation, and its application

Technical field

The invention relates to a missing-data fuzzy clustering algorithm with optimized estimation by an information-feedback extreme learning machine, and to its application. It belongs to the field of industrial information technology.

Background technology

Steel is an indispensable material for China's construction and modernization, and the steel industry is a foundation of national development. In the more than sixty years since the founding of the People's Republic, China's strip steel industry has developed steadily and rapidly and has built up an industrial system of strip technology. China is currently at an important stage of industrial development and its demand for steel remains large, so the steel industry faces a very large market. How to innovate and retrofit existing strip production lines so as to produce high-quality, high-benefit, high-grade steel with reduced resource use and low carbon emissions is a problem of practical significance. Informatization is now a strategic measure spanning all of modernization. To transform itself further, the iron and steel industry must make full use of information technology, integrate advanced techniques from the information industry into the rolling process, and achieve coordinated development of industrialization and informatization. It is therefore important to carry out cluster analysis on strip data and to use the results to improve industrial production.

In recent years, cluster analysis has been adapted to many different types of data sets and has been widely applied and developed in many research fields. Using mathematical methods and a similarity or dissimilarity measure defined on the attributes of the strip data, one can determine how closely the strip data samples are related, cluster them accordingly, and use the result to adjust the production line. In practice, however, data collection is affected by many factors, such as failure of the data acquisition equipment, storage media or transmission media, human error, or the limitations of measuring instruments. As a result, the collected data sets are often incomplete, and traditional clustering methods cannot be applied directly to incomplete data sets. Choosing an appropriate way to handle incomplete data is therefore important for the final analysis and for subsequent industrial planning.

Summary of the invention

To solve the above problems, the invention provides a missing-data fuzzy clustering algorithm with optimized estimation by an information-feedback extreme learning machine, and applies it to the analysis of strip data so that the analysis results can be used to improve industrial production.

The invention is realized through the following technical solution: a missing-data fuzzy clustering algorithm with optimized estimation by an information-feedback extreme learning machine (FELM), characterized by the following steps:

1) Use mutual information to select the attributes with the highest correlation and, based on these attributes, select the complete samples in the incomplete data set as training samples for the FELM network;

$$I(X;Y)=\iint \mu_{XY}(x,y)\,\log\frac{\mu_{XY}(x,y)}{\mu_X(x)\,\mu_Y(y)}\,dx\,dy \tag{1}$$

where μ_X(x) is the marginal probability density function of variable X; μ_Y(y) is the marginal probability density function of variable Y; and μ_XY(x, y) is the joint probability density function of the two variables;

2) Determine the FELM network parameters: initialize the input weights ω and biases b, drawing each initial value at random from the interval [-1, 1], and determine the number of hidden-layer nodes of the extreme learning machine;

3) Pre-fill the missing attributes using the nearest-neighbour method, then use the error search method to adjust the pre-fill values according to the error obtained by training the FELM network on the training samples, until reasonable fill values are found, yielding the recovered complete data set;

4) Initialize the parameters of the FCM algorithm: the number of clusters c, the fuzzifier m, the threshold ε, and the membership partition matrix U⁽⁰⁾;

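5) Cluster the recovered complete data set using fuzzy C-means. At iteration l, compute the cluster centre matrix V^(l) from formula (2) and the membership partition matrix U^(l-1), then update U^(l) from formula (3) and V^(l). For the given threshold ε, if $\max_{i,k}\left|u_{ik}^{(l)}-u_{ik}^{(l-1)}\right|<\varepsilon$, the algorithm terminates; otherwise set l = l+1 and continue to update the membership partition matrix and the cluster centres iteratively.

$$v_i^{(l)}=\frac{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}x_k}{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}} \tag{2}$$

$$u_{ik}^{(l)}=\left[\sum_{j=1}^{c}\left(\frac{\lVert x_k-v_i^{(l)}\rVert_2^{2}}{\lVert x_k-v_j^{(l)}\rVert_2^{2}}\right)^{\frac{1}{m-1}}\right]^{-1} \tag{3}$$

Here x_k is the k-th data sample, n is the number of samples, v_i is the i-th cluster centre, and u_ik is the membership of x_k in the i-th cluster.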
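As an illustration of step 1), the following minimal sketch ranks attributes by their estimated mutual information with a chosen attribute. It is not the patented implementation: the histogram-based estimator, the bin count, the synthetic data, and the names `mutual_information` and `rank_attributes` are all assumptions, since the text does not say how the probability densities are estimated.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X;Y) in nats (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()               # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal of Y, shape (1, bins)
    mask = p_xy > 0                          # empty cells contribute nothing
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def rank_attributes(data, target_col, bins=8):
    """Return the other column indices sorted by decreasing MI with target_col."""
    scores = {j: mutual_information(data[:, j], data[:, target_col], bins)
              for j in range(data.shape[1]) if j != target_col}
    return sorted(scores, key=scores.get, reverse=True), scores

# Synthetic example: column 1 is a noisy copy of column 0, column 2 is independent.
rng = np.random.default_rng(0)
a = rng.random(500)
data = np.column_stack([a, a + 0.05 * rng.random(500), rng.random(500)])
order, scores = rank_attributes(data, target_col=0)
```

On this synthetic data the dependent column ranks first, which is the behaviour the attribute-selection step relies on.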

In step 3), the process of pre-filling the missing attributes by the nearest-neighbour method, adjusting the pre-fill values by the error search method according to the error obtained by training the FELM network on the training samples until reasonable fill values are found, and thereby obtaining the recovered complete data set, is as follows:

1) Pre-fill the missing attributes using the nearest-neighbour method: select the k data samples nearest to the incomplete sample and, for each missing position, take the mean of the k samples' values at that position as the pre-fill value of the incomplete data.

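The distance between two samples x_a and x_b is computed over the attributes that are present in both:

$$D_{ab}=\frac{1}{\sum_{i=1}^{s}I_i}\sum_{i=1}^{s}\left(x_{ia}-x_{ib}\right)^{2}I_i \tag{4}$$

where x_ia and x_ib are the i-th attributes of x_a and x_b respectively, s is the number of attributes, and I_i satisfies the condition shown in formula (5):

$$I_i=\begin{cases}0, & x_{ia}\ \text{or}\ x_{ib}\ \text{is missing}\\ 1, & \text{otherwise}\end{cases} \tag{5}$$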

2) calculating of the network concealed layer output matrixes of FELM, the output matrix H of hidden layer is counted using formula (6-8) Calculate；

Wherein,What is represented is the output of i-th of hidden layer；It isWith x_jInner product；Then table What is reached is the input weight linked between input layer and hidden layer；β_iDescription be then linked between hidden layer and output layer it is defeated Go out weights；b_iWhat is represented is the bias of j-th of hidden layer.

H β=T (7)

Wherein, H is the output for hiding node layer, and β is output weight, and T is expectation weight.

2) calculating of FELM networks output weights, using output matrix H obtained above and it is expected defeated according to formula (9) Go out value to calculate output weight；

Wherein,It is H Moore-penrose generalized inverse matrix,Norm be minimum and unique.

3) error between real output value and true output is obtained, error is fed back, it is assumed that extreme learning machine The predicted value of output is y, and the actual value is Y, error e₀；

e₀=Y-y (10)

4) error and training sample that judgement is tried to achieve obtain the magnitude relationship between error, if meeting iteration stopping requirement, Then missing attribute is filled, otherwise receives error, readjusts pre-fill and supplement with money, return to step 1).
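The nearest-neighbour pre-fill in step 1) of this procedure can be sketched as below. This is a hedged illustration, not the patent's code: it assumes missing entries are encoded as NaN, that neighbours are drawn only from complete samples, and that the distance is the mean squared difference over attributes observed in both samples. The names `partial_distance` and `knn_prefill` and the toy data are hypothetical.

```python
import numpy as np

def partial_distance(xa, xb):
    """Mean squared difference over attributes observed in both samples."""
    both = ~np.isnan(xa) & ~np.isnan(xb)
    if not both.any():
        return np.inf                      # no shared attribute: treat as infinitely far
    return float(np.mean((xa[both] - xb[both]) ** 2))

def knn_prefill(data, k=3):
    """Fill each NaN with the mean of that attribute over the k nearest
    complete samples (assumption: only complete samples act as neighbours)."""
    filled = data.copy()
    complete = data[~np.isnan(data).any(axis=1)]
    for r in np.flatnonzero(np.isnan(data).any(axis=1)):
        d = np.array([partial_distance(data[r], c) for c in complete])
        nearest = complete[np.argsort(d)[:k]]
        miss = np.isnan(data[r])
        filled[r, miss] = nearest[:, miss].mean(axis=0)
    return filled

data = np.array([[0.1, 0.2, 0.3],
                 [0.1, 0.2, 0.5],
                 [0.9, 0.8, 0.7],
                 [0.9, 0.8, 0.9],
                 [0.1, 0.2, np.nan]])      # last sample has a missing third attribute
filled = knn_prefill(data, k=2)
```

Here the two nearest complete samples are the first two rows, so the missing entry is pre-filled with the mean of their third attribute. The FELM feedback step would then refine this starting value.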

The error search method proceeds as follows:

Suppose the initial estimate of a missing attribute obtained by the k-nearest-neighbour rule is E_k, and the mean error obtained by the FELM network on the training samples is $\bar{e}$. If FELM learning on a sample containing the missing attribute predicts the output value y while the true value is Y, the error is e₀ = Y - y. The quantity e […] is then calculated and the fill value of the missing attribute is adjusted:

1) If e < 0, readjust the fill value of the missing attribute to E_new = E_k + ρe, i.e. increase this value with a certain probability, and carry out FELM learning again with the new value as input; here ρ ∈ [0, 1] is chosen by a random function;

2) If […], readjust the fill value of the missing attribute to E_new = E_k - ρe, and carry out FELM learning again with the new value as input;

3) If […], the value predicted by the FELM network is very close to the true value and is acceptable, so it is used to fill the missing attribute of the incomplete data set.
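A single adjustment step of the error search rule above might look like the sketch below. Several details are not recoverable from this text, so the sketch makes explicit assumptions: the quantity e is passed in by the caller (for instance computed from e₀ and the training mean error), case 2) is taken to mean e > 0, and case 3) is taken to mean |e| is within a small tolerance `tol`. The function name `adjust_fill` and the tolerance are hypothetical. The update formulas themselves are applied literally as written.

```python
import random

def adjust_fill(E_k, e, rho, tol=1e-3):
    """One step of the error search rule.

    Returns (fill_value, accepted). Assumptions: case 3) means |e| <= tol,
    case 2) means e > 0; the update formulas follow the text literally.
    """
    if abs(e) <= tol:                  # case 3): prediction close enough, accept
        return E_k, True
    if e < 0:                          # case 1): E_new = E_k + rho * e
        return E_k + rho * e, False
    return E_k - rho * e, False        # case 2): E_new = E_k - rho * e

random.seed(0)
rho = random.random()                  # rho drawn at random from [0, 1)
new_fill, accepted = adjust_fill(E_k=0.50, e=-0.10, rho=rho)
```

In a full loop the returned fill value would be fed back into the FELM network, the error recomputed, and `adjust_fill` called again until it reports acceptance.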

The application of the missing-data fuzzy clustering algorithm with information-feedback extreme learning machine optimized estimation to clustering statistics of strip data comprises the following process:

1) Collect experimental data: collect strip data over a certain period as data samples;

2) Extract the following attributes from the collected data samples: the rolling force of the rolling stand, the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the mill current, the rolling speed, and the SONY value;

3) Use the attribute values collected in step 2) as the training data set;

4) Normalize the data set. Because the attributes differ in order of magnitude, all values in the data set are first transformed into the interval [0, 1] to eliminate differences of scale;

5) Select and optimize the training samples. Use mutual information to select the attributes with the highest correlation and, based on these attributes, select the complete samples in the incomplete data set as training samples for the FELM network;

6) Determine the FELM network parameters. Initialize the input weights ω and biases b, drawing each initial value at random from the interval [-1, 1], and determine the number of hidden-layer nodes of the extreme learning machine;

7) Estimate the missing attributes. Pre-fill the missing attributes using the nearest-neighbour method, then use the error search method to adjust the pre-fill values according to the error obtained on the training samples, until reasonable fill values are found;

8) Perform cluster analysis on the recovered complete data set using the FCM algorithm.
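The normalization in step 4) can be sketched as a per-attribute min-max scaling. This is an assumption: the text only says that values are transformed into [0, 1]. Missing entries are assumed to be encoded as NaN and are left untouched; the function name and the sample values are hypothetical.

```python
import numpy as np

def min_max_normalise(data):
    """Scale each attribute (column) to [0, 1], ignoring NaN (missing) entries."""
    lo = np.nanmin(data, axis=0)
    hi = np.nanmax(data, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant columns
    return (data - lo) / span

# Two hypothetical attributes on very different scales, one value missing.
raw = np.array([[1200.0, 850.0],
                [1500.0, np.nan],
                [1800.0, 900.0]])
scaled = min_max_normalise(raw)
```

Scaling before the distance-based steps keeps a large-valued attribute from dominating the nearest-neighbour search and the FCM distances.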

Beneficial effects of the invention: traditional approaches consider either only the relations between data samples or only the relations between attributes. The invention combines both, using the FELM network to optimize the estimates of missing values, and then performs fuzzy cluster analysis on the completed data set. Mutual information is used to compute the correlation between sample attributes, which provides a theoretical basis for selecting training samples. The nearest-neighbour method based on partial distance selects the several nearest neighbours of each incomplete sample and prepares, for each missing value, the pre-fill value used in the FELM iteration. Multiple errors (the differences between the true values and the network outputs) are obtained from the training sample set and averaged. With this mean error as the adjustment criterion, the error search method repeatedly increases or decreases the estimate to optimize it. Through this iteration, better estimates of the missing values are obtained and the incomplete data set is completed reasonably and efficiently.

Brief description of the drawings

Fig. 1 is the topology diagram of the feedback extreme learning machine.

Fig. 2 is the flow chart of the algorithm of the invention.

Fig. 3 is the signal acquisition plot of the strip rolling data.

Fig. 4 shows the objective function versus the number of iterations for the strip rolling data set.

Detailed description

First, the theoretical basis of the invention:

1. Information-feedback extreme learning machine

The extreme learning machine (ELM) is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs), proposed by Guang-Bin Huang in 2004. In an ELM, the input weights connecting the input layer to the hidden layer and the biases of the hidden layer are chosen at random, and the output weights connecting the hidden layer to the output layer are determined analytically by the generalized-inverse method. ELM abandons gradient descent and instead uses the idea of least squares to obtain the optimal network, with great success. However, a traditional ELM does not exploit the value of the predicted output to the network, and relies solely on the input information during learning. The invention therefore borrows the idea of Kalman filtering to improve the traditional ELM, obtaining a feedback extreme learning machine (FELM) that better estimates and fills the missing attributes in an incomplete data set.

The core idea of the feedback extreme learning machine is to use the error between the predicted output and the actual value to adjust the fill values of missing attributes so that they become more reasonable, thereby improving the effectiveness of clustering. Fig. 1 shows a feedback extreme learning machine model.

As shown in Fig. 1, the FELM network consists of an input layer, a hidden layer and an output layer. Each circle represents a node. Data processing and computation are carried out by the nodes of the hidden layer and the output layer; the specific number of hidden nodes is determined experimentally.
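The basic ELM training described above (random input weights and biases, output weights by generalized inverse) can be sketched as follows. This is a plain ELM under stated assumptions, not the patented FELM: the sigmoid activation, the number of hidden nodes, the toy regression target, and the names `elm_train` and `elm_predict` are illustrative choices. The training errors computed at the end are the kind of quantity whose mean the feedback step uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Random input weights and biases in [-1, 1]; output weights by pseudoinverse."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)              # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T        # minimum-norm least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = rng.random((200, 3))                       # toy inputs in [0, 1]
T = X @ np.array([0.5, -0.2, 0.3]) + 0.1       # toy target attribute
W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
errors = T - elm_predict(X, W, b, beta)        # per-sample training errors
mean_error = float(errors.mean())              # mean training error
rmse = float(np.sqrt(np.mean(errors ** 2)))
```

Because only `beta` is solved for, training is a single linear least-squares problem, which is why ELM needs no gradient descent.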

2. Fuzzy C-means (FCM) clustering algorithm

The fuzzy C-means clustering algorithm (Bezdek, 1981) partitions the feature points of the feature space X = (x₁, x₂, ..., x_n) into c classes (1 < c ≤ n) with cluster centres V = {v₁, v₂, ..., v_c}, where v_i ∈ R^s denotes the centre of the i-th class. The degree to which an arbitrary data point x_k ∈ R^s belongs to the i-th class is the membership u_ik, which satisfies the following conditions:

$$u_{ik}\in[0,1],\quad i=1,2,\dots,c;\ k=1,2,\dots,n \tag{11}$$

$$\sum_{i=1}^{c}u_{ik}=1,\quad k=1,2,\dots,n \tag{12}$$

$$0<\sum_{k=1}^{n}u_{ik}<n,\quad i=1,2,\dots,c \tag{13}$$

The objective function is defined as follows:

$$J(U,V)=\sum_{i=1}^{c}\sum_{k=1}^{n}u_{ik}^{m}\,\lVert x_k-v_i\rVert_2^{2} \tag{14}$$

where x_k = [x_1k, x_2k, ..., x_sk]^T is the k-th data sample and x_jk is the j-th attribute value of x_k; v_i is the i-th cluster centre; m (m > 1) is the weighting exponent that controls the fuzziness of the membership matrix; and ||·||₂ denotes the Euclidean distance.

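The update formulas for the cluster centres and the memberships are as follows:

$$v_i=\frac{\sum_{k=1}^{n}u_{ik}^{m}x_k}{\sum_{k=1}^{n}u_{ik}^{m}},\quad i=1,2,\dots,c$$

$$u_{ik}=\left[\sum_{j=1}^{c}\left(\frac{\lVert x_k-v_i\rVert_2^{2}}{\lVert x_k-v_j\rVert_2^{2}}\right)^{\frac{1}{m-1}}\right]^{-1}$$

Under the constraint of formula (12), U and V are updated alternately so that formula (14) reaches a minimum.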
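The alternating update of U and V can be sketched as below, using the standard FCM centre and membership updates. It is a minimal illustration rather than the patent's code: the random initialization of U, the stopping test on the largest membership change, the guard against zero distances, and the synthetic two-cluster data are assumptions, and `fcm` is a hypothetical name.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means: alternate centre and membership updates until the
    largest membership change falls below eps."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                              # each column sums to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)              # cluster centres
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # squared distances, shape (c, n)
        d2 = np.maximum(d2, 1e-12)                  # guard against zero distance
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)               # membership update
        done = np.abs(U_new - U).max() < eps
        U = U_new
        if done:
            break
    return U, V

# Synthetic data: two well-separated groups of 50 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.03, (50, 2)), rng.normal(0.8, 0.03, (50, 2))])
U, V = fcm(X, c=2)
labels = U.argmax(axis=0)                           # hard assignment for inspection
```

Each column of `U` holds one sample's memberships across the c clusters; taking the arg-max is only a convenience for reading the result, since FCM itself keeps the soft memberships.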

Second, the implementation process of the invention:

In the mutual information calculation, μ_X(x) is the marginal probability density function of variable X; μ_Y(y) is the marginal probability density function of variable Y; and μ_XY(x, y) is the joint probability density function of the two variables.

2) Determine the FELM network parameters. Initialize the input weights ω and biases b, drawing each initial value at random from the interval [-1, 1], and determine the number of hidden-layer nodes of the extreme learning machine;

3) Pre-fill the missing attributes using the nearest-neighbour method, and adjust the pre-fill values according to the error obtained by training the FELM network on the training samples until reasonable fill values are found, yielding the recovered complete data set;

5) Cluster the recovered complete data set using fuzzy C-means. At iteration l, compute V^(l) from formula (2) and U^(l-1), then update U^(l) from formula (3) and V^(l). If $\max_{i,k}\left|u_{ik}^{(l)}-u_{ik}^{(l-1)}\right|<\varepsilon$, the algorithm terminates; otherwise set l = l+1 and continue to update the membership partition matrix and the cluster centres iteratively.

Error search algorithm: suppose the initial estimate of a missing attribute obtained by the k-nearest-neighbour rule is E_k, and the mean error obtained by the ELM on the training samples is $\bar{e}$. If ELM learning on a sample containing the missing attribute predicts the output value y while the true value is Y, the error is e₀ = Y - y. The quantity e […] is then calculated and the fill value of the missing attribute is adjusted:

(1) If e < 0, readjust the fill value of the missing attribute to E_new = E_k + ρe, i.e. increase this value with a certain probability, and carry out ELM learning again with the new value as input; here ρ ∈ [0, 1] is chosen by a random function;

(2) If […], readjust the fill value of the missing attribute to E_new = E_k - ρe, and carry out ELM learning again with the new value as input;

(3) If […], the value predicted by the ELM is very close to the true value and is acceptable, so it is used to fill the missing attribute of the incomplete data set;

Third, the missing-data fuzzy clustering algorithm with information-feedback extreme learning machine optimized estimation is applied to the analysis of strip data, so that the analysis results can be used to improve industrial production. The specific steps are as follows:

1. Collect experimental data: the strip data were collected during a certain period of one day at a domestic steel mill; the data set contains 983 data samples in total. The following attributes are extracted from the collected samples: the rolling force of the rolling stand, the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the mill current, the rolling speed, and the SONY value. These attributes are each related, to differing degrees, to the prediction of strip exit thickness, and their values are used as inputs to the FELM network. Fig. 3 shows the acquired data signals (the vertical axis is the parameter value and the horizontal axis is the acquisition time).

2. Analysis of experimental results: a rolling data set with randomly missing values was generated from the experimental data by artificial processing, and a training sample set was then selected for each missing attribute. To demonstrate the effectiveness of the proposed incomplete-data fuzzy clustering algorithm, its results were compared with those of classical methods: mean imputation, zero filling, k-nearest-neighbour estimation, and the MBP-FCM algorithm. The estimation deviations of the different algorithms and at different missing ratios were compared using three indices: the mean absolute deviation (ABS), the mean deviation between true and estimated values (Bias), and the root mean square error (RMSE). The smaller these values, the more accurate the estimates. As Tables 1 and 2 show, the proposed algorithm estimates more accurately than the four comparison algorithms, and its estimates are closer to the original data. Across missing ratios, the filling deviation grows as the number of missing values grows. Fig. 4 shows, for the FELM-FCM algorithm at four missing ratios, how the objective function changes with the number of iterations on the strip data set. As Fig. 4 shows, the objective function value of the proposed algorithm fluctuates noticeably in the initial stage, and after several iterations the algorithm converges to a stable state.

Table 1. Estimation deviation on the incomplete strip data set under different algorithms

Table 2. Estimation deviation on the incomplete strip data set at different missing ratios
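The three deviation indices named above can be computed as in the sketch below. The text gives no formulas for them, so the definitions here are assumptions: ABS as the mean absolute difference, Bias as the mean signed difference (estimate minus true value), and RMSE as the root mean square difference. The function name and the sample values are hypothetical and are not taken from Tables 1 or 2.

```python
import numpy as np

def estimation_deviation(true_vals, est_vals):
    """ABS, Bias and RMSE between true and estimated values (assumed definitions)."""
    diff = np.asarray(est_vals, dtype=float) - np.asarray(true_vals, dtype=float)
    return {
        "ABS": float(np.mean(np.abs(diff))),        # mean absolute deviation
        "Bias": float(np.mean(diff)),               # mean signed deviation
        "RMSE": float(np.sqrt(np.mean(diff ** 2))), # root mean square error
    }

# Hypothetical true values of removed entries and their estimates.
res = estimation_deviation([0.2, 0.4, 0.6], [0.3, 0.4, 0.5])
```

Only the entries that were artificially removed enter the comparison, since those are the positions where both a true value and an estimate exist.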

Claims

1. A missing-data fuzzy clustering algorithm with optimized estimation by an information-feedback extreme learning machine, characterized by the following steps:

1) Use mutual information to select the attributes with the highest correlation and, based on these attributes, select the complete samples in the incomplete data set as training samples for the FELM network;

$$I(X;Y)=\iint \mu_{XY}(x,y)\,\log\frac{\mu_{XY}(x,y)}{\mu_X(x)\,\mu_Y(y)}\,dx\,dy \tag{1}$$

where μ_X(x) is the marginal probability density function of variable X; μ_Y(y) is the marginal probability density function of variable Y; and μ_XY(x, y) is the joint probability density function of the two variables;

2) Determine the FELM network parameters: initialize the input weights ω and biases b, drawing each initial value at random from the interval [-1, 1], and determine the number of hidden-layer nodes of the extreme learning machine;

3) Pre-fill the missing attributes using the nearest-neighbour method, then use the error search method to adjust the pre-fill values according to the error obtained by training the FELM network on the training samples, until reasonable fill values are found, yielding the recovered complete data set;

4) Initialize the parameters of the FCM algorithm: the number of clusters c, the fuzzifier m, the threshold ε, and the membership partition matrix U^(0);

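5) Cluster the recovered complete data set using fuzzy C-means. At iteration l, compute the cluster centre matrix V^(l) from formula (2) and the membership partition matrix U^(l-1), then update U^(l) from formula (3) and V^(l). For the given threshold ε, if $\max_{i,k}\left|u_{ik}^{(l)}-u_{ik}^{(l-1)}\right|<\varepsilon$, the algorithm terminates; otherwise set l = l+1 and continue to update the membership partition matrix and the cluster centres iteratively.

$$v_i^{(l)}=\frac{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}x_k}{\sum_{k=1}^{n}\left(u_{ik}^{(l-1)}\right)^{m}} \tag{2}$$

$$u_{ik}^{(l)}=\left[\sum_{j=1}^{c}\left(\frac{\lVert x_k-v_i^{(l)}\rVert_2^{2}}{\lVert x_k-v_j^{(l)}\rVert_2^{2}}\right)^{\frac{1}{m-1}}\right]^{-1} \tag{3}$$

Here x_k is the k-th data sample, n is the number of samples, v_i is the i-th cluster centre, and u_ik is the membership of x_k in the i-th cluster.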


2. The missing-data fuzzy clustering algorithm with optimized estimation by an information-feedback extreme learning machine according to claim 1, characterized in that, in step 3), the process of pre-filling the missing attributes by the nearest-neighbour method, adjusting the pre-fill values by the error search method according to the error obtained by training the FELM network on the training samples until reasonable fill values are found, and thereby obtaining the recovered complete data set, is as follows:

1) Pre-fill the missing attributes using the nearest-neighbour method: select the k data samples nearest to the incomplete sample and, for each missing position, take the mean of the k samples' values at that position as the pre-fill value of the incomplete data.

2) calculating of the network concealed layer output matrixes of FELM, the output matrix H of hidden layer is calculated using formula (6-8)；

Wherein,What is represented is the output of i-th of hidden layer；It isWith x_jInner product；Then express It is the input weight linked between input layer and hidden layer；β_iDescription is then the output power linked between hidden layer and output layer Value；b_iWhat is represented is the bias of j-th of hidden layer.

H β=T (7)

2) calculating of FELM networks output weights, output matrix H obtained above and desired output are used according to formula (9) Output weight is calculated；

3) error between real output value and true output is obtained, error is fed back, it is assumed that extreme learning machine exports Predicted value be y, and the actual value is Y, error e₀；

e₀=Y-y (10)

4) error and training sample that judgement is tried to achieve obtain the magnitude relationship between error, right if meeting iteration stopping requirement Missing attribute is filled, and is otherwise received error, is readjusted pre-fill and supplement with money, return to step 1).

3. The missing-data fuzzy clustering algorithm with optimized estimation by an information-feedback extreme learning machine according to claim 2, characterized in that the error search method proceeds as follows:

Suppose the initial estimate of a missing attribute obtained by the k-nearest-neighbour rule is E_k, and the mean error obtained by the FELM network on the training samples is $\bar{e}$. If FELM learning on a sample containing the missing attribute predicts the output value y while the true value is Y, the error is e₀ = Y - y. The quantity e […] is then calculated and the fill value of the missing attribute is adjusted:

1) If e < 0, readjust the fill value of the missing attribute to E_new = E_k + ρe, i.e. increase this value with a certain probability, and carry out FELM learning again with the new value as input; here ρ ∈ [0, 1] is chosen by a random function;

2) If […], readjust the fill value of the missing attribute to E_new = E_k - ρe, and carry out FELM learning again with the new value as input;

3) If […], the value predicted by the FELM network is very close to the true value and is acceptable, so it is used to fill the missing attribute of the incomplete data set.

4. Application of the missing-data fuzzy clustering algorithm with optimized estimation by an information-feedback extreme learning machine to clustering statistics of strip data, characterized by comprising the following process:

2) Extract the following attributes from the collected data samples: the rolling force of the rolling stand, the roll gap between the rolls, the roll gap difference between the rolls, the inlet temperature, the outlet temperature, the mill current, the rolling speed, and the SONY value;

3) Use the attribute values collected in step 2) as the training data set;

4) Normalize the data set. Because the attributes differ in order of magnitude, all values in the data set are first transformed into the interval [0, 1] to eliminate differences of scale;

5) Select and optimize the training samples. Use mutual information to select the attributes with the highest correlation and, based on these attributes, select the complete samples in the incomplete data set as training samples for the FELM network;

6) Determine the FELM network parameters. Initialize the input weights ω and biases b, drawing each initial value at random from the interval [-1, 1], and determine the number of hidden-layer nodes of the extreme learning machine;

7) Estimate the missing attributes. Pre-fill the missing attributes using the nearest-neighbour method, then use the error search method to adjust the pre-fill values according to the error obtained on the training samples, until reasonable fill values are found;