CN109034231A - The deficiency of data fuzzy clustering method of information feedback RBF network valuation - Google Patents

The deficiency of data fuzzy clustering method of information feedback RBF network valuation

Info

Publication number
CN109034231A
Authority
CN
China
Prior art keywords
data
network
value
incomplete
ifrbf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810785729.2A
Other languages
Chinese (zh)
Inventor
张利
石振桔
张皓博
刘洋
王彦杰
肖雪冬
王军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning University
Original Assignee
Liaoning University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning University filed Critical Liaoning University
Priority to CN201810785729.2A priority Critical patent/CN109034231A/en
Publication of CN109034231A publication Critical patent/CN109034231A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/02Computing arrangements based on specific mathematical models using fuzzy logic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Automation & Control Theory (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention relates to an incomplete data fuzzy clustering method based on information feedback RBF network estimation. The steps are as follows: 1) an information feedback RBF (IFRBF) network model is proposed; 2) an incomplete data fuzzy clustering method with IFRBF numerical estimation (IFRBF-FCM) is proposed; 3) the nearest neighbor rule is used to select a corresponding training sample set for each incomplete data sample, and an IFRBF network is trained for each missing attribute with the nearest neighbor training sample set, so that the missing attribute values of the incomplete data samples are estimated and a complete data set recovered by IFRBF network estimation is obtained; 4) the estimation interval of each missing attribute is determined, and an incomplete data fuzzy clustering method with IFRBF interval estimation (IFRBF-IFCM) is proposed to obtain the fuzzy clustering result. Compared with the comparison methods, the clustering results obtained by the present invention on the complete data sets recovered by IFRBF network estimation of incomplete data sets have higher accuracy, and the clustering results of the interval estimation are more accurate and more robust than those of the numerical estimation.

Description

Incomplete data fuzzy clustering method for information feedback RBF network estimation
Technical Field
The invention relates to a fuzzy clustering method for incomplete data, and in particular to an incomplete data fuzzy clustering method with interval estimation by an information feedback RBF network.
Background
With the rapid development of information technology, large amounts of data arise in various fields. Such data can no longer be processed manually and must be processed with the help of computers. Cluster analysis plays an important role in many areas. Traditional cluster analysis algorithms perform hard partitioning: each data sample either belongs or does not belong to a given cluster; in other words, the membership value for each cluster is either 0 or 1. However, most real-world data have a certain degree of ambiguity and do not strictly belong to a single cluster, but belong to several clusters to different degrees.
Therefore, as an unsupervised classification method, the fuzzy C-means (FCM) algorithm is the most widely used among the many clustering algorithms. The membership matrix in the fuzzy clustering algorithm represents the degree to which each data sample belongs to each cluster [18] and can express the fuzziness of the data. However, the data set used by this algorithm needs to be complete, and it cannot act directly on an incomplete data set in which values are missing. In practice, incomplete data sets occur frequently, and the reasons for the missing values are manifold. To solve this problem, many scholars at home and abroad have carried out further research on fuzzy cluster analysis of incomplete data.
Disclosure of Invention
The invention provides an incomplete data fuzzy clustering method based on information feedback RBF network estimation, aiming at the problem that the basic FCM algorithm cannot be applied directly to fuzzy clustering of an incomplete data set. In addition, the estimates of incomplete data obtained by training the IFRBF network are numerical; numerical values cannot accurately describe the uncertainty of incomplete data, and certain errors remain. Aiming at this problem, the invention further provides an incomplete data fuzzy clustering method with IFRBF interval estimation.
In order to achieve the above purpose, the invention adopts the following technical scheme: an incomplete data fuzzy clustering method based on information feedback RBF network estimation, characterized by comprising the following steps:
1) propose an information feedback RBF network model: combined with the Kalman filtering idea, with input vector X = (x_1, x_2, …, x_{n+m}) and output vector Y = (y_1, y_2, …, y_m), the error e between the theoretical expected output value of the incomplete data and the actual output value of the network is calculated, and the difference between the predicted value of the RBF neural network and the theoretical expected value of the data is fed back to the input layer, giving the IFRBF model;
2) select a corresponding training sample set for each incomplete data sample using the nearest neighbor rule, and train an IFRBF network for each missing attribute with the nearest neighbor training sample set, so that the missing attribute values of the incomplete data samples are estimated; a complete data set recovered by IFRBF network estimation is thus obtained and fuzzy cluster analysis is performed;
3) interval-type conversion of the incomplete data set: the missing data attributes of incomplete data samples are estimated and filled in by the IFRBF network, the estimation errors of the IFRBF network on the complete attributes of the incomplete data samples are obtained, the interval representation of the missing data attributes is determined from the mean of the absolute values of these estimation errors, and the complete attributes of the data set are also converted to intervals;
4) cluster analysis is carried out on the converted interval data set using an interval fuzzy C-means clustering method, in which each cluster center is represented by an interval vector, thereby obtaining the fuzzy clustering result.
In the step 1), the specific method comprises the following steps:
1.1) normalize the input data set: all data are mapped into the interval [0, 1], eliminating the magnitude differences between dimensions;
1.2) initialize the IFRBF network: set the numbers of nodes in each layer (n + m, l and m), initialize the weights w, the center vectors C_i and the widths σ², and determine the maximum number of training iterations M, the error precision ε_1 and the learning rates η_1, η_2, η_3;
1.3) calculate the hidden-layer output values of the network according to formula (1);
1.4) calculate the output-layer output values of the network according to formula (2),
where w_jk, the connection weight between the hidden layer and the output layer, is obtained by a minimum variance algorithm;
1.5) calculate the error e between the output value of the network and its expected value according to formula (3):
e_k = Y_k − O_k, k = 1, 2, …, m (3)
1.6) adjust the parameters of the network, namely the center vectors, the widths and the connection weights, according to formula (4), formula (5) and formula (6), and feed the obtained error back to the input layer,
where η_1, η_2, η_3 denote the learning rates;
1.7) algorithm termination decision: when the number of training iterations reaches the maximum M, or when the error e < ε_1, the algorithm ends; otherwise, return to step 1.3). An illustrative sketch of the forward pass in steps 1.3) and 1.4) is given below.
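Formulas (1) and (2) are not reproduced in this text. As a purely illustrative sketch, the Python code below assumes the usual form of an RBF network: Gaussian basis functions in the hidden layer and a linear weighted sum in the output layer. The function names and array shapes are illustrative conventions, not the patent's notation.

```python
import numpy as np

def rbf_hidden(x, centers, sigma2):
    """Hidden-layer outputs: Gaussian basis functions (assumed form of formula (1)).

    x       : input vector, shape (n + m,)  -- data attributes plus fed-back errors
    centers : hidden-node center vectors C_i, shape (l, n + m)
    sigma2  : squared widths, shape (l,)
    """
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distance to each center
    return np.exp(-d2 / (2.0 * sigma2))       # shape (l,)

def rbf_output(h, weights):
    """Output-layer values: weighted sum of hidden outputs (assumed form of formula (2)).

    h       : hidden-layer outputs, shape (l,)
    weights : hidden-to-output connection weights w_jk, shape (l, m)
    """
    return h @ weights                        # shape (m,)

# toy example with n + m = 5 input nodes, l = 4 hidden nodes, m = 2 output nodes
rng = np.random.default_rng(0)
x = rng.random(5)
centers = rng.random((4, 5))
sigma2 = np.full(4, 0.5)
weights = rng.standard_normal((4, 2))
print(rbf_output(rbf_hidden(x, centers, sigma2), weights))
```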
In the step 2), the method specifically comprises the following steps:
2.1) select training samples: for an s-dimensional incomplete data set X = {x_1, x_2, …, x_n}, the similarity measure between an incomplete data sample x_a and a data sample x_b is given by formula (11):
where x_ia and x_ib are the i-th attributes of x_a and x_b, respectively, and I_i satisfies the following condition:
the corresponding nearest neighbor samples are selected for the incomplete data through the similarity measures given by formula (11) and formula (12), so that a corresponding training sample set is selected for each incomplete data sample (see the sketch after this list);
2.2) IFRBF network training on incomplete data:
2.2.1) for the incomplete attributes in the training samples, a "0"-substitution method is adopted: the value of each incomplete attribute is replaced with 0 at the corresponding input-layer node, and network training is then carried out;
2.2.2) since there is no feedback value in the first training pass, the corresponding feedback input is also replaced with 0;
2.2.3) the error corresponding to an incomplete attribute is replaced with the mean of the errors of the remaining complete attributes, i.e.:
thereby completing the training of the IFRBF network;
2.3) for each incomplete data sample, the corresponding parameters of the IFRBF network are assigned using the relevant parameters obtained from training;
2.4) calculate the hidden-layer output values of the network according to formula (1);
2.5) calculate the output-layer output values of the network according to formula (2);
2.6) obtain the estimates of the missing attributes of the corresponding incomplete data from the output values obtained in 2.4) and 2.5), thereby filling the incomplete data set into a complete data set on which fuzzy cluster analysis is carried out.
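Formulas (11) and (12) are likewise not reproduced here. The sketch below uses the commonly employed partial-distance similarity measure, in which only attributes present in both samples contribute (I_i = 1), as a stand-in; the names partial_distance and nearest_neighbors are illustrative.

```python
import numpy as np

def partial_distance(xa, xb):
    """Partial-distance dissimilarity between two samples; np.nan marks missing attributes.

    Only attributes present in both samples (I_i = 1) contribute, and the sum is
    rescaled by s / sum(I_i).  This is a stand-in for formulas (11)-(12).
    """
    present = ~np.isnan(xa) & ~np.isnan(xb)
    if not present.any():
        return np.inf
    s = xa.size
    diff = xa[present] - xb[present]
    return np.sqrt((s / present.sum()) * np.sum(diff ** 2))

def nearest_neighbors(X, a, q):
    """Indices of the q samples in X closest to sample X[a] (its training sample set)."""
    dists = np.array([partial_distance(X[a], X[b]) if b != a else np.inf
                      for b in range(len(X))])
    return np.argsort(dists)[:q]

X = np.array([[0.2, np.nan, 0.7],
              [0.1, 0.5,    0.6],
              [0.9, 0.4,    np.nan],
              [0.3, 0.6,    0.8]])
print(nearest_neighbors(X, a=0, q=2))   # nearest-neighbor training set for incomplete sample 0
```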
In the step 3), the method specifically comprises the following steps:
3.1) compare the estimation errors of the complete attributes of the incomplete data samples obtained in step 2) and obtain the mean of the absolute values of these estimation errors;
3.2) convert the numerical estimate of each missing data attribute of the incomplete data set into an interval: the midpoint of the interval is the numerical estimate x and the width of the interval is given by the mean absolute estimation error, so the interval representation of the missing data attribute is [x⁻, x⁺];
3.3) the obtained interval estimate is checked and limited to the interval [0, 1]:
if x⁻ < 0, the left endpoint of the estimation interval of the missing data attribute is set to 0, i.e. x⁻ = 0;
if x⁺ > 1, the right endpoint of the estimation interval of the missing data attribute is set to 1, i.e. x⁺ = 1;
3.4) all the complete attributes of the incomplete data set are also represented in interval form, i.e. the left and right endpoints of the interval are both equal to the original value of the complete attribute.
The interval type fuzzy C-means clustering method in the step 4) specifically comprises the following steps:
The s-dimensional interval data set X = {x_1, x_2, …, x_n} contains n data samples, and each attribute value of a data sample x_k is represented as an interval, i.e. x_kj = [x_kj⁻, x_kj⁺] (1 ≤ j ≤ s). The data set X is divided into c classes, whose cluster centers are denoted V = [v_1, v_2, …, v_c] with v_ik = [v_ik⁻, v_ik⁺] (k = 1, 2, …, s). A c×n membership matrix U is used to represent the clustering result of the interval data set, where each element u_ij of the membership matrix satisfies the following conditions:
The objective function of the IFCM algorithm is as follows:
where d²(x_j, v_i) is the squared Euclidean distance between the data sample x_j and the cluster center v_i, and m is the fuzzy index satisfying m ∈ (1, +∞);
d²(x_j, v_i) is calculated as follows:
where the left and right interval boundary vectors of the interval data attribute x_j are x_j⁻ = [x_1j⁻, x_2j⁻, …, x_sj⁻]ᵀ and x_j⁺ = [x_1j⁺, x_2j⁺, …, x_sj⁺]ᵀ, and the left and right interval boundary vectors of the interval cluster center v_i are v_i⁻ = [v_1i⁻, v_2i⁻, …, v_si⁻]ᵀ and v_i⁺ = [v_1i⁺, v_2i⁺, …, v_si⁺]ᵀ.
Using the Lagrange multiplier method, the necessary conditions for the objective function (15) to reach its minimum under the constraint (14) are obtained as follows:
if the interval data sample x_j lies completely within the interval range of a cluster center v_h (1 ≤ h ≤ c), its membership degree is 1; if x_j lies completely outside the interval range of v_h, its membership degree is 0, i.e.
otherwise, the membership degree of each data sample is updated as shown in formula (20).
In the step 4), the specific steps of obtaining the fuzzy clustering result are as follows:
4.1) parameter initialization: set the number of clusters c, the maximum number of iterations G, the fuzzy index m and the iteration termination threshold ε, and initialize the membership matrix U^(0);
4.2) update the cluster center matrix: at the l-th iteration (l = 1, 2, …), based on U^(l−1), update the left endpoint values V^(l)⁻ and right endpoint values V^(l)⁺ of the cluster center matrix V^(l) using formula (17) and formula (18);
4.3) update the membership matrix: based on V^(l), update the membership matrix U^(l) using formula (19) and formula (20);
4.4) algorithm termination decision: when the number of iterations reaches the maximum, or when max|U^(l+1) − U^(l)| ≤ ε, the algorithm terminates; otherwise set l = l + 1 and return to step 4.2). An illustrative sketch of the membership update in formulas (19) and (20) is given below.
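A minimal sketch of the membership update of step 4.3), in the spirit of formulas (19) and (20): samples lying entirely inside a center's interval receive membership 1, and otherwise the usual FCM distance-ratio formula is applied to the interval distances. Since the patent's exact formulas are not reproduced in this text, the details below are assumptions.

```python
import numpy as np

def interval_membership(X_lo, X_hi, V_lo, V_hi, m=2.0):
    """Membership update in the spirit of formulas (19)-(20).

    X_lo, X_hi : (n, s) left/right endpoints of the interval data samples
    V_lo, V_hi : (c, s) left/right endpoints of the interval cluster centers
    Returns a (c, n) membership matrix U whose columns sum to 1.
    """
    n, c = X_lo.shape[0], V_lo.shape[0]
    # assumed interval distance: squared differences of left plus right endpoints
    d2 = np.zeros((c, n))
    for i in range(c):
        d2[i] = (np.sum((X_lo - V_lo[i]) ** 2, axis=1)
                 + np.sum((X_hi - V_hi[i]) ** 2, axis=1))
    d2 = np.maximum(d2, 1e-12)

    U = np.zeros((c, n))
    for j in range(n):
        inside = np.array([(X_lo[j] >= V_lo[i]).all() and (X_hi[j] <= V_hi[i]).all()
                           for i in range(c)], dtype=float)
        if inside.any():
            U[:, j] = inside / inside.sum()      # sample fully inside a center's interval
        else:
            ratio = d2[:, j][:, None] / d2[:, j][None, :]
            U[:, j] = 1.0 / np.sum(ratio ** (1.0 / (m - 1.0)), axis=1)
    return U

U = interval_membership(np.array([[0.1, 0.2]]), np.array([[0.15, 0.25]]),
                        np.array([[0.0, 0.1], [0.5, 0.6]]), np.array([[0.3, 0.4], [0.9, 0.9]]))
print(U)   # the single sample lies inside the first center's interval, so its column is [1, 0]
```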
The beneficial effects of the invention are as follows. Through analysis of the RBF neural network and the Kalman filtering idea, the difference between the predicted value of the RBF neural network and the theoretical expected value of the data is fed back to the input layer, giving an information feedback RBF network model, called the IFRBF network for short. At the same time, a training sample set is selected for each incomplete data sample using the nearest neighbor rule, an IFRBF network is trained for each missing attribute with the nearest neighbor training sample set, a complete data set recovered by IFRBF network estimation is obtained, and fuzzy cluster analysis is performed on it. The estimates of incomplete data obtained by the IFRBF network are numerical; numerical values cannot accurately describe the uncertainty of incomplete data, and certain errors remain. Aiming at this problem, the numerical estimates of the missing attributes are converted into interval form, the complete attributes of the data set are also converted into interval form, and the interval fuzzy C-means clustering method is then used to perform fuzzy cluster analysis on the resulting interval data set. The experimental results show that the clustering results of the complete recovered data sets obtained by estimating incomplete data sets with the IFRBF network are more accurate than those of the comparison methods, and the clustering results obtained with the interval estimation are more accurate and more robust than those of the numerical estimation.
Drawings
FIG. 1: schematic diagram of artificial data set 1.
FIG. 2: schematic diagram of artificial data set 2.
FIG. 3: trend of the objective function with the number of iterations for the IFRBF-FCM algorithm on the Iris data set under different missing rates.
FIG. 4: trend of the objective function with the number of iterations for the IFRBF-FCM algorithm on the Bupa data set under different missing rates.
FIG. 5: trend of the objective function with the number of iterations for the IFRBF-FCM algorithm on the Breast data set under different missing rates.
FIG. 6: trend of the objective function with the number of iterations for the IFRBF-IFCM algorithm on the Iris data set under different missing rates.
FIG. 7: trend of the objective function with the number of iterations for the IFRBF-IFCM algorithm on the Bupa data set under different missing rates.
FIG. 8: trend of the objective function with the number of iterations for the IFRBF-IFCM algorithm on the Breast data set under different missing rates.
FIG. 9: trend of the objective function with the number of iterations for the IFRBF-IFCM algorithm on artificial data set 1 under different missing rates.
FIG. 10: trend of the objective function with the number of iterations for the IFRBF-IFCM algorithm on artificial data set 2 under different missing rates.
Detailed Description
1) Propose the information feedback RBF network model (IFRBF network for short): combined with the Kalman filtering idea, with input vector X = (x_1, x_2, …, x_{n+m}) and output vector Y = (y_1, y_2, …, y_m), calculate the error e between the theoretical expected output value of the incomplete data and the actual output value of the network, and feed the difference between the predicted value of the RBF neural network and the theoretical expected value of the data back to the input layer, obtaining the information feedback RBF network, i.e. the IFRBF model.
The specific method comprises the following steps:
1.1) normalize the input data set: all data are mapped into the interval [0, 1], eliminating the magnitude differences between dimensions;
1.2) initialize the IFRBF network: set the numbers of nodes in each layer (n + m, l and m), initialize the weights w, the center vectors C_i and the widths σ², and determine the maximum number of training iterations M, the error precision ε_1 and the learning rates η_1, η_2, η_3;
1.3) calculate the hidden-layer output values of the network according to formula (1);
1.4) calculate the output-layer output values of the network according to formula (2),
where w_jk, the connection weight between the hidden layer and the output layer, is obtained by a minimum variance algorithm;
1.5) calculate the error e between the output value of the network and its expected value according to formula (3):
e_k = Y_k − O_k, k = 1, 2, …, m (3)
1.6) adjust the parameters of the network, namely the center vectors, the widths and the connection weights, according to formula (4), formula (5) and formula (6), and feed the obtained error back to the input layer,
where η_1, η_2, η_3 denote the learning rates;
1.7) algorithm termination decision: when the number of training iterations reaches the maximum M, or when the error e < ε_1, the algorithm ends; otherwise, return to step 1.3). An illustrative sketch of this training procedure is given below.
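The update formulas (4) to (6) are not reproduced in this text. The sketch below therefore uses standard gradient-style updates for the centers, widths and weights of a Gaussian RBF network, together with the feedback mechanism described above (the previous output error is appended to the data attributes at the input layer); it should be read as an assumption-laden illustration rather than the patent's exact procedure. The default learning rates and iteration limit follow the experimental settings given later (M = 500, ε_1 = 0.01, η = 0.1).

```python
import numpy as np

def train_ifrbf(X, Y, l=6, M=500, eps1=0.01, etas=(0.1, 0.1, 0.1), seed=0):
    """Illustrative IFRBF training loop (assumed updates standing in for formulas (4)-(6)).

    X : (N, n) normalized data attributes;  Y : (N, m) expected outputs.
    The previous output error of each sample is fed back as extra input at the next pass.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    m = Y.shape[1]
    C = rng.random((l, n + m))                 # center vectors for n + m input nodes
    sigma2 = np.full(l, 0.5)                   # squared widths
    W = 0.1 * rng.standard_normal((l, m))      # hidden-to-output weights w_jk
    eta1, eta2, eta3 = etas

    e_prev = np.zeros((N, m))                  # no feedback before the first pass
    for _ in range(M):
        max_err = 0.0
        for k in range(N):
            x = np.concatenate([X[k], e_prev[k]])                        # error feedback input
            h = np.exp(-np.sum((C - x) ** 2, axis=1) / (2.0 * sigma2))   # formula (1), assumed
            O = h @ W                                                    # formula (2), assumed
            e = Y[k] - O                                                 # formula (3)
            # assumed gradient-style parameter updates (stand-ins for formulas (4)-(6))
            W += eta3 * np.outer(h, e)
            g = (W @ e) * h
            C += eta1 * (g / sigma2)[:, None] * (x - C)
            sigma2 += eta2 * g * np.sum((x - C) ** 2, axis=1) / sigma2 ** 2
            sigma2 = np.maximum(sigma2, 1e-3)  # keep widths positive
            e_prev[k] = e
            max_err = max(max_err, float(np.abs(e).max()))
        if max_err < eps1:                     # termination test of step 1.7)
            break
    return C, sigma2, W

# toy usage: learn to reproduce the first two attributes of a small random data set
Xtoy = np.random.default_rng(1).random((8, 3))
C, s2, W = train_ifrbf(Xtoy, Xtoy[:, :2], l=4, M=50)
```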
2) Aiming at the problem that the basic FCM algorithm cannot directly perform fuzzy cluster analysis on an incomplete data set, an incomplete data fuzzy clustering method with information feedback RBF numerical estimation, called IFRBF-FCM for short, is proposed. A corresponding training sample set is selected for each incomplete data sample using the nearest neighbor rule, and an IFRBF network is trained for each missing attribute with the nearest neighbor training sample set, so that the missing attribute values of the incomplete data samples are estimated; a complete data set recovered by IFRBF network estimation is then obtained and fuzzy cluster analysis is performed.
the method specifically comprises the following steps:
2.1) select training samples: for an s-dimensional incomplete data set X = {x_1, x_2, …, x_n}, the similarity measure between an incomplete data sample x_a and a data sample x_b (with or without missing attributes) is given by formula (11):
where x_ia and x_ib are the i-th attributes of x_a and x_b, respectively, and I_i satisfies the following condition:
the corresponding nearest neighbor samples are selected for the incomplete data through the similarity measures given by formula (11) and formula (12), so that a corresponding training sample set is selected for each incomplete data sample;
2.2) IFRBF network training on incomplete data:
2.2.1) for the incomplete attributes in the training samples, a "0"-substitution method is adopted: the value of each incomplete attribute is replaced with 0 at the corresponding input-layer node, and network training is then carried out;
2.2.2) since there is no feedback value in the first training pass, the corresponding feedback input is also replaced with 0;
2.2.3) the error corresponding to an incomplete attribute is replaced with the mean of the errors of the remaining complete attributes, i.e.:
thereby completing the training of the IFRBF network;
2.3) for each incomplete data sample, the corresponding parameters of the IFRBF network are assigned using the relevant parameters obtained from training;
2.4) calculate the hidden-layer output values of the network according to formula (1);
2.5) calculate the output-layer output values of the network according to formula (2);
2.6) obtain the estimates of the missing attributes of the corresponding incomplete data from the output values obtained in 2.4) and 2.5), thereby filling the incomplete data set into a complete data set on which fuzzy cluster analysis is carried out. An illustrative sketch of this estimation step is given below.
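A sketch of the estimation step, assuming the trained parameters from the sketch above. Missing attributes and the initially absent feedback inputs are replaced by 0, as in steps 2.2.1) and 2.2.2); how the m network outputs map back onto the missing attributes is not spelled out in this text, so the direct read-out below is an assumption.

```python
import numpy as np

def estimate_missing(x_incomplete, C, sigma2, W):
    """Fill the missing attributes of one sample with a trained IFRBF network.

    x_incomplete : (n,) attribute vector with np.nan marking missing values
    C, sigma2, W : trained IFRBF parameters (centers, squared widths, output weights)
    """
    m = W.shape[1]
    x_in = np.where(np.isnan(x_incomplete), 0.0, x_incomplete)     # "0"-substitution (2.2.1)
    x_in = np.concatenate([x_in, np.zeros(m)])                     # no feedback value (2.2.2)
    h = np.exp(-np.sum((C - x_in) ** 2, axis=1) / (2.0 * sigma2))  # formula (1), assumed form
    O = h @ W                                                      # formula (2), assumed form
    x_filled = x_incomplete.copy()
    missing = np.isnan(x_incomplete)
    x_filled[missing] = O[:missing.sum()]   # assumed mapping of network outputs to missing attrs
    return x_filled
```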
3) Interval-type conversion of the incomplete data set: the missing data attributes of incomplete data samples are estimated and filled in by the IFRBF network, the estimation errors of the IFRBF network on the complete attributes of the incomplete data samples are obtained, the interval representation of the missing data attributes is determined from the mean of the absolute values of these estimation errors, and the complete attributes of the data set are also converted to intervals;
the method specifically comprises the following steps:
3.1) compare the estimation errors of the complete attributes of the incomplete data samples obtained in step 2) and obtain the mean of the absolute values of these estimation errors;
3.2) convert the numerical estimate of each missing data attribute of the incomplete data set into an interval: the midpoint of the interval is the numerical estimate x and the width of the interval is given by the mean absolute estimation error, so the interval representation of the missing data attribute is [x⁻, x⁺];
3.3) the obtained interval estimate is checked and limited to the interval [0, 1]:
if x⁻ < 0, the left endpoint of the estimation interval of the missing data attribute is set to 0, i.e. x⁻ = 0;
if x⁺ > 1, the right endpoint of the estimation interval of the missing data attribute is set to 1, i.e. x⁺ = 1;
3.4) all the complete attributes of the incomplete data set are also represented in interval form, i.e. the left and right endpoints of the interval are both equal to the original value of the complete attribute; for example, if the value of a complete attribute is x_ij, its interval representation is [x_ij, x_ij]. An illustrative sketch of this interval conversion is given below.
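A sketch of the interval conversion of step 3), assuming the interval half-width is the mean absolute estimation error from step 3.1); missing attributes become intervals clamped to [0, 1] and complete attributes become degenerate intervals.

```python
import numpy as np

def to_interval_dataset(X_filled, missing_mask, mean_abs_err):
    """Interval conversion of step 3): missing attributes become [x - err, x + err]
    clamped to [0, 1]; complete attributes become degenerate intervals [x, x].

    X_filled     : (N, s) data set recovered by IFRBF estimation, values in [0, 1]
    missing_mask : (N, s) boolean array, True where the attribute was originally missing
    mean_abs_err : scalar (or per-attribute array) mean absolute estimation error from 3.1)
    """
    err = np.broadcast_to(np.asarray(mean_abs_err, dtype=float), X_filled.shape)
    lo = np.where(missing_mask, X_filled - err, X_filled)
    hi = np.where(missing_mask, X_filled + err, X_filled)
    return np.clip(lo, 0.0, 1.0), np.clip(hi, 0.0, 1.0)   # steps 3.3) and 3.4)

X = np.array([[0.10, 0.95], [0.40, 0.50]])
mask = np.array([[False, True], [True, False]])
print(to_interval_dataset(X, mask, 0.08))
```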
4) Cluster analysis is carried out on the converted interval data set using the interval fuzzy C-means clustering method, in which each cluster center is represented by an interval vector, to obtain the fuzzy clustering result;
the interval fuzzy C-means clustering method is specifically as follows:
the s-dimensional interval data set X = {x_1, x_2, …, x_n} contains n data samples, and each attribute value of a data sample x_k is represented as an interval, i.e. x_kj = [x_kj⁻, x_kj⁺] (1 ≤ j ≤ s); the data set X is divided into c classes, whose cluster centers are denoted V = [v_1, v_2, …, v_c] with v_ik = [v_ik⁻, v_ik⁺] (k = 1, 2, …, s); a c×n membership matrix U is used to represent the clustering result of the interval data set, where each element u_ij of the membership matrix satisfies the following conditions:
The objective function of the IFCM algorithm is as follows:
where d²(x_j, v_i) is the squared Euclidean distance between the data sample x_j and the cluster center v_i, and m is the fuzzy index satisfying m ∈ (1, +∞);
d²(x_j, v_i) is calculated as follows:
where the left and right interval boundary vectors of the interval data attribute x_j are x_j⁻ = [x_1j⁻, x_2j⁻, …, x_sj⁻]ᵀ and x_j⁺ = [x_1j⁺, x_2j⁺, …, x_sj⁺]ᵀ, and the left and right interval boundary vectors of the interval cluster center v_i are v_i⁻ = [v_1i⁻, v_2i⁻, …, v_si⁻]ᵀ and v_i⁺ = [v_1i⁺, v_2i⁺, …, v_si⁺]ᵀ.
Using the Lagrange multiplier method, the necessary conditions for the objective function (15) to reach its minimum under the constraint (14) are obtained as follows:
if the interval data sample x_j lies completely within the interval range of a cluster center v_h (1 ≤ h ≤ c), its membership degree is 1; if x_j lies completely outside the interval range of v_h, its membership degree is 0, i.e.
otherwise, the membership degree of each data sample is updated as shown in formula (20). An illustrative sketch of the interval distance calculation is given below.
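A small sketch of the interval distance: a common definition for interval data sums the squared differences of the left endpoints and of the right endpoints. The patent's exact distance formula is not reproduced here, so this form is an assumption.

```python
import numpy as np

def interval_d2(xj_lo, xj_hi, vi_lo, vi_hi):
    """Assumed squared distance between an interval sample x_j and an interval center v_i:
    sum of squared differences of the left endpoints plus those of the right endpoints."""
    return float(np.sum((xj_lo - vi_lo) ** 2) + np.sum((xj_hi - vi_hi) ** 2))

print(interval_d2(np.array([0.10, 0.40]), np.array([0.20, 0.50]),
                  np.array([0.15, 0.35]), np.array([0.25, 0.55])))   # 0.01
```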
The specific steps for obtaining the fuzzy clustering result are as follows:
4.1) parameter initialization: set the number of clusters c, the maximum number of iterations G, the fuzzy index m and the iteration termination threshold ε, and initialize the membership matrix U^(0);
4.2) update the cluster center matrix: at the l-th iteration (l = 1, 2, …), based on U^(l−1), update the left endpoint values V^(l)⁻ and right endpoint values V^(l)⁺ of the cluster center matrix V^(l) using formula (17) and formula (18);
4.3) update the membership matrix: based on V^(l), update the membership matrix U^(l) using formula (19) and formula (20);
4.4) algorithm termination decision: when the number of iterations reaches the maximum, or when max|U^(l+1) − U^(l)| ≤ ε, the algorithm terminates; otherwise set l = l + 1 and return to step 4.2). A sketch of the complete iteration is given after this list.
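A sketch of the complete IFCM iteration of steps 4.1) to 4.4). The center-update formulas (17) and (18) are not reproduced in this text, so the usual FCM-style weighted means of the left and right endpoints are used as an assumption; for brevity the degenerate inside/outside membership case of the earlier sketch is omitted and only the general update is applied.

```python
import numpy as np

def ifcm(X_lo, X_hi, c=3, m=2.0, G=100, eps=1e-3, seed=0):
    """Interval fuzzy C-means loop following steps 4.1)-4.4) (assumed update formulas).

    X_lo, X_hi : (n, s) left/right endpoints of the interval data set
    Returns the membership matrix U and the interval cluster centers (V_lo, V_hi).
    """
    rng = np.random.default_rng(seed)
    n = X_lo.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                                   # 4.1) initialize U^(0)

    for _ in range(G):
        Um = U ** m
        V_lo = (Um @ X_lo) / Um.sum(axis=1, keepdims=True)   # 4.2) assumed form of (17)
        V_hi = (Um @ X_hi) / Um.sum(axis=1, keepdims=True)   #      assumed form of (18)

        d2 = np.zeros((c, n))                            # interval squared distances
        for i in range(c):
            d2[i] = (np.sum((X_lo - V_lo[i]) ** 2, axis=1)
                     + np.sum((X_hi - V_hi[i]) ** 2, axis=1))
        d2 = np.maximum(d2, 1e-12)

        U_new = np.zeros_like(U)                         # 4.3) general membership update
        for j in range(n):
            ratio = d2[:, j][:, None] / d2[:, j][None, :]
            U_new[:, j] = 1.0 / np.sum(ratio ** (1.0 / (m - 1.0)), axis=1)

        done = np.max(np.abs(U_new - U)) <= eps          # 4.4) termination test
        U = U_new
        if done:
            break
    return U, V_lo, V_hi

# toy usage with degenerate intervals (complete data) forming two well-separated clusters
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.2, 0.05, (20, 2)), rng.normal(0.8, 0.05, (20, 2))])
U, V_lo, V_hi = ifcm(pts, pts, c=2)
print(np.round(V_lo, 2))
```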
5) Fuzzy cluster analysis is carried out on the interval data set with the IFCM method described in step 4) to obtain the fuzzy clustering result, which is compared with the IFRBF-FCM method and four classical comparison algorithms (WDS-FCM, PDS-FCM, OCS-FCM and NPS-FCM) to verify the effectiveness of the invention:
(1) Initialization of the experiment: three data sets from the UCI database are selected as the experimental data sample sets, namely the Iris, Bupa and Breast data sets. At the same time, two artificial data sets are selected for comparison experiments between the two algorithms proposed by the invention (IFRBF-FCM and IFRBF-IFCM) and the four comparison algorithms (WDS-FCM, PDS-FCM, OCS-FCM and NPS-FCM).
The Iris data set is a data set for multidimensional attribute analysis of iris flowers. It contains 150 data samples divided into three classes: Iris setosa, Iris versicolor and Iris virginica. Each class contains 50 samples, and each sample contains 4 attributes: petal length, sepal length, petal width and sepal width.
The Bupa data set contains sample data from a liver disease study. It contains 345 data samples; the two categories contain 145 and 200 samples, respectively. Each sample contains 7 attributes, but the 7th attribute is a category identifier and does not participate in the experiment. The remaining 6 valid attributes include mean red blood cell volume, glutamyl transpeptidase and daily alcohol consumption.
The Breast data set describes clinical cases of breast cancer. It contains 699 data samples, but 16 of them have missing attributes, so 683 samples are used in the actual data analysis. The data set is divided into two categories, benign and malignant breast tumor samples, containing 444 and 239 samples, respectively. Each sample has 11 attribute columns, two of which do not participate in the fuzzy clustering experiments: the sample number in the first column and the class label in the last column. The remaining 9 valid attributes include marginal adhesion, single epithelial cell size, mitoses, bare nuclei and bland chromatin, among others. Table 1 describes the above UCI data sets.
Table 1 Description of the UCI data sets
Artificial data set 1 contains 200 data samples in 2 classes, each class containing 100 samples. Artificial data set 2 contains 400 data samples in 3 classes, the classes containing 80, 100 and 220 samples, respectively. The data samples (x_i, y_i) in both artificial data sets obey independent two-dimensional normal distributions.
The artificial data set 1 is generated according to the following parameters:
(i) Class 1: u_1 = 4, u_2 = 4, σ_1² = 2, σ_2² = 2.
(ii) Class 2: u_1 = 6, u_2 = 8, σ_1² = 2, σ_2² = 2.
The distribution of artificial data set 1 generated with the above parameters is shown in FIG. 1. The red markers represent the data samples of the first class and the blue markers represent the data samples of the second class.
The artificial dataset 2 was generated as follows:
(i) Class 1: u_1 = 20, u_2 = 20, σ_1² = 2, σ_2² = 4.
(ii) Class 2: u_1 = 25, u_2 = 30, σ_1² = 9, σ_2² = 25.
(iii) Class 3: u_1 = 36, u_2 = 36, σ_1² = 16, σ_2² = 16.
The distribution of artificial data set 2 generated with the above parameters is shown in FIG. 2. The red markers represent the data samples of the first class, the blue markers the data samples of the second class, and the green markers the data samples of the third class. A sketch for generating both artificial data sets is given below.
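A sketch for generating the two artificial data sets from the parameters listed above (independent two-dimensional normal distributions); the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed

def make_class(n, means, variances):
    """n independent two-dimensional normal samples with the given means and variances."""
    return rng.normal(loc=means, scale=np.sqrt(variances), size=(n, 2))

# artificial data set 1: 2 classes of 100 samples each
ds1 = np.vstack([make_class(100, [4, 4], [2, 2]),
                 make_class(100, [6, 8], [2, 2])])

# artificial data set 2: 3 classes of 80, 100 and 220 samples
ds2 = np.vstack([make_class(80,  [20, 20], [2, 4]),
                 make_class(100, [25, 30], [9, 25]),
                 make_class(220, [36, 36], [16, 16])])
print(ds1.shape, ds2.shape)   # (200, 2) (400, 2)
```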
To make the incomplete data in the experiment closer to the randomness of real missing data, the data used in the experiment are obtained by randomly removing values from a complete data set at a preset ratio, thereby generating the incomplete data set. The position of a missing attribute in the incomplete data set is determined by the row x and the column y in which the attribute lies, and the missing value is replaced with "?". The rules for randomly generating missing attributes in a data set are as follows (a sketch implementing these rules is given after the two rules):
(i) for an s-dimensional data set, it must be ensured that there are at most s-1 missing attribute values for any sample data in the data set.
(ii) It must be ensured that at least one complete value exists for any one-dimensional attribute in the data set.
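A sketch that removes attribute values at a given missing rate while enforcing rules (i) and (ii); missing values are marked with np.nan rather than "?" for convenience.

```python
import numpy as np

def make_incomplete(X, missing_rate, seed=0):
    """Randomly mark attribute values as missing (np.nan) at the given rate while
    enforcing rule (i): at most s - 1 missing attributes per sample, and
    rule (ii): at least one complete value per attribute column."""
    rng = np.random.default_rng(seed)
    X_inc = X.astype(float).copy()
    n, s = X.shape
    target = int(round(missing_rate * n * s))
    removed = 0
    while removed < target:
        r, c = rng.integers(n), rng.integers(s)
        if np.isnan(X_inc[r, c]):
            continue
        if np.isnan(X_inc[r]).sum() >= s - 1:            # would violate rule (i)
            continue
        if (~np.isnan(X_inc[:, c])).sum() <= 1:          # would violate rule (ii)
            continue
        X_inc[r, c] = np.nan
        removed += 1
    return X_inc

X = np.random.default_rng(3).random((10, 4))
print(np.isnan(make_incomplete(X, 0.2)).sum())   # 8 of the 40 values are missing
```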
The maximum number of training iterations of the IFRBF network is set to M = 500, the error precision to ε_1 = 0.01 and the learning rates to η_1 = 0.1, η_2 = 0.1, η_3 = 0.1. For different data sets, the number of nodes in each layer of the IFRBF network differs and is determined by the number of attributes in the data set; the number of hidden-layer nodes is determined experimentally. In addition, the maximum number of iterations of the FCM and IFCM algorithms is set to G = 100, the fuzzy index to m = 2 and the iteration termination threshold to ε = 0.001. The missing rate of each data set is set to 0%, 5%, 10%, 15% and 20%. Considering that the experimental result of each algorithm may be affected by chance, 10 simulation experiments are performed for each algorithm, and the averages of the results of these 10 experiments are analyzed and compared.
The proposed IFRBF-FCM algorithm is evaluated from two aspects: the average error score of the clustering and several external validity evaluation indices. The average error score allows an intuitive comparison of clustering results, while the external validity indices evaluate the degree of similarity between the true partition of the experimental data and the corresponding fuzzy partition results. These indices are the Rand Index, Adjusted Rand Index, Jaccard Coefficient, Minkowski Measure and Γ statistics. For the Minkowski Measure, smaller values indicate better performance of the corresponding clustering algorithm; for the other external indices, larger values indicate better performance. A sketch of two of these indices is given below.
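A small sketch of two of these indices: a plain Rand Index implemented directly, and the Adjusted Rand Index taken from scikit-learn (an assumed external dependency); the Jaccard Coefficient, Minkowski Measure and Γ statistics are not shown.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def rand_index(labels_true, labels_pred):
    """Plain Rand Index: the fraction of sample pairs on which the two partitions agree."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    same_true = labels_true[:, None] == labels_true[None, :]
    same_pred = labels_pred[:, None] == labels_pred[None, :]
    iu = np.triu_indices(len(labels_true), k=1)          # each unordered pair once
    return float((same_true == same_pred)[iu].mean())

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(rand_index(y_true, y_pred), adjusted_rand_score(y_true, y_pred))
```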
(2) Analysis of the experimental results.
(i) Comparative analysis of the IFRBF-FCM experimental results:
For the incomplete data fuzzy clustering method with information feedback RBF network estimation (IFRBF-FCM) proposed by the invention, the experimental results are compared with the other four classical algorithms. The experimental results are shown in Tables 2 to 7, in which the best experimental results are marked in bold and the second-best results are underlined.
Table 2 average error score of 10 experiments on incomplete data set Iris
Table 3 average error score of 10 experiments with incomplete data set Bupa
Table 4 average error score of 10 experiments with the incomplete Breast data set
TABLE 5 average effectiveness evaluation index of 10 experiments on incomplete data set Iris
Table 6 average effectiveness evaluation index of 10 experiments with incomplete data set Bupa
Table 7 average effectiveness evaluation index of 10 experiments with the incomplete Breast data set
As can be seen from Tables 2 to 7, under the different missing rates of the respective data sets the IFRBF-FCM algorithm proposed by the invention is, on the whole, relatively better than the other four comparison algorithms.
Regarding the average error score, the experimental results in Tables 2 to 4 show that the IFRBF-FCM algorithm proposed by the invention obtains relatively better results overall than the other four comparison algorithms; only for the Bupa data set at a missing rate of 15% is the result suboptimal.
Regarding the several average external validity indices, the results in Tables 5 to 7 show that, under the different missing rates of the data sets, the IFRBF-FCM algorithm proposed by the invention also obtains relatively better results overall than the other four comparison algorithms.
FIGS. 3 to 5 show the trend of the objective function with the number of iterations for the IFRBF-FCM algorithm on the three UCI data sets under different missing rates during cluster analysis.
Regarding the convergence of the algorithm, FIGS. 3 to 5 show that, under the different missing rates of the respective data sets, the objective function value of the IFRBF-FCM algorithm changes relatively quickly in the initial stage of the algorithm, but after several iterations it reaches a relatively stable state.
(ii) Comparative analysis of the IFRBF-IFCM experimental results:
The experimental results of the incomplete data fuzzy clustering algorithm with IFRBF interval estimation (IFRBF-IFCM) are compared with those of the incomplete data fuzzy clustering algorithm with IFRBF numerical estimation (IFRBF-FCM). The experimental results are shown in Tables 8 to 11, in which the best results are marked in bold.
Table 8 average error score for incomplete UCI data set 10 experiments
Table 9 average error score for incomplete artificial dataset 10 experiments
Table 10 average number of iterations for 10 experiments with incomplete UCI data set
TABLE 11 average number of iterations for 10 experiments with incomplete artificial data set
For the three UCI data sets, Table 8 shows that, viewed globally, the IFRBF-IFCM algorithm is better than the IFRBF-FCM algorithm in terms of the average error score; only at a missing rate of 15% for the Iris data set and a missing rate of 10% for the Bupa data set are the results of the IFRBF-IFCM algorithm not as good as those of the IFRBF-FCM algorithm.
For artificial data set 1, the data samples are distributed relatively uniformly across the classes. As can be seen from Table 9, at a missing rate of 10% the IFRBF-FCM algorithm is relatively better in terms of the average error score, whereas at missing rates of 5%, 15% and 20% the IFRBF-IFCM algorithm is better; the difference between the two is not particularly large. For artificial data set 2, the data samples are distributed unevenly across the classes and the degree of dispersion of the samples differs greatly between classes: the samples of the first class are relatively concentrated, while those of the second and third classes are relatively dispersed. As can be seen from Table 9, the IFRBF-IFCM algorithm obtains better experimental results than the IFRBF-FCM algorithm under the different missing rates, and at missing rates of 15% and 20% its results are much better than those of the IFRBF-FCM algorithm. Therefore, for an unevenly distributed data set, the IFRBF-IFCM algorithm can represent the uncertainty of the incomplete data attributes more accurately and thus improves robustness.
Tables 10 and 11 give the average number of iterations over 10 experiments for the IFRBF-FCM and IFRBF-IFCM algorithms. As can be seen from Tables 10 and 11, the average number of iterations of the two proposed algorithms differs between data sets; globally, in terms of the average number of iterations, the IFRBF-FCM algorithm is the better of the two. Nevertheless, after several iterations the objective function value of the IFRBF-IFCM algorithm also reaches a relatively stable state.
FIGS. 6 to 10 show the trend of the objective function with the number of iterations for the IFRBF-IFCM algorithm on each data set under different missing rates during cluster analysis. As can be seen from FIGS. 6 to 10, under the different missing rates the objective function values for the incomplete data sets reach a relatively stable state after several iterations.

Claims (6)

1. The incomplete data fuzzy clustering method of the information feedback RBF network estimation is characterized by comprising the following steps:
1) propose an information feedback RBF network model: combined with the Kalman filtering idea, with input vector X = (x_1, x_2, …, x_{n+m}) and output vector Y = (y_1, y_2, …, y_m), the error e between the theoretical expected output value of the incomplete data and the actual output value of the network is calculated, and the difference between the predicted value of the RBF neural network and the theoretical expected value of the data is fed back to the input layer to obtain the IFRBF model;
2) select a corresponding training sample set for each incomplete data sample using the nearest neighbor rule, and train an IFRBF network for each missing attribute with the nearest neighbor training sample set, so that the missing attribute values of the incomplete data samples are estimated; a complete data set recovered by IFRBF network estimation is thus obtained and fuzzy cluster analysis is performed;
3) interval-type conversion of the incomplete data set: the missing data attributes of incomplete data samples are estimated and filled in by the IFRBF network, the estimation errors of the IFRBF network on the complete attributes of the incomplete data samples are obtained, the interval representation of the missing data attributes is determined from the mean of the absolute values of these estimation errors, and the complete attributes of the data set are also converted to intervals;
4) cluster analysis is carried out on the converted interval data set using an interval fuzzy C-means clustering method, in which each cluster center is represented by an interval vector, thereby obtaining the fuzzy clustering result.
2. The fuzzy clustering method for incomplete data of information feedback RBF network estimation as claimed in claim 1, wherein: in the step 1), the specific method comprises the following steps:
1.1) normalize the input data set: all data are mapped into the interval [0, 1], eliminating the magnitude differences between dimensions;
1.2) initialize the IFRBF network: set the numbers of nodes in each layer (n + m, l and m), initialize the weights w, the center vectors C_i and the widths σ², and determine the maximum number of training iterations M, the error precision ε_1 and the learning rates η_1, η_2, η_3;
1.3) calculate the hidden-layer output values of the network according to formula (1);
1.4) calculate the output-layer output values of the network according to formula (2),
where w_jk, the connection weight between the hidden layer and the output layer, is obtained by a minimum variance algorithm;
1.5) calculate the error e between the output value Y_k of the network and its expected value O_k according to formula (3):
e_k = Y_k − O_k, k = 1, 2, …, m (3)
1.6) adjust the parameters of the network, namely the center vectors, the widths and the connection weights, according to formula (4), formula (5) and formula (6), and feed the obtained error back to the input layer,
where η_1, η_2, η_3 denote the learning rates;
1.7) algorithm termination decision: when the number of training iterations reaches the maximum M, or when the error e < ε_1, the algorithm ends; otherwise, return to step 1.3).
3. The fuzzy clustering method for incomplete data of information feedback RBF network estimation as claimed in claim 1, wherein: in the step 2), the method specifically comprises the following steps:
2.1) select training samples: for an s-dimensional incomplete data set X = {x_1, x_2, …, x_n}, the similarity measure between an incomplete data sample x_a and a data sample x_b is given by formula (11):
where x_ia and x_ib are the i-th attributes of x_a and x_b, respectively, and I_i satisfies the following condition:
the corresponding nearest neighbor samples are selected for the incomplete data through the similarity measures given by formula (11) and formula (12), so that a corresponding training sample set is selected for each incomplete data sample;
2.2) IFRBF network training on incomplete data:
2.2.1) for the incomplete attributes in the training samples, a "0"-substitution method is adopted: the value of each incomplete attribute is replaced with 0 at the corresponding input-layer node, and network training is then carried out;
2.2.2) since there is no feedback value in the first training pass, the corresponding feedback input is also replaced with 0;
2.2.3) the error corresponding to an incomplete attribute is replaced with the mean of the errors of the remaining complete attributes, i.e.:
thereby completing the training of the IFRBF network;
2.3) for each incomplete data sample, the corresponding parameters of the IFRBF network are assigned using the relevant parameters obtained from training;
2.4) calculate the hidden-layer output values of the network according to formula (1);
2.5) calculate the output-layer output values of the network according to formula (2);
2.6) obtain the estimates of the missing attributes of the corresponding incomplete data from the output values obtained in 2.4) and 2.5), thereby filling the incomplete data set into a complete data set on which fuzzy cluster analysis is carried out.
4. The fuzzy clustering method for incomplete data of information feedback RBF network estimation as claimed in claim 1, wherein: in the step 3), the method specifically comprises the following steps:
3.1) compare the estimation errors of the complete attributes of the incomplete data samples obtained in step 2) and obtain the mean of the absolute values of these estimation errors;
3.2) convert the numerical estimate of each missing data attribute of the incomplete data set into an interval: the midpoint of the interval is the numerical estimate x and the width of the interval is given by the mean absolute estimation error, so the interval representation of the missing data attribute is [x⁻, x⁺];
3.3) the obtained interval estimate is checked and limited to the interval [0, 1]:
if x⁻ < 0, the left endpoint of the estimation interval of the missing data attribute is set to 0, i.e. x⁻ = 0;
if x⁺ > 1, the right endpoint of the estimation interval of the missing data attribute is set to 1, i.e. x⁺ = 1;
3.4) all the complete attributes of the incomplete data set are also represented in interval form, i.e. the left and right endpoints of the interval are both equal to the original value of the complete attribute.
5. The fuzzy clustering method for incomplete data of information feedback RBF network estimation as claimed in claim 1, wherein: the interval type fuzzy C-means clustering method in the step 4) specifically comprises the following steps:
The s-dimensional interval data set X = {x_1, x_2, …, x_n} contains n data samples, and each attribute value of a data sample x_k is represented as an interval, i.e. x_kj = [x_kj⁻, x_kj⁺] (1 ≤ j ≤ s). The data set X is divided into c classes, whose cluster centers are denoted V = [v_1, v_2, …, v_c] with v_ik = [v_ik⁻, v_ik⁺] (k = 1, 2, …, s). A c×n membership matrix U is used to represent the clustering result of the interval data set, where each element u_ij of the membership matrix satisfies the following conditions:
The objective function of the IFCM algorithm is as follows:
where d²(x_j, v_i) is the squared Euclidean distance between the data sample x_j and the cluster center v_i, and m is the fuzzy index satisfying m ∈ (1, +∞);
d²(x_j, v_i) is calculated as follows:
where the left and right interval boundary vectors of the interval data attribute x_j are x_j⁻ = [x_1j⁻, x_2j⁻, …, x_sj⁻]ᵀ and x_j⁺ = [x_1j⁺, x_2j⁺, …, x_sj⁺]ᵀ, and the left and right interval boundary vectors of the interval cluster center v_i are v_i⁻ = [v_1i⁻, v_2i⁻, …, v_si⁻]ᵀ and v_i⁺ = [v_1i⁺, v_2i⁺, …, v_si⁺]ᵀ.
Using the Lagrange multiplier method, the necessary conditions for the objective function (15) to reach its minimum under the constraint (14) are obtained as follows:
if the interval data sample x_j lies completely within the interval range of a cluster center v_h (1 ≤ h ≤ c), its membership degree is 1; if x_j lies completely outside the interval range of v_h, its membership degree is 0, i.e.
otherwise, the membership degree of each data sample is updated as shown in formula (20).
6. The fuzzy clustering method for incomplete data of information feedback RBF network estimation as claimed in claim 1, wherein: in the step 4), the specific steps of obtaining the fuzzy clustering result are as follows:
4.1) parameter initialization: set the number of clusters c, the maximum number of iterations G, the fuzzy index m and the iteration termination threshold ε, and initialize the membership matrix U^(0);
4.2) update the cluster center matrix: at the l-th iteration (l = 1, 2, …), based on U^(l−1), update the left endpoint values V^(l)⁻ and right endpoint values V^(l)⁺ of the cluster center matrix V^(l) using formula (17) and formula (18);
4.3) update the membership matrix: based on V^(l), update the membership matrix U^(l) using formula (19) and formula (20);
4.4) algorithm termination decision: when the number of iterations reaches the maximum, or when max|U^(l+1) − U^(l)| ≤ ε, the algorithm terminates; otherwise set l = l + 1 and return to step 4.2).
CN201810785729.2A 2018-07-17 2018-07-17 The deficiency of data fuzzy clustering method of information feedback RBF network valuation Pending CN109034231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810785729.2A CN109034231A (en) 2018-07-17 2018-07-17 The deficiency of data fuzzy clustering method of information feedback RBF network valuation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810785729.2A CN109034231A (en) 2018-07-17 2018-07-17 The deficiency of data fuzzy clustering method of information feedback RBF network valuation

Publications (1)

Publication Number Publication Date
CN109034231A true CN109034231A (en) 2018-12-18

Family

ID=64643578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810785729.2A Pending CN109034231A (en) 2018-07-17 2018-07-17 The deficiency of data fuzzy clustering method of information feedback RBF network valuation

Country Status (1)

Country Link
CN (1) CN109034231A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816017A (en) * 2019-01-24 2019-05-28 电子科技大学 Power grid missing data complementing method based on fuzzy clustering and Lagrange's interpolation
CN109948715A (en) * 2019-03-22 2019-06-28 杭州电子科技大学 A kind of water monitoring data missing values complementing method
CN109948715B (en) * 2019-03-22 2021-07-02 杭州电子科技大学 Water quality monitoring data missing value filling method
CN110087207B (en) * 2019-05-05 2020-04-10 江南大学 Method for reconstructing missing data of wireless sensor network
CN110087207A (en) * 2019-05-05 2019-08-02 江南大学 Wireless sensor network missing data method for reconstructing
CN110298434B (en) * 2019-05-27 2022-12-09 湖州师范学院 Integrated deep belief network based on fuzzy partition and fuzzy weighting
CN110298434A (en) * 2019-05-27 2019-10-01 湖州师范学院 A kind of integrated deepness belief network based on fuzzy division and FUZZY WEIGHTED
CN110298382A (en) * 2019-05-27 2019-10-01 湖州师范学院 A kind of integrated TSK Fuzzy Classifier based on IFCM, KNN and data dictionary
CN110298382B (en) * 2019-05-27 2022-12-09 湖州师范学院 Integrated TSK fuzzy classifier based on IFCM, KNN and data dictionary
CN110457770A (en) * 2019-07-18 2019-11-15 中国电力科学研究院有限公司 A kind of distribution transformer heavy-overload judgment method towards time scale
CN110457770B (en) * 2019-07-18 2022-07-01 中国电力科学研究院有限公司 Time scale-oriented overload judgment method for distribution transformer
CN111191687A (en) * 2019-12-14 2020-05-22 贵州电网有限责任公司 Power communication data clustering method based on improved K-means algorithm
CN111191687B (en) * 2019-12-14 2023-02-10 贵州电网有限责任公司 Power communication data clustering method based on improved K-means algorithm
CN111881502A (en) * 2020-07-27 2020-11-03 中铁二院工程集团有限责任公司 Bridge state discrimination method based on fuzzy clustering analysis
CN112183114A (en) * 2020-08-10 2021-01-05 招联消费金融有限公司 Model training and semantic integrity recognition method and device
CN112183114B (en) * 2020-08-10 2024-05-14 招联消费金融股份有限公司 Model training and semantic integrity recognition method and device

Similar Documents

Publication Publication Date Title
CN109034231A (en) The deficiency of data fuzzy clustering method of information feedback RBF network valuation
CN108763590B (en) Data clustering method based on double-variant weighted kernel FCM algorithm
CN105809672B (en) A kind of image multiple target collaboration dividing method constrained based on super-pixel and structuring
CN107203785A (en) Multipath Gaussian kernel Fuzzy c-Means Clustering Algorithm
Zhu et al. A novel clustering validity function of FCM clustering algorithm
CN107301328B (en) Cancer subtype accurate discovery and evolution analysis method based on data flow clustering
Ramathilagam et al. Extended Gaussian kernel version of fuzzy c-means in the problem of data analyzing
de Arruda et al. A complex networks approach for data clustering
CN107301430A (en) Broad sense Multivariable Fuzzy c means clustering algorithms
CN108921853B (en) Image segmentation method based on super-pixel and immune sparse spectral clustering
CN111222847A (en) Open-source community developer recommendation method based on deep learning and unsupervised clustering
Bandyopadhyay Multiobjective simulated annealing for fuzzy clustering with stability and validity
CN104615722B (en) Blended data clustering method with quickly dividing is searched for based on density
CN109086831A (en) Hybrid Clustering Algorithm based on Fuzzy C-Means Algorithm and artificial bee colony clustering algorithm
CN111738346A (en) Incomplete data clustering method for generating type confrontation network estimation
CN115098699A (en) Link prediction method based on knowledge graph embedded model
CN114463848A (en) Progressive learning gait recognition method based on memory enhancement
CN102110173A (en) Improved multi-path spectral clustering method for affinity matrix
CN109409394A (en) A kind of cop-kmeans method and system based on semi-supervised clustering
Suresh et al. Data clustering using multi-objective differential evolution algorithms
CN113656707A (en) Financing product recommendation method, system, storage medium and equipment
CN108510080A (en) A kind of multi-angle metric learning method based on DWH model many-many relationship type data
CN111353525A (en) Modeling and missing value filling method for unbalanced incomplete data set
CN109671468B (en) Characteristic gene selection and cancer classification method
CN115273645B (en) Map making method for automatically clustering indoor surface elements

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20181218)