CN102799902A - Enhanced relationship classifier based on representative samples - Google Patents
- Publication number: CN102799902A
- Application number: CN201210287636XA
- Authority: CN (China)
- Prior art keywords: sample, membership, cluster, degree
- Prior art date: 2012-08-13
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classification landscape: Information Retrieval, Db Structures And Fs Structures Therefor
Abstract
The invention relates to an enhanced relational classifier based on representative samples. The method comprises two main steps: first, representative samples are selected according to the cluster memberships of the samples to form a new training sample set Xnew; then, a fuzzy relation matrix R is constructed with a φ-composition operator from the cluster memberships and class memberships of Xnew. The enhanced relational classifier has three main characteristics: (1) the matrix R can reveal the inherent logical relationship between clusters and classes; (2) the computational complexity of constructing R decreases from O(NLc) to O(MLc), where L is the number of classes, c is the number of clusters, N is the number of samples in the original dataset X, M is the number of samples in Xnew, and N > M; and (3) when some region of the sample space lacks sufficient discriminant information, the classifier refuses to make a decision for test samples falling into that region, thereby guaranteeing the confidence level of the classification results.
Description
Technical field
The invention belongs to the field of pattern recognition, and in particular relates to a relational classifier based on cluster analysis.
Background technology
The main task of pattern recognition is to process and analyze the various forms of information that characterize objects or phenomena, so as to classify (or group) and interpret them. Traditional pattern recognition comprises two important research themes: unsupervised clustering and supervised classification.
Supervised classification aims to design a class discriminant function from given data and their class labels, so that the classes of unknown samples can be predicted correctly. These methods focus on the class membership of samples and can achieve relatively good generalization to unseen samples. However, they emphasize only the classification of individual samples, while ignoring the mining of the structural knowledge hidden in the sample space and the characterization of the mutual relationships between samples, which makes the classification results less interpretable and less transparent. Typical methods include neural networks and the support vector machine (SVM). Unsupervised clustering aims to exploit the similarity between samples to assign samples with the same characteristics to the same meaningful cluster, thereby discovering the latent distribution structure of the samples and enabling a better understanding and analysis of the data. These methods can reveal the structural distribution of the data, but they cannot decide the class membership of a sample.
Each of these two classes of methods has its own advantages and disadvantages, so designing a method that combines the advantages of both while overcoming their shortcomings is an important research topic. Around this problem, researchers have proposed a series of methods. Viewed from the design flow, these methods all first use a clustering algorithm to mine the intrinsic structure of the data, and then use the obtained structure to design a classification mechanism. The radial basis function neural network (RBFNN) is a typical "unsupervised clustering + classifier design" approach. RBFNN first uses an unsupervised clustering algorithm such as C-means or Fuzzy C-means to determine the hidden-node parameters, and then optimizes the connection weights between the hidden layer and the output layer with the mean squared error (MSE) criterion between the actual output and the target output. The unsupervised clustering here is used only to determine the complexity and parameters of the network; it is merely an auxiliary tool of network design and cannot really reveal the intrinsic structure of the data. Therefore, RBFNN does not truly fuse the respective advantages of clustering learning and classification learning. Learning vector quantization (LVQ) uses the LVQ clustering algorithm to obtain the positions of the center points (the codebook) and their class information, and realizes classification with the 1-nearest-neighbor (1NN) rule based on these centers. In fact, none of these algorithms performs real training at the classifier-design stage; in other words, they do not carry out a real classifier design.
The fuzzy relational classifier (FRC) truly realizes the complementary combination of unsupervised clustering and supervised classification. FRC links clustering and classification by constructing a fuzzy logical relation between clusters and classes, achieving transparency and interpretability of the classification results. FRC has two significant advantages: (1) it constructs the fuzzy relation matrix with an implication operator, thereby revealing the inherent logical relationship between clusters and classes; (2) when some region of the sample space lacks sufficient discriminant information, the classifier refuses to make a decision for test samples falling into that region, thereby guaranteeing the confidence level of the classification results.
An important relation matrix R exists in the FRC classifier; its role is to characterize the fuzzy logical relation between the data structure and the classes. The correctness of this matrix largely determines the validity and robustness of FRC classification. However, FRC constructs R from all samples of the training set, without using the sample points selectively according to the structural features of the input space. When the dataset contains sizable class-overlap regions, the R constructed in this way cannot correctly reflect the true logical relation between the classes and the structure, so FRC suffers from the following defects: the classification lacks robustness; the classification performance degrades; and the computational burden is heavy. The cause of this phenomenon is that the samples falling into class-overlap regions prevent the resulting relation matrix R from correctly reflecting the distribution characteristics of the data.
Summary of the invention
To overcome the above problems, by using the training samples selectively, the present invention proposes an enhanced relational classifier based on representative samples (Enhanced FRC, EFRC). The matrix R in this classifier reflects the logical relation between the data structure and the classes more faithfully, and therefore effectively improves the validity of the classifier.
To achieve the above purpose, the technical scheme adopted by the present invention is:
An enhanced relational classifier based on representative samples, comprising the following steps:
Step 1: apply unsupervised Fuzzy C-means to produce the cluster membership matrix U and the cluster centers V;
Step 2: determine the representative sample set Xnew according to the cluster membership matrix U of all samples. The concrete method is: according to the cluster membership set {u_ji}, hard-partition the training sample set X into c sample subsets C_j; in each subset C_j, sort the samples in descending order of their membership to the j-th cluster; in the sorted subset C_j, select the top λ fraction of samples with the largest cluster memberships to form the representative sample set Xnew, where λ ∈ (0, 1);
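The selection of Step 2 can be sketched in Python; this is a minimal illustrative sketch (the function name, the c × N array layout of U, and the NumPy usage are assumptions, not part of the patent):

```python
import numpy as np

def select_representatives(X, U, lam=0.5):
    """Pick the top lam fraction of samples per cluster by membership.

    X   : (N, d) training samples
    U   : (c, N) fuzzy cluster membership matrix (rows: clusters)
    lam : fraction in (0, 1) of each cluster's samples to keep
    Returns the sorted indices of the representative sample set Xnew.
    """
    c, N = U.shape
    hard = U.argmax(axis=0)              # hard partition: cluster of each sample
    keep = []
    for j in range(c):
        members = np.where(hard == j)[0]
        # sort cluster j's samples by membership u_ji, descending
        order = members[np.argsort(-U[j, members])]
        m = max(1, int(np.ceil(lam * len(order))))
        keep.extend(order[:m])
    return np.sort(np.array(keep))
```

Samples deep inside a cluster (large u_ji) are kept, while samples near the class-overlap regions (small, ambiguous memberships) are discarded.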
Step 3: according to the cluster memberships of the representative sample set Xnew and their class labels, use the φ-composition operator to establish the fuzzy relation matrix R between clusters and classes. The concrete method is: first, use the φ-composition operator to compute the relation matrix R_i corresponding to each sample point in Xnew:

(r_jl)_i = min(1, 1 − u_ji + y_li), l = 1, 2, …, L, j = 1, 2, …, c (1)

where y_li is the membership of the i-th sample to the l-th class; its value equals 1 if the i-th sample belongs to class l, and 0 otherwise. Secondly, the relation matrices R_i corresponding to all samples are aggregated into the final relation matrix R through a fuzzy aggregation operator, where each element is computed by the minimum function:

r_jl = min_i (r_jl)_i (2)
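The construction of R in Step 3 can be sketched as follows; a minimal Python sketch assuming crisp class memberships y_li (1 for the sample's own class, 0 otherwise), with the φ-operator min(1, 1 − u + y) and min-aggregation over samples:

```python
import numpy as np

def relation_matrix(U_new, labels, L):
    """Build the c x L relation matrix R from the representative set.

    U_new  : (c, M) cluster memberships of the representative samples
    labels : (M,) class labels in {0, ..., L-1}
    L      : number of classes
    """
    c, M = U_new.shape
    Y = np.zeros((L, M))
    Y[labels, np.arange(M)] = 1.0          # crisp class membership y_li
    R = np.ones((c, L))
    for i in range(M):
        # phi-composition: (r_jl)_i = min(1, 1 - u_ji + y_li)
        Ri = np.minimum(1.0, 1.0 - U_new[:, [i]] + Y[:, i][None, :])
        R = np.minimum(R, Ri)              # min-aggregation over all samples
    return R
```

A sample with high membership to cluster j but a label other than l drives r_jl down; excluding overlap-region samples therefore keeps the diagonal-like compatibilities high.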
Step 4: compute the memberships μ_j(x) of the test sample x to all clusters from the distances between x and the cluster centers V;

Step 5: use the membership vector μ(x) and the relation matrix R to compute the class membership ω(x) of the test sample x, ω(x) = μ(x) ∘_t R, where ∘_t is the sup-t composition operator; each element of the class membership is computed as ω_l(x) = max_j min(μ_j(x), r_jl);

Step 6: apply the maximum operator to the class memberships ω(x) to obtain the class label of the test sample x, and output the class number as the final result.
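Steps 4–6 can be sketched together; a minimal Python sketch assuming FCM-style memberships with fuzzifier m = 2 and the sup-min composition (t = min), with rejection on tied maxima:

```python
import numpy as np

def classify(x, V, R, m=2.0):
    """Steps 4-6: memberships of x to the c centers, sup-min composition
    with R, then argmax with rejection when the maximum is not unique."""
    d = np.maximum(np.linalg.norm(V - x, axis=1), 1e-12)  # distances to centers
    inv = d ** (-2.0 / (m - 1.0))
    mu = inv / inv.sum()                                  # memberships mu_j(x)
    w = np.max(np.minimum(mu[:, None], R), axis=0)        # omega_l(x), sup-min
    best = np.flatnonzero(np.isclose(w, w.max()))
    return int(best[0]) if len(best) == 1 else None       # None = rejection
```

With V and R from the earlier steps, a test point near a cluster whose row of R strongly favors one class is assigned to that class; ties yield a rejection.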
The method of the present invention first constructs a representative new dataset Xnew from the original training set through purified clustering, seeking to eliminate the class-overlap regions of the original space; it then uses Xnew rather than all samples to construct the matrix R, thereby realizing the classification prediction of test samples. It can be proved by mathematical derivation that, compared with the FRC classifier, the EFRC classifier of the present invention not only keeps the incompatibility relation between clusters and classes unchanged, but also improves the compatibility relation between clusters and classes. Experimental results show that the matrix R in the EFRC classifier of the present invention reflects the logical relation between the data structure and the classes more faithfully, and therefore improves the validity of the classifier well.
Description of drawings
Fig. 1 is the flow chart of the classifier method of the present invention.
Fig. 2 is a schematic diagram of the sample distribution of the dataset in the embodiment of the invention.
Fig. 3 is a table comparing the relation matrices R and the classification accuracies obtained by the method of the invention and by the FRC method.
Embodiment
The present invention first sets the following experimental conditions:
1. Each dimension of the dataset is normalized to the interval [0, 1] by min-max normalization;
2. The cluster number c in EFRC is determined within the range [c_min, c_max], where c_min is the number of classes and c_max is determined from N, the number of samples;
3. The ratio λ of Xnew to the original set X is chosen in the range (0, 1).
On the basis of the above conditions, the enhanced relational classifier based on representative samples proposed by the present invention was implemented on the scientific computing platform Matlab, and the validity of the method was proved by the experimental results in Matlab.
The concrete flow chart of the EFRC classifier of the present invention is shown in Fig. 1. The concrete implementation steps of the present invention are further described below in conjunction with the accompanying drawings:
Step 1: apply unsupervised Fuzzy C-means (FCM) to produce the cluster membership matrix U and the cluster centers V.
Given a training set of N samples X = {x_1, x_2, …, x_N} and the class label set ω = {ω_1, ω_2, …, ω_N}, where x_i ∈ R^d, ω_i ∈ {1, 2, …, L}, and N is the number of samples. U = {u_1, u_2, …, u_N} denotes the cluster membership set of the training samples, where u_ji denotes the membership of the i-th sample to the j-th cluster center; m (1 ≤ m < ∞) is the weighting exponent of u_ji, used to control the fuzziness of the clustering result, and is conventionally set to 2.
The concrete execution of the Fuzzy C-means algorithm is as follows: first, initialize the cluster centers V = [v_1, v_2, …, v_c] and set ε to a very small positive number; then, update the membership matrix U and the cluster centers V of FCM according to the standard update formulas (1) and (2) (the membership update from the distances to the centers, and the center update as the membership-weighted mean); repeat the above update steps until the obtained cluster centers satisfy the condition |V_new − V_old| < ε. Through the above alternating iteration, FCM obtains a locally optimal solution of the objective function.
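The alternating FCM iteration can be sketched as follows; a minimal Python sketch (the initialization handling and the optional V0 parameter are assumptions for illustration):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, V0=None):
    """Alternating FCM updates: memberships U from distances to the centers,
    centers V as membership-weighted means, until |V_new - V_old| < eps."""
    V = np.array(V0, dtype=float) if V0 is not None else X[:c].astype(float)
    for _ in range(max_iter):
        # distances of every sample to every center, floored to avoid division by 0
        d = np.maximum(np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2), 1e-12)
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=0, keepdims=True)        # memberships u_ji; columns sum to 1
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if np.linalg.norm(V_new - V) < eps:      # convergence test |V_new - V_old| < eps
            return U, V_new
        V = V_new
    return U, V
```

On well-separated data the centers converge to the cluster means regardless of a reasonable initialization.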
Step 2: determine the representative sample set Xnew according to the cluster memberships of all samples.
After running the clustering algorithm on the training set X = {x_1, x_2, …, x_N}, hard-partition X according to the cluster memberships {u_ji} to form c sample subsets C_j; in each C_j, sort the samples in descending order of their membership to the j-th cluster; in the sorted subset C_j, select the top λ fraction of samples with the largest cluster memberships to form Xnew. Here λ ∈ (0, 1), and its value is the ratio of Xnew to the original set X.
Step 3: according to the cluster memberships of the representative sample set Xnew and their class labels, use the φ-composition operator to establish the fuzzy relation matrix R between clusters and classes.
Each sample point in the set Xnew corresponds to a relation matrix R_i, whose elements are computed by the φ-composition operator:

(r_jl)_i = min(1, 1 − u_ji + y_li), l = 1, 2, …, L, j = 1, 2, …, c (3)

where y_li is the membership of the i-th sample to the l-th class; its value equals 1 if the i-th sample belongs to class l, and 0 otherwise. The R_i corresponding to all samples are aggregated into the final relation matrix R through a fuzzy aggregation operator, where each element is computed by the minimum function:

r_jl = min_i (r_jl)_i (4)

The size of the matrix R is c × L, corresponding to the c cluster centers and the L classes respectively, where r_jl represents the compatibility relation between the j-th cluster and the l-th class. The larger its value, the more compatible the cluster and the class; the smaller its value, the more incompatible they are.
Step 4: complete the classification prediction of the test sample x according to the fuzzy relation matrix R and the cluster centers V.
For a test sample x, the classification process comprises the following steps. First, compute the memberships μ_j(x) of the sample to all clusters from the distances between x and the cluster centers, where μ_j(x) denotes the membership of x to the j-th cluster. Second, use the membership vector μ(x) and the relation matrix R to compute the class membership ω(x) of the sample x. Note that when ω(x) has multiple maximal values, the classifier refuses to make a decision for the sample x; that is, a rejection decision is made for x. A rejection decision means that the training set contains contradictory information, or lacks information, in a particular region.
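The rejection rule can be illustrated with a small numeric sketch (all values below are hypothetical, chosen only to produce a tie): when a test sample sits midway between two clusters whose rows of R point to different classes, the sup-min composition yields tied class memberships and the classifier declines to decide:

```python
import numpy as np

# Hypothetical values: x lies midway between two clusters that favor
# different classes, so the class memberships tie and x is rejected.
mu = np.array([0.5, 0.5])                       # cluster memberships of x
R = np.array([[0.9, 0.1],
              [0.1, 0.9]])                      # relation matrix (clusters x classes)
w = np.max(np.minimum(mu[:, None], R), axis=0)  # sup-min composition: omega(x)
rejected = np.sum(np.isclose(w, w.max())) > 1   # multiple maxima -> rejection
```

Here both class memberships evaluate to 0.5, so no unique maximum exists and the sample is rejected rather than assigned with low confidence.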
Compared with the FRC classifier, the EFRC classifier of the present invention strengthens the robustness of the classification and improves its reliability. In addition, the computational complexity of constructing R in the EFRC classifier decreases from O(NLc) to O(MLc), where L is the number of classes, c is the number of clusters, N is the number of samples of the original dataset X, M is the number of samples of Xnew, and N > M.
Consider an experimental analysis on the dataset shown in Fig. 2. This dataset comprises three clusters and two classes of samples: the samples in clusters C1 and C2 come from class 1, the samples in cluster C3 come from class 2, and the dataset exhibits a certain degree of class overlap. Fig. 3 gives the experimental comparison of the present invention and the FRC method on this dataset. From Fig. 3 it can be seen that, in the R obtained by FRC on this dataset, the element 0.88 in the first row of the matrix is far larger than 0.01; that is, the compatibility relation between cluster C1 and class 1 is stronger than that between C1 and class 2, and this conclusion truly reflects the structural features of the dataset. However, the element 0.15 (0.05) in the second (third) row of the matrix indicates that the compatibility relation between cluster C2 (C3) and class 1 (2) is very weak, so the second (third) row of R does not correctly reflect the relation between cluster and class. Based on such an R, FRC obtains classification accuracies of 100%, 64.4% and 22.0% on the clusters C1, C2 and C3 of the test set, respectively. From this embodiment the following conclusion can be drawn: when the data contain class-overlap regions, the R obtained by FRC using all training samples equally cannot correctly reflect the true distribution of the data, which leads to poor classification performance. The R obtained by EFRC on these data shows that clusters C1 and C2 have a very strong compatibility relation with class 1, and cluster C3 has a strong logical relation with class 2. It can thus be seen that the matrix R in EFRC reflects the logical relation between the data structure and the classes more faithfully, and can therefore effectively improve the classification correctness over FRC.
Contents not described in detail in this application belong to the prior art known to those skilled in the art.
Claims (1)
1. An enhanced relational classifier based on representative samples, characterized by comprising the following steps:
Step 1: apply unsupervised Fuzzy C-means to produce the cluster membership matrix U and the cluster centers V;
Step 2: determine the representative sample set Xnew according to the cluster membership matrix U of all samples; the concrete method is: according to the cluster membership set {u_ji}, hard-partition the training sample set X into c sample subsets C_j; in each subset C_j, sort the samples in descending order of their membership to the j-th cluster; in the sorted subset C_j, select the top λ fraction of samples with the largest cluster memberships to form the representative sample set Xnew, λ ∈ (0, 1);
Step 3: according to the cluster memberships of the representative sample set Xnew and their class labels, use the φ-composition operator to establish the fuzzy relation matrix R between clusters and classes; the concrete method is: first, use the φ-composition operator to compute the relation matrix R_i corresponding to each sample point in Xnew:

(r_jl)_i = min(1, 1 − u_ji + y_li), l = 1, 2, …, L, j = 1, 2, …, c (1)

where y_li is the membership of the i-th sample to the l-th class, whose value equals 1 if the i-th sample belongs to class l and 0 otherwise; secondly, aggregate the relation matrices R_i corresponding to all samples into the final relation matrix R through a fuzzy aggregation operator, where each element is computed by the minimum function r_jl = min_i (r_jl)_i;
Step 4: compute the memberships of the test sample x to all clusters from the distances between x and the cluster centers V;
Step 5: use the membership vector μ(x) and the relation matrix R to compute the class membership ω(x) of the test sample x, ω(x) = μ(x) ∘_t R, where ∘_t is the sup-t composition operator, and each element of the class membership is computed as ω_l(x) = max_j min(μ_j(x), r_jl).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210287636XA CN102799902A (en) | 2012-08-13 | 2012-08-13 | Enhanced relationship classifier based on representative samples |
Publications (1)
Publication Number | Publication Date
---|---
CN102799902A (en) | 2012-11-28
Family
ID=47199001
Country Status (1)
Country | Link
---|---
CN | CN102799902A (en)
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040267770A1 (en) * | 2003-06-25 | 2004-12-30 | Lee Shih-Jong J. | Dynamic learning and knowledge representation for data mining |
CN101025729A (en) * | 2007-03-29 | 2007-08-29 | 复旦大学 | Pattern classification rcognition method based on rough support vector machine |
CN101295362A (en) * | 2007-04-28 | 2008-10-29 | 中国科学院国家天文台 | Combination supporting vector machine and pattern classification method of neighbor method |
- 2012-08-13: application CN201210287636XA filed in CN; published as CN102799902A (en); status: Pending
Non-Patent Citations (1)
Title |
---|
蔡维玲 (Cai Weiling) et al.: "A relational classifier based on optimal supervised clustering centers", 《传感器与微系统》 (Transducer and Microsystem Technologies), vol. 28, no. 4, 30 April 2009, pages 85-87 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106250378A (en) * | 2015-06-08 | 2016-12-21 | 腾讯科技(深圳)有限公司 | Public identifier sorting technique and device |
CN106250378B (en) * | 2015-06-08 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Public identification classification method and device |
CN107862344A (en) * | 2017-12-01 | 2018-03-30 | 中南大学 | A kind of image classification method |
CN107862344B (en) * | 2017-12-01 | 2021-06-11 | 中南大学 | Image classification method |
CN110287996A (en) * | 2019-05-27 | 2019-09-27 | 湖州师范学院 | A kind of fuzzy integrated classifier with high interpretation based on collateral learning |
CN110287996B (en) * | 2019-05-27 | 2022-12-09 | 湖州师范学院 | Parallel learning-based fuzzy integration classifier with high interpretability |
CN111444937A (en) * | 2020-01-15 | 2020-07-24 | 湖州师范学院 | Crowdsourcing quality improvement method based on integrated TSK fuzzy classifier |
CN111444937B (en) * | 2020-01-15 | 2023-05-12 | 湖州师范学院 | Crowd-sourced quality improvement method based on integrated TSK fuzzy classifier |
CN113505860A (en) * | 2021-09-07 | 2021-10-15 | 天津所托瑞安汽车科技有限公司 | Screening method and device for blind area detection training set, server and storage medium |
CN113505860B (en) * | 2021-09-07 | 2021-12-31 | 天津所托瑞安汽车科技有限公司 | Screening method and device for blind area detection training set, server and storage medium |
CN115329657A (en) * | 2022-07-06 | 2022-11-11 | 中国石油化工股份有限公司 | Drilling parameter optimization method and device |
CN115329657B (en) * | 2022-07-06 | 2023-06-09 | 中国石油化工股份有限公司 | Drilling parameter optimization method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | C06 | Publication |
 | PB01 | Publication |
 | C10 | Entry into substantive examination |
 | SE01 | Entry into force of request for substantive examination |
 | C02 | Deemed withdrawal of patent application after publication (patent law 2001) |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20121128