KR101866522B1

KR101866522B1 - Object clustering method for image segmentation

Info

Publication number: KR101866522B1
Application number: KR1020160172271A
Authority: KR
Inventors: 전광길
Original assignee: 인천대학교 산학협력단
Priority date: 2016-12-16
Filing date: 2016-12-16
Publication date: 2018-06-12

Abstract

The present invention relates to an object clustering method for image segmentation that can improve clustering accuracy. The method comprises: (a) setting the number of cluster prototypes, a fuzzification parameter, and a stopping condition; (b) initializing the cluster prototypes; (c) calculating fuzzy membership values of each object with respect to the clusters; (d) assigning each object to the corresponding approximation region of a rough set based on the fuzzy membership values; (e) calculating a prototype for each cluster based on a weighting parameter that depends on the distribution characteristics of that cluster; (f) if the stopping condition is met, adopting the prototypes obtained in the current iteration as the prototype for each cluster and, if not, repeating steps (c) to (e) until the prototype for each cluster is determined; and (g) assigning each object to the corresponding cluster according to the fuzzy membership values based on the determined prototype for each cluster.

Description

OBJECT CLUSTERING METHOD FOR IMAGE SEGMENTATION

The present invention relates to an object clustering method for image segmentation.

Clustering analysis is a fundamental technique for discovering the inherent structural complexity of a dataset. Its main task is to partition an unlabeled dataset into several groups so that objects in the same group are more similar to one another than to objects in other groups. The clustering process attempts to mimic how humans assign objects to different classes and categories. As one of the most widely used methods in data mining (Mitra, 2003), clustering analysis has been applied across a wide range of engineering fields, such as pattern recognition, machine learning, web mining (Mecca, 2007), biology (Xu, 2010; Valafar, 2002), image processing (Jain, 1988), and market segmentation (Bigus, 1996). Image segmentation is one of the main techniques in image processing and computer vision. Because the task of image segmentation is to partition an image into a number of non-overlapping regions that share the same features, such as gray level, color, and texture, many clustering-based methods have been proposed for image segmentation.

The k-means algorithm is regarded as the classical partitive clustering method. In the k-means algorithm (Tou, 1974), each object is assigned to exactly one cluster, so the boundaries between regions are crisp, or hard. The algorithm converges quickly, but it cannot easily achieve the desired clustering result on overlapping or skewed data distributions (Xiong, 2009). Real-world datasets, meanwhile, are always characterized by uncertainty and overlapping boundaries. There is therefore considerable demand for soft computing techniques whose flexible information processing can meet the needs of practical applications.
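The crisp assignment described above can be illustrated with a minimal Python sketch of one k-means iteration on scalar values such as gray levels. The function name `kmeans_step`, the use of absolute difference as the distance, and the sample values are illustrative assumptions, not taken from the cited work.

```python
def kmeans_step(xs, prototypes):
    """One k-means iteration on scalar objects (hypothetical helper).

    Each object is assigned to exactly one cluster (the nearest prototype),
    which is what makes the resulting partition crisp.
    """
    labels = [
        min(range(len(prototypes)), key=lambda i: abs(x - prototypes[i]))
        for x in xs
    ]
    new_prototypes = []
    for i, old in enumerate(prototypes):
        members = [x for x, label in zip(xs, labels) if label == i]
        # An empty cluster keeps its previous prototype.
        new_prototypes.append(sum(members) / len(members) if members else old)
    return labels, new_prototypes


labels, prototypes = kmeans_step([1.0, 2.0, 9.0, 11.0], [0.0, 10.0])
```

With these sample values, the first two objects go to cluster 0 and the last two to cluster 1, and the prototypes move to the means of their members. No object can express partial belonging to both clusters, which is the limitation the paragraph points out.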

In practical applications, it may be impossible to obtain complete information about a given pattern set. Incomplete information can lead pattern recognition methods to produce imperfect representations of the pattern set. Moreover, in image analysis there is no precise definition of the concept of a region or of a boundary between regions. This has prompted researchers to take various types of uncertainty into account when designing pattern recognition methods.

One of the most common approaches to image segmentation is fuzzy clustering, which in some cases retains more information than hard clustering. Because the ranges of pixel values belonging to different regions of an image generally overlap, fuzzy clustering techniques appear to be a suitable and practical choice for identifying pixels. Fuzzy set theory (Dubois, 2012; Pedrycz, 2012), a prevalent soft computing technique that is widely used in real applications (Vieira, 2012; Bermudez, 2012; Niros, 2012), was added to the framework of the k-means algorithm to develop the fuzzy c-means algorithm (FCM) (Bezdek, 1981). In FCM, an object can be assigned to many clusters simultaneously with different membership degrees, which relaxes the requirement in the k-means algorithm that each object belong to a single cluster. Under this condition, FCM can obtain desirable results in overlapping environments. A weakness of FCM is that, when an object is noisy, the resulting fuzzy membership values may not always correspond to the degree to which the object actually belongs to the clusters.
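As an illustration of membership degrees, the following Python sketch evaluates the commonly used FCM membership formula, in which the membership of an object in cluster i is the reciprocal of the sum over all clusters k of (d_i / d_k) raised to the power 2 / (m - 1). Scalar objects, absolute difference as the distance, and the function name `fcm_memberships` are assumptions made for brevity.

```python
def fcm_memberships(x, prototypes, m=2.0):
    """Membership of scalar x in each cluster under the usual FCM formula.

    m > 1 is the fuzzification parameter; larger m gives softer memberships.
    """
    dists = [abs(x - v) for v in prototypes]
    if 0.0 in dists:
        # x coincides with one or more prototypes: share full membership
        # among those prototypes and give zero to the rest.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


u = fcm_memberships(1.0, [0.0, 4.0])
```

For the sample call, the object lies at distance 1 from the first prototype and 3 from the second, so its memberships are about 0.9 and 0.1 and sum to 1. A noisy object far from every prototype would still receive memberships that sum to 1, which is the weakness noted at the end of the paragraph.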

Rough set theory is another effective soft computing technique for describing uncertainty or vagueness. It uses a pair of lower and upper approximations to form an approximate definition of an imprecise set. Recently, rough set theory has been combined with clustering algorithms, giving rise to several rough set-based clustering algorithms (Lingras, 2004; Mitra, 2004; Maji, 2011). Under rough set theory (Tiwari, 2012; Chen, 2012), a cluster is described jointly by a prototype and a pair of lower and upper approximations. An object in the lower approximation belongs to the cluster with greater certainty, whereas an object in the upper approximation is assigned to a specific cluster with less confidence. The pair of approximations aims to locate the actual boundary of an imprecise set from two sides, which is better suited to describing the nature of cognition. For this reason, rough set-based clustering algorithms (Mitra, 2004, 2006) can handle uncertainty well, with the help of the two approximations, when clusters in a real dataset are uncertain or vague.
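The split between certain and uncertain members can be sketched in Python as follows. The text does not state how objects are placed in each region, so the rule used here is an assumption: an object goes to the lower approximation of its best cluster when its two highest memberships differ by more than a threshold, and otherwise to the boundary regions of both clusters. The names `assign_regions` and `delta` are hypothetical.

```python
def assign_regions(memberships, delta=0.1):
    """Place each object in a lower approximation or in boundary regions.

    memberships[j][i] is the membership of object j in cluster i; at least
    two clusters are required. Returns (lower, boundary), each a list with
    one list of object indices per cluster.
    """
    c = len(memberships[0])
    lower = [[] for _ in range(c)]
    boundary = [[] for _ in range(c)]
    for j, u in enumerate(memberships):
        ranked = sorted(range(c), key=lambda i: u[i], reverse=True)
        best, second = ranked[0], ranked[1]
        if u[best] - u[second] > delta:
            lower[best].append(j)       # clearly belongs to one cluster
        else:
            boundary[best].append(j)    # ambiguous between two clusters
            boundary[second].append(j)
    return lower, boundary


lower, boundary = assign_regions([[0.9, 0.1], [0.52, 0.48]])
```

In the sample call, object 0 lands in the lower approximation of cluster 0, while object 1 is ambiguous and appears in the boundary regions of both clusters. The upper approximation of a cluster is then the union of its lower approximation and its boundary region.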

Soft computing techniques offer flexible information processing that is well suited to handling uncertainty and overlap in pattern recognition. Fuzzy sets and rough sets are two types of soft computing methods, each with its own advantages: fuzzy sets are good at resolving overlapping partitions, and rough sets handle uncertainty and vagueness well. Combining fuzzy sets and rough sets provides an important direction for managing uncertainty. By combining their advantages, an integrated technique called the rough-fuzzy c-means algorithm (RFCM) was proposed (Mitra, 2006; Maji, 2007a). To combine the advantages of both probabilistic and possibilistic membership functions, a new hybrid clustering algorithm called rough-fuzzy possibilistic c-means (RFPCM) was proposed (Maji, 2007b). Considerable effort is still being devoted to finding the correlations between them (Mitra, 2006; Peters, 2006; Pawlak, 1991; Maji, 2007a, 2007b; Zhou, 2011; Mitra, 2010).

Researchers have attempted to represent and manage uncertainty in pattern recognition, particularly in clustering. If the uncertainty information embedded in a given dataset can be identified properly, performance improvements become achievable. By integrating soft computing techniques, including fuzzy sets and rough sets, hybrid clustering algorithms lead to an effective description of information granules through the clustering process. Although rough set-based clustering algorithms are regarded as effective soft computing methods, a common problem is the selection of the weighting parameter. The weighting parameter, which controls the relative importance of the lower and upper approximations when a new prototype is updated, is always set manually. When no prior knowledge is available, a suitable weighting parameter cannot easily be chosen. Moreover, whatever the distribution characteristics of the dataset, the ratio of importance between the lower and upper approximations is usually set to be the same for every cluster. Because the context varies from cluster to cluster, however, it is not reasonable to use the same weighting parameter when updating different prototypes.

[1] S. Mitra, T. Acharya, Data mining: multimedia, soft computing, and bioinformatics, John Wiley, New York, 2003.
[2] G. Mecca, S. Raunich, A. Rappalardo, A new algorithm for clustering search results, Data and Knowledge Engineering 62 (2007) 504-522.
[3] R. Xu, D. Wunsch, II, Clustering algorithm in biomedical research: a review, IEEE Reviews in Biomedical Engineering 3 (2010) 120-154.
[4] F. Valafar, Pattern recognition techniques in microarray data analysis: a survey, Annals of New York Academy of Sciences 980 (2002) 41-64.
[5] A. K. Jain, R. C. Dubes, Algorithms for clustering data, Upper Saddle River, NJ, USA: Prentice Hall, 1988.
[6] J. P. Bigus, Data mining with neural networks, McGraw-Hill, 1996.
[7] J. T. Tou, R. C. Gonzalez, Pattern recognition principles, Addison-Wesley, London, 1974.
[8] H. Xiong, J. J. Wu, J. Chen, K-means clustering versus validation measures: a data-distribution perspective, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 39 (2009) 318-331.
[9] D. Dubois, H. Prade, Gradualness, uncertainty and bipolarity: making sense of fuzzy sets, Fuzzy Sets and Systems 192 (2012) 3-24.
[10] A. Pedrycz, K. Hirota, W. Pedrycz, F. Y. Dong, Granular representation and granular computing with fuzzy sets, Fuzzy Sets and Systems 203 (2012) 17-32.
[11] S. M. Vieira, J. M. C. Sousa, U. Kaymak, Fuzzy criteria for feature selection, Fuzzy Sets and Systems 189 (2012) 1-18.
[12] J. D. Bermudez, J. V. Segura, E. Vercher, A multi-objective genetic algorithm for cardinality constrained fuzzy portfolio selection, Fuzzy Sets and Systems 188 (2012) 16-26.
[13] A. D. Niros, G. E. Tsekouras, A novel training algorithm for RBF neural network using a hybrid clustering approach, Fuzzy Sets and Systems 193 (2012) 62-84.
[14] J. C. Bezdek, Pattern recognition with fuzzy objective function algorithms, Norwell, MA, USA: Kluwer Academic, 1981.
[15] P. Lingras, C. West, Interval set clustering of web users with rough k-means, Journal of Intelligent Information Systems 23 (2004) 5-16.
[16] S. Mitra, An evolutionary rough partitive clustering, Pattern Recognition Letters 25 (2004) 1439-1449.
[17] P. Maji, Fuzzy-rough supervised attribute clustering algorithm and classification of microarray data, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 41 (2011) 222-233.
[18] S. P. Tiwari, A. K. Srivastava, Fuzzy rough set, fuzzy preorders and fuzzy topologies, Fuzzy Sets and Systems, available online, in press, 2012.
[19] D. G. Chen, S. Kwong, Q. He, H. Wang, Geometrical interpretation and application of membership functions with fuzzy rough sets, Fuzzy Sets and Systems 193 (2012) 122-135.
[20] S. Mitra, H. Banka, W. Pedrycz, Rough-fuzzy collaborative clustering, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 36 (2006) 795-805.
[21] G. Peters, Some refinements of rough k-means clustering, Pattern Recognition 39 (2006) 1481-1491.
[22] Z. Pawlak, Rough sets, Theoretical Aspects of Reasoning About Data, Dordrecht, The Netherlands, Kluwer, 1991.
[23] P. Maji, S. K. Pal, RFCM: a hybrid clustering algorithm using rough and fuzzy sets, Fundamenta Informaticae 80 (2007) 475-496.
[24] P. Maji, S. K. Pal, Rough set based generalized fuzzy c-means algorithm and quantitative indices, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 37 (2007) 1529-1540.
[25] J. Zhou, W. Pedrycz, D. Q. Miao, Shadowed sets in the characterization of rough-fuzzy clustering, Pattern Recognition 44 (2011) 1738-1749.
[26] S. Mitra, W. Pedrycz, B. Barman, Shadowed c-means: integrating fuzzy and rough clustering, Pattern Recognition 43 (2010) 1282-1291.
[27] H. J. Sun, S. G. Wang, Q. S. Jiang, FCM-based model selection algorithms for determining the number of clusters, Pattern Recognition 37 (2004) 2027-2037.
[28] G. E. Tsekouras, H. Sarimveis, A new approach for measuring the validity of the fuzzy c-means algorithm, Advance in Engineering Software 35 (2004) 567-575.
[29] K. L. Wu, M. S. Yang, A cluster validity index for fuzzy clustering, Pattern Recognition Letters 26 (2005) 1275-1291. J. C. Bezdek, Numerical taxonomy with fuzzy sets, Journal of Mathematical Biology 1 (1974) 57-71.
[30] E. Trauwaert, On the meaning of Dunn's partition coefficient for fuzzy clusters, Fuzzy Sets and Systems 25 (1988) 217-242.
[31] R. McGill, J. W. Tukey, W. A. Larsen, Variations of box plots, The American Statistician 32 (1978) 12-16.
[32] M. K. Pakhira, S. Bandyopadhyay, U. Maulik, Validity index for crisp and fuzzy clusters, Pattern Recognition 37 (2004) 487-501.
[33] W. Wang, Y. J. Zhang, On fuzzy cluster validity indices, Fuzzy Sets and Systems 158 (2007) 2095-2117.
[34] K. Tasdemir, E. Merenyi, A validity index for prototype-based clustering of data sets with complex cluster structures, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 41 (2011) 1039-1053.
[35] R. Xu, J. Xu, D. C. Wunsch, II, A comparison study of validity indices on swarm-intelligence-based clustering, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 15 (2012) 1243-1256.
[36] J. C. Bezdek, Mathematical models for systematic and taxonomy, in: Proceedings of the Eight International Conference on Numerical Taxonomy, 1975, pp. 143-166.
[37] L. Hubert, P. Arabie, Comparing partitions, Journal of Classification 2 (1985) 193-218.
[38] A. Ben-Hur, I. Guyon, Detecting stable clusters using principal component analysis in Methods in Molecular Biology, Humana Press, Totowa, NJ, 2003.
[39] X. L. Xie, G. A. Beni, A validity measure for fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (1991) 841-847.
[40] A. Asuncion, D. Newman, UCI machine learning repository, 2007. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html

The problem to be solved by the present invention is to provide an object clustering method for image segmentation that can improve clustering accuracy by adaptively applying a weighting parameter based on the distribution characteristics of each cluster.

An object clustering method for image segmentation according to an embodiment of the present invention for solving the above problem comprises:

(a) setting the number of cluster prototypes (c), a fuzzification parameter (m), and a stopping condition [symbol not reproduced];

(b) initializing the cluster prototypes;

(c) calculating fuzzy membership values of each object with respect to the clusters;

(d) assigning each object to the corresponding approximation region of a rough set based on the fuzzy membership values;

(e) calculating a prototype for each cluster based on a weighting parameter that depends on the distribution characteristics of that cluster;

(f) if [condition not reproduced] holds, adopting the prototypes obtained in the current iteration as the prototype for each cluster and, otherwise, repeating steps (c) to (e) until the prototype for each cluster is determined; and

(g) assigning each object to the corresponding cluster according to the fuzzy membership values based on the determined prototype for each cluster,

where [symbol not reproduced] denotes the vectors of cluster prototypes, and [condition not reproduced] means that the prototypes obtained in the current iteration are the same as the prototypes generated in the previous iteration.
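Steps (a) to (g) can be sketched end to end in Python for scalar objects such as gray levels. This is a minimal sketch under several assumptions that the text does not confirm: the usual FCM membership formula is used in step (c); step (d) compares the two highest memberships against a hypothetical threshold `delta`; step (e) uses a single fixed weight `w` for every cluster, standing in for the adaptive per-cluster weighting parameter, whose equation is not reproduced; and step (f) stops when no prototype moves by more than a hypothetical tolerance `eps`.

```python
def memberships(x, prototypes, m):
    """FCM-style memberships of scalar x in every cluster."""
    dists = [abs(x - v) for v in prototypes]
    if 0.0 in dists:  # x coincides with at least one prototype
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]


def rough_fuzzy_cmeans(xs, init, m=2.0, eps=1e-6, delta=0.1, w=0.7, max_iter=100):
    c = len(init)                  # step (a): c, m and eps are the settings
    v = list(init)                 # step (b): initial prototypes (c >= 2)
    for _ in range(max_iter):
        u = [memberships(x, v, m) for x in xs]              # step (c)
        lower = [[] for _ in range(c)]
        boundary = [[] for _ in range(c)]
        for j, uj in enumerate(u):                          # step (d)
            ranked = sorted(range(c), key=lambda i: uj[i], reverse=True)
            first, second = ranked[0], ranked[1]
            if uj[first] - uj[second] > delta:
                lower[first].append(j)
            else:
                boundary[first].append(j)
                boundary[second].append(j)
        new_v = []
        for i in range(c):                                  # step (e)
            low = sum(xs[j] for j in lower[i]) / len(lower[i]) if lower[i] else None
            wts = [u[j][i] ** m for j in boundary[i]]
            bnd = sum(t * xs[j] for t, j in zip(wts, boundary[i])) / sum(wts) if wts else None
            if low is None and bnd is None:
                new_v.append(v[i])          # empty cluster keeps its prototype
            elif bnd is None:
                new_v.append(low)
            elif low is None:
                new_v.append(bnd)
            else:
                new_v.append(w * low + (1.0 - w) * bnd)
        moved = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if moved <= eps:                                    # step (f)
            break
    u = [memberships(x, v, m) for x in xs]                  # step (g)
    labels = [max(range(c), key=lambda i: uj[i]) for uj in u]
    return v, labels


prototypes, labels = rough_fuzzy_cmeans([0.0, 0.2, 0.4, 9.6, 9.8, 10.0], [1.0, 9.0])
```

On this well-separated sample every object falls in a lower approximation, so the prototypes settle at the two group means after two iterations. Replacing the fixed `w` with a value computed separately for each cluster at every iteration is where the adaptive strategy of the method would plug in.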

In the object clustering method for image segmentation according to an embodiment of the present invention, the distribution characteristics of each cluster may include the compactness of each cluster.

Also, in the object clustering method for image segmentation according to an embodiment of the present invention, the prototype for each cluster in step (e) is calculated by a prototype equation with three accompanying definitions, and the weighting parameter of cluster i that appears in it is calculated by a further equation with one accompanying definition (the equations themselves are not reproduced in this text).

In those equations, the symbols denote, in the order defined: cluster i; the lower approximation of cluster i; the boundary region of cluster i; the fuzzy membership value indicating the degree to which object j belongs to cluster i; object j; n, a positive integer; and the number of objects in the lower approximation of cluster i.
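Because the prototype equation is not reproduced in this text, the following Python sketch only illustrates the kind of update that the listed quantities suggest. It follows the weighted combination commonly used in rough-fuzzy c-means: the plain mean of the lower approximation, combined with the membership-weighted mean of the boundary region. Whether the equation of this document has exactly this form is an assumption, and the per-cluster weight `w_i` is taken as an input because its formula is not available here. All names are hypothetical.

```python
def update_prototype(xs, u_i, lower_i, boundary_i, w_i, m=2.0):
    """Prototype of one cluster from its lower approximation and boundary.

    xs: scalar objects; u_i[j]: membership of object j in this cluster;
    lower_i, boundary_i: lists of object indices; w_i: weight in (0, 1]
    given to the lower approximation.
    """
    lower_mean = None
    if lower_i:
        lower_mean = sum(xs[j] for j in lower_i) / len(lower_i)
    boundary_mean = None
    if boundary_i:
        weights = [u_i[j] ** m for j in boundary_i]
        boundary_mean = sum(t * xs[j] for t, j in zip(weights, boundary_i)) / sum(weights)
    if lower_mean is None and boundary_mean is None:
        raise ValueError("cluster has no objects in either region")
    if boundary_mean is None:
        return lower_mean
    if lower_mean is None:
        return boundary_mean
    return w_i * lower_mean + (1.0 - w_i) * boundary_mean


v = update_prototype([0.0, 2.0, 10.0], [0.9, 0.8, 0.5], [0, 1], [2], w_i=0.7)
```

In the sample call the lower approximation has mean 1.0 and the single boundary object sits at 10.0, so a weight of 0.7 gives a prototype of about 3.7. A compact cluster would plausibly warrant a different weight than a diffuse one, which is the motivation for making the weight depend on each cluster's compactness.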

According to the object clustering method for image segmentation according to an embodiment of the present invention, clustering accuracy can be improved by adaptively applying the weighting parameter based on the distribution characteristics of each cluster.

FIG. 1 is a diagram showing the lower and upper approximations of a cluster.
FIG. 2 visualizes the segmentation results obtained by RFCM II in the last iteration under different weighting parameters: FIG. 2A for a weighting parameter of 0.51, FIG. 2B for 0.6, FIG. 2C for 0.7, FIG. 2D for 0.8, FIG. 2E for 0.9, and FIG. 2F for 0.99.
FIG. 3 shows box plots of CA, ARI, and MS under different weighting parameters for a synthetic dataset processed by RFCM II: FIG. 3A for CA, FIG. 3B for ARI, and FIG. 3C for MS.
FIG. 4 visualizes the lower and upper approximations of each cluster obtained by RFCM II: FIG. 4A shows the final segmentation result, FIG. 4B the lower approximations of the three clusters, FIG. 4C the boundary region of the red cluster, FIG. 4D the boundary region of the green cluster, and FIG. 4E the boundary region of the blue cluster.
FIG. 5 is a scatter plot of four synthetic datasets.
FIG. 6 shows the statistics of CA, ARI, and MS obtained by each algorithm on four test problems: FIG. 6A for CA, FIG. 6B for ARI, and FIG. 6C for MS.
FIG. 7 shows how the weighting parameters obtained by ARFCM change over successive generations for synthetic dataset 2.
FIG. 8 shows the statistics obtained by four algorithms on eight UCI datasets: FIG. 8A for CA, FIG. 8B for ARI, and FIG. 8C for MS.
FIG. 9 shows the segmentation results obtained by four algorithms: FIG. 9A for the house image, FIG. 9B for the rice image, FIG. 9C for the cameraman image, FIG. 9D for the brain image, FIG. 9E for the second brain MR image, and FIG. 9F for the third brain MR image.
FIG. 10 is a scatter plot of four synthetic datasets of different sizes.
FIG. 11 shows the average running time of four algorithms on synthetic datasets of different sizes.
FIG. 12 is a flowchart of an object clustering method for image segmentation according to an embodiment of the present invention.

The objects, particular advantages, and novel features of the present invention will become more apparent from the following detailed description and preferred embodiments taken in conjunction with the accompanying drawings.

Before proceeding, the terms and words used in this specification and the claims should not be interpreted in their ordinary or dictionary senses; rather, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

In this specification, when reference numerals are assigned to the elements of each drawing, the same elements are given the same numerals wherever possible, even when they appear in different drawings.

Terms such as "first", "second", "one side", and "other side" are used to distinguish one element from another, and the elements are not limited by these terms.

In describing the present invention below, detailed descriptions of related known art that could unnecessarily obscure the gist of the invention are omitted.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

In practical pattern recognition applications, complete and accurate information about a given set is not always easy to obtain. Such incomplete information can lead many pattern recognition methods to an imperfect representation of the set. Rough set theory is designed to describe an imprecise set approximately by a pair of approximations, a lower approximation and an upper approximation, weighted by different parameters. Because distribution characteristics vary from set to set, it is undesirable to use a constant weighting parameter to control the importance of the lower and upper approximations when describing different sets. The object clustering method for image segmentation according to an embodiment of the present invention provides an improved rough-fuzzy c-means clustering algorithm whose parameter selection strategy adaptively adjusts the weighting parameter according to the distribution characteristics of each cluster, instead of relying on a manually selected constant. The weighting parameter is selected automatically according to the structural characteristics of the cluster and is updated online at each iteration. This yields relatively accurate approximation regions for each cluster, which are important for computing the corresponding prototype, so that the resulting prototype can approach its desired location. Experimental results on synthetic data sets, real-life data sets, and image segmentation problems confirm the effectiveness of the adaptive parameter selection strategy proposed in the object clustering method for image segmentation according to an embodiment of the present invention. With the adaptive parameter selection strategy, the improved rough set-based clustering algorithm outperforms the existing algorithms.

Rough set-based clustering algorithm

According to rough set theory, an imprecise set can be described approximately by both a lower approximation and an upper approximation. These two approximations bound the imprecise set from both sides. The objects in the lower approximation form a subset of the target set, so they definitely belong to that set. The objects in the upper approximation form a nonempty intersection with the target set. When rough sets are introduced into a clustering algorithm, the prototype of each cluster is computed from the objects in its lower and upper approximations rather than from all objects, as in k-means or fuzzy c-means (FCM), so that useless information can be removed (Zhou, 2011).

Fig. 1 provides a schematic view of a cluster. Objects located in the lower approximation definitely belong to the cluster. Objects located in the upper approximation possibly belong to the cluster. The objects between the lower approximation and the upper approximation form the boundary region. Objects located in different approximations of a cluster contribute differently. In general, objects located in the exclusion region are almost ignored when the corresponding prototype is computed. Objects in the lower approximation carry more importance than objects in the boundary region when the corresponding prototype is computed.

In a rough set-based clustering algorithm, the following basic rough set properties must be satisfied:

1. An object can be assigned to the lower approximation of at most one cluster.

2. If an object is in the lower approximation of a cluster, it also belongs to the upper approximation of the same cluster.

3. If an object does not belong to the lower approximation of any cluster, it belongs to the upper approximations of at least two clusters.
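The three properties can be checked mechanically. The following is a minimal sketch, assuming each approximation is stored as a Python set of object indices; the function name `check_rough_properties` and this data layout are illustrative choices, not part of the described method.

```python
def check_rough_properties(lower, upper, n_objects):
    """Return True if the three basic rough set properties hold.

    lower, upper: lists of sets of object indices, one set per cluster
    (hypothetical layout chosen for this sketch).
    """
    for j in range(n_objects):
        in_lower = [i for i, s in enumerate(lower) if j in s]
        in_upper = [i for i, s in enumerate(upper) if j in s]
        # Property 1: an object is in at most one lower approximation.
        if len(in_lower) > 1:
            return False
        # Property 2: lower approximation implies the same cluster's upper approximation.
        if any(i not in in_upper for i in in_lower):
            return False
        # Property 3: an object in no lower approximation is in at least two upper approximations.
        if not in_lower and len(in_upper) < 2:
            return False
    return True
```

For example, with two clusters where object 2 lies in both boundary regions, `check_rough_properties([{0}, {1}], [{0, 2}, {1, 2}], 3)` satisfies all three properties.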

Rough c-means clustering algorithm (RCM)

In RCM (Peters, 2006), a cluster is described by a prototype and two approximations. Suppose that n objects are assigned to c clusters $\beta_1, \ldots, \beta_c$. The prototypes are updated as follows:

$$v_i = \begin{cases} w_l A_1 + w_b B_1, & \underline{R}\beta_i \neq \emptyset \text{ and } R_b\beta_i \neq \emptyset \\ B_1, & \underline{R}\beta_i = \emptyset \text{ and } R_b\beta_i \neq \emptyset \\ A_1, & \text{otherwise} \end{cases} \qquad (1)$$

where

$$A_1 = \frac{1}{|\underline{R}\beta_i|} \sum_{x_j \in \underline{R}\beta_i} x_j, \qquad B_1 = \frac{1}{|R_b\beta_i|} \sum_{x_j \in R_b\beta_i} x_j, \qquad R_b\beta_i = \overline{R}\beta_i \setminus \underline{R}\beta_i.$$

Here $\beta_i$ denotes cluster i, with $i = 1, \ldots, c$; $x_j$ denotes object j, with $j = 1, \ldots, n$; $\underline{R}\beta_i$ and $\overline{R}\beta_i$ are the lower approximation and the upper approximation of cluster i, respectively; $|\underline{R}\beta_i|$ is the number of objects in the lower approximation of cluster i; $R_b\beta_i$ is the boundary region of cluster i; and $w_l$ and $w_b$ are the weighting parameters of the lower approximation and the boundary region, respectively. They control the importance of the lower approximation and the boundary region when the corresponding prototype is computed. Because the lower approximation contributes more than the boundary region, $w_l + w_b = 1$ and $w_l > w_b$.

The objects belonging to the lower approximation and the boundary region of each cluster are determined by the following rule. Let $d_{ij}$ and $d_{kj}$ be the minimum and the second minimum distance from object j to the prototypes of all clusters, and let $\delta$ be a threshold parameter. If $d_{kj} - d_{ij} > \delta$, object j belongs to the lower approximation of cluster i. Otherwise, object j belongs to the boundary regions of both cluster i and cluster k.
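The RCM assignment rule and prototype update can be sketched as follows. This is an illustrative sketch, assuming Euclidean distance, objects given as tuples of floats, and a difference-based threshold test; the names `rcm_assign`, `rcm_prototype`, `delta`, and `w_l` are hypothetical, and the boundary weight is taken as `1 - w_l`.

```python
import math

def rcm_assign(X, V, delta):
    """Split objects into lower approximations and boundary regions (RCM rule)."""
    c = len(V)
    lower = [set() for _ in range(c)]
    boundary = [set() for _ in range(c)]
    for j, x in enumerate(X):
        d = [math.dist(x, v) for v in V]
        order = sorted(range(c), key=lambda i: d[i])
        i, k = order[0], order[1]  # nearest and second-nearest cluster
        if d[k] - d[i] > delta:
            lower[i].add(j)        # clearly closest to cluster i
        else:
            boundary[i].add(j)     # ambiguous: boundary of both clusters
            boundary[k].add(j)
    return lower, boundary

def _mean(points):
    dim = len(points[0])
    return [sum(p[t] for p in points) / len(points) for t in range(dim)]

def rcm_prototype(X, lower_i, boundary_i, w_l):
    """Weighted combination of the lower and boundary means of one cluster.

    Returns None when both regions are empty.
    """
    A = _mean([X[j] for j in lower_i]) if lower_i else None
    B = _mean([X[j] for j in boundary_i]) if boundary_i else None
    if A is not None and B is not None:
        return [w_l * a + (1.0 - w_l) * b for a, b in zip(A, B)]
    return A if A is not None else B
```

An object equidistant from two prototypes ends up in both boundary regions, so it pulls both prototypes, but only with the smaller boundary weight.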

Rough-fuzzy c-means clustering algorithm (RFCM)

Because no membership degrees are involved, RCM cannot effectively handle the uncertainty arising from overlapping cluster boundaries. Because fuzzy sets and rough sets are complementary in some respects, it is natural to design an integrated method that combines the advantages of both. In this context, S. Mitra et al. introduced both fuzzy sets and rough sets into the framework of the c-means clustering method and proposed a rough-fuzzy c-means algorithm, referred to as RFCM I (Mitra, 2006).

In RFCM I, a cluster is represented by a fuzzy lower approximation and a fuzzy upper approximation, unlike the crisp approximations in RCM. The prototypes are updated as follows:

$$v_i = \begin{cases} w_l A_1 + w_b B_1, & \underline{R}\beta_i \neq \emptyset \text{ and } R_b\beta_i \neq \emptyset \\ B_1, & \underline{R}\beta_i = \emptyset \text{ and } R_b\beta_i \neq \emptyset \\ A_1, & \text{otherwise} \end{cases} \qquad (2)$$

where

$$A_1 = \frac{\sum_{x_j \in \underline{R}\beta_i} u_{ij}^m x_j}{\sum_{x_j \in \underline{R}\beta_i} u_{ij}^m}, \qquad B_1 = \frac{\sum_{x_j \in R_b\beta_i} u_{ij}^m x_j}{\sum_{x_j \in R_b\beta_i} u_{ij}^m},$$

$$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)} \right]^{-1}, \qquad d_{ij} = \lVert x_j - v_i \rVert.$$

Here the number of clusters is c; $\beta_i$ denotes cluster i, with $i = 1, \ldots, c$; $\underline{R}\beta_i$ is the lower approximation of cluster i; $R_b\beta_i$ is the boundary region of cluster i; $u_{ij}$ is the fuzzy membership value indicating the membership degree of object j in cluster i; and m is the fuzzifier. If neither of the two regions is empty, the new prototype is computed from both the lower approximation and the boundary region, and their importance is controlled by the weighting parameters $w_l$ and $w_b$. Because the lower approximation contributes more than the boundary region to the new prototype, $w_l + w_b = 1$ and $w_l > w_b$.

Objects are assigned to the different approximations according to the following rule. Let $u_{ij}$ and $u_{kj}$ be the largest and the second largest membership values of object j over all clusters, and let $\delta$ be a threshold parameter. If $u_{ij} - u_{kj} > \delta$, object j belongs to the lower approximation of cluster i. Otherwise, object j belongs to the boundary regions of both cluster i and cluster k.
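The membership-based assignment can be sketched as follows. The sketch assumes the standard fuzzy c-means membership formula with Euclidean distance and a threshold on the difference of the two largest memberships; `fcm_memberships`, `rfcm_assign`, and `delta` are hypothetical names, and `U[i][j]` holds the membership of object j in cluster i.

```python
import math

def fcm_memberships(X, V, m=2.0):
    """Standard FCM memberships: U[i][j] is the membership of object j in cluster i."""
    c, n = len(V), len(X)
    U = [[0.0] * n for _ in range(c)]
    p = 2.0 / (m - 1.0)
    for j, x in enumerate(X):
        d = [math.dist(x, v) for v in V]
        zeros = d.count(0.0)
        for i in range(c):
            if zeros:
                # Object coincides with a prototype: share full membership among those.
                U[i][j] = 1.0 / zeros if d[i] == 0.0 else 0.0
            else:
                U[i][j] = 1.0 / sum((d[i] / dk) ** p for dk in d)
    return U

def rfcm_assign(U, delta):
    """Lower approximation if the top two memberships differ by more than delta."""
    c, n = len(U), len(U[0])
    lower = [set() for _ in range(c)]
    boundary = [set() for _ in range(c)]
    for j in range(n):
        order = sorted(range(c), key=lambda i: U[i][j], reverse=True)
        i, k = order[0], order[1]
        if U[i][j] - U[k][j] > delta:
            lower[i].add(j)
        else:
            boundary[i].add(j)
            boundary[k].add(j)
    return lower, boundary
```

Compared with the distance-based rule of RCM, the threshold here is applied to values in [0, 1], so one setting of `delta` behaves similarly across data sets with different scales.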

In determining which approximation an object belongs to, the absolute and relative distances used in RCM are replaced by fuzzy membership degrees. A fuzzy membership function has the advantage that an object can belong to all clusters simultaneously with different degrees of ownership, which increases the robustness of the clustering algorithm. Introducing fuzzy sets enables rough set-based clustering algorithms to handle overlapping clusters effectively.

Enhanced version of RFCM

A fundamental property of rough set-based clustering algorithms is that the objects in a lower approximation definitely belong to the cluster. Therefore, the objects in the lower approximation should contribute equally when the prototype is updated, and their weights should not be affected by the other prototypes. In contrast, objects in the boundary regions possibly belong to one cluster and potentially belong to another. Therefore, objects in the boundary regions influence the prototype update to different degrees. From the computation of $A_1$ in Equation 2, it can be seen that the weights of the objects in both the lower approximation and the boundary region are fuzzified in RFCM I. To some extent, the criterion for updating prototypes in Equation 2 can reduce the importance of the objects in the lower approximation and cause the resulting prototypes to drift away from their correct positions (Maji, 2007a). A modified version of RFCM I, referred to as RFCM II, was proposed by Maji (2007a). In RFCM II, the criterion for updating prototypes is summarized as follows:

$$v_i = \begin{cases} w_l A_1 + w_b B_1, & \underline{R}\beta_i \neq \emptyset \text{ and } R_b\beta_i \neq \emptyset \\ B_1, & \underline{R}\beta_i = \emptyset \text{ and } R_b\beta_i \neq \emptyset \\ A_1, & \text{otherwise} \end{cases} \qquad (3)$$

where

$$A_1 = \frac{1}{|\underline{R}\beta_i|} \sum_{x_j \in \underline{R}\beta_i} x_j, \qquad B_1 = \frac{1}{n_i} \sum_{x_j \in R_b\beta_i} u_{ij}^m x_j, \qquad n_i = \sum_{x_j \in R_b\beta_i} u_{ij}^m.$$

Here $\underline{R}\beta_i$ and $R_b\beta_i$ are the crisp lower approximation and the fuzzy boundary region of cluster i, and $|\underline{R}\beta_i|$ is the number of objects in the lower approximation of cluster i. In Equation 3, the objects in the lower approximation have equal importance in the computation of the new prototype, independent of the other prototypes. The objects in the boundary region still have different importance when the prototype is updated. The main steps of the RFCM II algorithm and the settings of the parameters $w_l$ and $w_b$ are the same as in the RFCM I algorithm, except for the criterion for updating the prototypes.
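The RFCM II update described above (equal weights inside the lower approximation, membership weights inside the boundary region) can be sketched as follows. The function name `rfcm2_prototype` and its arguments are hypothetical, and the boundary weight is assumed to be `1 - w_l`.

```python
def rfcm2_prototype(X, u_i, lower_i, boundary_i, w_l, m=2.0):
    """Prototype of one cluster: crisp lower mean, fuzzy boundary mean.

    X: list of objects (tuples of floats); u_i[j]: membership of object j
    in this cluster; lower_i, boundary_i: sets of object indices.
    Returns None when both regions are empty.
    """
    dim = len(X[0])
    A = B = None
    if lower_i:
        # Lower approximation: every object counts equally.
        A = [sum(X[j][t] for j in lower_i) / len(lower_i) for t in range(dim)]
    if boundary_i:
        # Boundary region: objects are weighted by u_ij ** m.
        n_i = sum(u_i[j] ** m for j in boundary_i)
        if n_i > 0.0:
            B = [sum(u_i[j] ** m * X[j][t] for j in boundary_i) / n_i
                 for t in range(dim)]
    if A is not None and B is not None:
        return [w_l * a + (1.0 - w_l) * b for a, b in zip(A, B)]
    return A if A is not None else B
```

Because the lower-approximation term ignores the memberships, moving another cluster's prototype does not change how much each core object contributes, which is the drift-avoiding property the text attributes to RFCM II.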

Based on what rough set-based c-means clustering algorithms such as RCM, RFCM I, and RFCM II have in common, the generalized framework of rough set-based c-means clustering algorithms is as follows:

Step 1: Set the number of cluster prototypes (c), the fuzzification parameter (m), and the stopping condition ($\varepsilon$).

Step 2: Randomly initialize the prototypes of the c clusters.

Step 3: Assign each object to the corresponding approximations.

Step 4: Update the prototype of each cluster.

Step 5: Repeat Steps 3 and 4 until the stopping criterion is met.

In a rough set-based clustering algorithm, the objects are divided into three regions with respect to a given cluster. Objects belonging to different regions contribute differently to the prototype update and to the description of the cluster. The weighting parameters control the contributions of the lower approximation and the boundary region when the prototypes are updated. The values of the weighting parameters strongly influence how accurately the approximation regions of a given cluster are formed and directly affect the clustering result. Usually, the weighting parameter is set manually in advance without considering the distribution of the data set. A constant parameter cannot reflect the structural characteristics of the clusters or the compactness of the objects belonging to a cluster. In this case, the approximation regions formed may be distorted and the resulting prototypes may deviate from their expected locations.

Adaptive selection of the weighting parameter

In classical rough set-based clustering algorithms, the weighting parameter is set manually without taking into account any knowledge of the global relationships between objects. The object clustering method for image segmentation according to an embodiment of the present invention proposes an adaptive selection strategy for the weighting parameter in rough set-based clustering algorithms. In this way, each cluster obtains a suitable weighting parameter that reflects the closeness of its own objects. The approximation regions are determined from the perspective of the individual clusters, reflecting the differences among all clusters in the data set.

Conventionally, whichever rough set-based clustering algorithm is used, a user-defined weighting parameter is adopted. To examine the effect of the weighting parameter on the clustering result, the clustering results on a synthetic data set are compared for different values of the weighting parameter. Because it is representative of rough set-based clustering algorithms, RFCM II is adopted for the weighting parameter analysis. Because the importance of the lower approximation is naturally greater than that of the boundary region when the prototype is updated, let $w_l \in (0.5, 1)$ and $w_b = 1 - w_l$.

Fig. 2 shows the scatter plots obtained by RFCM II at the final iteration for different values of the weighting parameter. The quantitative measures and the final positions of the prototypes are listed in Table 1. Specifically, when the weighting parameter takes an unsuitable value, the contributions of the lower approximation and the boundary region cannot be controlled well during the prototype update, and the resulting prototypes may deviate from their desired positions. Fig. 2 and Table 1 show that different weighting parameters lead to widely varying clustering results for the same problem.

Evaluation indices including clustering accuracy (CA, 1975), the adjusted Rand index (ARI) (Hubert, 1985), and Minkowski scores (MS) (Ben-Hur, 2003) are adopted to assess the effect of different weighting parameter values on the clustering results. The statistical results for CA, ARI, and MS are shown as box plots in Fig. 3. The middle line in each box represents the average result of the statistical samples, and the symbol '+' describes the stability of the sample distribution. The average results for CA, ARI, and MS are listed in Table 2. Fig. 3 shows that the boxes are very wide when the weighting parameter takes relatively large values, which indicates that the clustering results are not very stable as the weighting parameter changes. Moreover, the average results in terms of CA, ARI, and MS are also poor for such values.

Fig. 3 and Table 2 show that RFCM II is sensitive to the weighting parameter and that only a careful choice of the weighting parameter yields a satisfactory clustering result. RFCM II performs relatively well when the weighting parameter is 0.51. Fig. 4 shows the detailed approximation regions of each cluster when the weighting parameter is 0.51. The final partition is shown in Fig. 4a, where the prototypes are marked in yellow. The lower approximations of the three clusters are shown in Fig. 4b. Figs. 4c to 4e show the boundary region of each cluster. The number of objects in the lower approximation differs from cluster to cluster. Fig. 4 shows that the green cluster is the most compact cluster, with the largest number of objects in its lower approximation, whereas the blue cluster is the sparsest cluster, with the smallest number of objects in its lower approximation. The structural characteristics of each cluster are different.

The above analysis demonstrates that using a constant weighting parameter to update the different prototypes cannot reflect the structural variation among clusters and can lead to unsatisfactory clustering results. The choice of weighting parameters depends on the given problem and differs for each cluster. It is therefore natural that each cluster should have its own suitable weighting parameter, controlling the relative importance of its two approximations, so as to reflect the closeness of the objects belonging to it. To form accurate approximation regions for each cluster, the weighting parameter must be determined online, which allows the inherent structural characteristics of the clusters to be represented well.

Once a fuzzy partition has been obtained, compactness is one of the main criteria in designing a cluster validity index that verifies whether the partition accurately describes the data structure. Compactness measures the closeness of the objects within clusters and appears in many cluster validity indices, such as the partition entropy (PE) index, the partition coefficient (PC) index, and the partition coefficient and exponential separation (PCAES) index (Sun, 2004; Tsekouras, 2004; Wu, 2005; Bezdek, 1974; Trauwaert, 1988).

The partition entropy (PE) index (Bezdek, 1974) measures the amount of fuzziness in the partition matrix and is defined as follows:

$$PE = -\frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij} \log_a u_{ij} \qquad (4)$$

where a is the base of the logarithm, n is the number of samples, c is the number of clusters, and $u_{ij}$ is the fuzzy membership value.

The partition coefficient (PC) index (Trauwaert, 1988) measures the average relative amount of membership sharing between pairs of fuzzy subsets in the partition matrix and is defined as follows:

$$PC = \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \qquad (5)$$

where n is the number of samples, c is the number of clusters, $u_{ij}$ is the fuzzy membership value, and m is the fuzzifier. A larger PC value indicates a better partition.
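Both indices can be computed directly from a membership matrix. The sketch below assumes the matrix is stored as `U[i][j]` (cluster i, object j) and uses the convention that a zero membership contributes nothing to the entropy; the function names are illustrative, and the exponent `m` defaults to 2, the value used in the classical partition coefficient.

```python
import math

def partition_entropy(U, a=math.e):
    """PE: mean entropy of the memberships; 0 for a crisp partition."""
    n = len(U[0])
    return -sum(u * math.log(u, a) for row in U for u in row if u > 0.0) / n

def partition_coefficient(U, m=2.0):
    """PC: mean of u_ij ** m; equals 1 for a crisp partition."""
    n = len(U[0])
    return sum(u ** m for row in U for u in row) / n
```

A crisp partition gives PE = 0 and PC = 1, while a maximally fuzzy two-cluster partition (all memberships 0.5) gives PE = ln 2 and PC = 0.5 for the defaults above.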

The partition coefficient and exponential separation (PCAES) index considers two factors for each cluster, a normalized partition coefficient and an exponential separation (Wu, 2005), and then aggregates these two factors over all clusters. The PCAES index is defined as follows:

$$PCAES = \sum_{i=1}^{c} \sum_{j=1}^{n} \frac{u_{ij}^{m}}{u_M} - \sum_{i=1}^{c} \exp\left( -\min_{k \neq i} \frac{\lVert v_i - v_k \rVert^2}{\beta_T} \right) \qquad (6)$$

where

$$u_M = \min_{1 \le i \le c} \sum_{j=1}^{n} u_{ij}^{m}, \qquad \beta_T = \frac{1}{c} \sum_{l=1}^{c} \lVert v_l - \bar{v} \rVert^2, \qquad \bar{v} = \frac{1}{n} \sum_{j=1}^{n} x_j.$$

Here n is the number of samples, c is the number of clusters, $u_{ij}$ is the fuzzy membership value, and m is the fuzzifier.

$u_{ij}$ is the fuzzy membership value that measures the degree to which object j belongs to cluster i. It appears in all three cluster validity indices above. In the PCAES index, the normalized partition coefficient $\sum_{j=1}^{n} u_{ij}^{m} / u_M$ is used to measure the compactness of cluster i relative to the most compact cluster. This term is a relative value and is similar to the per-cluster compactness measure used in the PC index. The difference is that in the PC index the compactness measure is averaged over all clusters, whereas in the PCAES index the compactness measures of the individual clusters are summed to form the total measure. After a fuzzy partition has been obtained, compactness is an important measure of cluster validity. Accordingly, the compactness measure can also be used in a rough-fuzzy clustering algorithm to scale the closeness of the objects belonging to each cluster.

Accuracy is a numerical characteristic that describes the imprecision of a target rough set. The accuracy of a rough set X is the ratio of the number of objects in its lower approximation ($\underline{R}X$) to the number of objects in its upper approximation ($\overline{R}X$). The higher the accuracy of the target rough set, the better its two approximations are formed. The accuracy of a rough set X is defined as follows:

$$\alpha(X) = \frac{|\underline{R}X|}{|\overline{R}X|} \qquad (7)$$
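As a small worked illustration of this ratio, the sketch below assumes the two approximations are given as Python sets of object indices; `rough_accuracy` is a hypothetical helper name.

```python
def rough_accuracy(lower, upper):
    """Accuracy of a rough set: |lower approximation| / |upper approximation|."""
    if not upper:
        raise ValueError("upper approximation must be nonempty")
    return len(lower) / len(upper)
```

For instance, a set with three objects in its lower approximation and four in its upper approximation has accuracy 0.75, and a crisp set (lower equal to upper) has accuracy 1.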

The definition in Equation 7 is based on the premise that each object is assigned to only one cluster. Because both fuzzy sets and rough sets are considered in hybrid clustering algorithms, an object can be assigned to different clusters simultaneously. To some extent, Equation 7 is a hard version of the accuracy of the target set, which is not suitable for describing the imprecision of a cluster in hybrid clustering algorithms.

In rough set-based clustering algorithms, a cluster is regarded as a target set described jointly by its lower approximation and its boundary region. The weighting parameter controls the importance of the lower approximation and the boundary region in representing the cluster. Because the structural characteristics of each cluster differ, the object clustering method for image segmentation according to an embodiment of the present invention adaptively selects the weighting parameter of each cluster. Based on the above analysis, the distribution characteristics in hybrid clustering algorithms are described, and the weighting parameter of each cluster is selected, by combining the advantage of compactness for analyzing fuzzy sets with the advantage of accuracy for describing rough sets. The automatically selected weighting parameter is calculated as follows:

$$w_i = \frac{\frac{1}{n} \sum_{x_j \in \underline{R}\beta_i} u_{ij}^{m}}{\frac{1}{n} \sum_{x_j \in \underline{R}\beta_i \cup R_b\beta_i} u_{ij}^{m}} \qquad (8)$$

and the boundary region of cluster i receives the weight $1 - w_i$.

Here n is the number of objects; $\beta_i$ denotes cluster i, with $i = 1, \ldots, c$; $\underline{R}\beta_i$ and $R_b\beta_i$ are the lower approximation and the boundary region of cluster i, respectively; $u_{ij}$ is the fuzzy membership value indicating the degree to which object j belongs to cluster i; and m is the fuzzifier. $w_i$ reflects the importance of the objects in the lower approximation of cluster i when the prototype of cluster i is updated. It is the ratio of the compactness of the lower approximation of cluster i to the compactness of the whole cluster i. Comparing Equation 7 with Equation 8 shows that the definition of $w_i$ is similar to the notion of the accuracy of a rough set. To some extent, $w_i$ is a fuzzy version of the accuracy of a rough set. The calculation of the weighting parameter thus combines the advantage of compactness for analyzing fuzzy sets with the advantage of accuracy for describing rough sets.
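The adaptive weight can be sketched from the verbal description above: the compactness of the lower approximation divided by the compactness of the whole cluster. The sketch assumes, as an interpretation rather than a confirmed formula, that compactness is the sum of `u_ij ** m` and that the whole cluster is the lower approximation together with the boundary region; `adaptive_weight` is a hypothetical name.

```python
def adaptive_weight(u_i, lower_i, boundary_i, m=2.0):
    """Weight of the lower approximation of one cluster (assumed form).

    u_i[j]: membership of object j in this cluster; lower_i and
    boundary_i: disjoint sets of object indices.
    """
    low = sum(u_i[j] ** m for j in lower_i)
    total = low + sum(u_i[j] ** m for j in boundary_i)
    if total == 0.0:
        raise ValueError("cluster has no lower or boundary membership mass")
    return low / total
```

A compact cluster with few boundary objects gets a weight close to 1, so its prototype is driven almost entirely by its core objects, while a sparse cluster with a large boundary region gets a smaller weight. No quantity beyond the memberships is needed.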

With the adaptive weighting parameter, the drawback of manually selecting the weighting parameter in advance is avoided, and knowledge of the structural characteristics of each cluster is taken into account. Moreover, the parameter selection scheme uses only the fuzzy membership values and requires no additional computation. The criterion designed for computing the prototypes is summarized as follows:

$$v_i = \begin{cases} w_i A_1 + (1 - w_i) B_1, & \underline{R}\beta_i \neq \emptyset \text{ and } R_b\beta_i \neq \emptyset \\ B_1, & \underline{R}\beta_i = \emptyset \text{ and } R_b\beta_i \neq \emptyset \\ A_1, & \text{otherwise} \end{cases} \qquad (9)$$

where

$$A_1 = \frac{1}{|\underline{R}\beta_i|} \sum_{x_j \in \underline{R}\beta_i} x_j, \qquad B_1 = \frac{1}{n_i} \sum_{x_j \in R_b\beta_i} u_{ij}^m x_j, \qquad n_i = \sum_{x_j \in R_b\beta_i} u_{ij}^m.$$

Here $\beta_i$ denotes cluster i, with $i = 1, \ldots, c$; $\underline{R}\beta_i$ is the lower approximation of cluster i; $R_b\beta_i$ is the boundary region of cluster i; $x_j$ denotes object j, with $j = 1, \ldots, n$, where n is a positive integer; $|\underline{R}\beta_i|$ is the number of objects in the lower approximation of cluster i; and $w_i$ is the weighting parameter of cluster i. The weighting parameter is computed from the compactness characteristics of each cluster and differs from cluster to cluster.

본 발명의 일 실시예에 의한 이미지 분할을 위한 오브젝트 클러스터링 방법에서는, 파라미터 선택 전략을 러프셋-기반 클러스터링 알고리즘에 도입하고 클러스터링 프로세스 동안 각 클러스터에 대한 더 적합한 가중 파라미터를 적응적으로 선택하는 러프셋-퍼지 c-평균 클러스터링 알고리즘의 향상된 버전을 개발하였다. 본 발명의 일 실시예에 의한 이미지 분할을 위한 오브젝트 클러스터링 방법에서 제안된 적응적 러프-퍼지 c-평균 클러스터링 알고리즘의 주된 단계들은 하기와 같다:In the object clustering method for image segmentation according to an embodiment of the present invention, a parameter set strategy is introduced to a rough set-based clustering algorithm and a rough set-based clustering algorithm, which adaptively selects a more suitable weighting parameter for each cluster during a clustering process, An improved version of the fuzzy c-mean clustering algorithm was developed. The main steps of the adaptive rough-fuzzy c-means clustering algorithm proposed in the object clustering method for image segmentation according to an embodiment of the present invention are as follows:

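Step 1: Set the number of cluster prototypes (c), the fuzzification parameter (m), and the stopping condition ([symbol]).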

Step 2: Initialize the cluster prototypes randomly.

Step 3: Set the loop counter b = 0.

Step 4: Compute the membership values of each object with respect to the clusters.

Step 5: Assign each object to the corresponding approximations.

Step 6: Compute the prototype of each cluster using Equation 9.

Step 7: If [condition] holds, stop. Otherwise, set [equation] and go to Step 4.

Here, [symbol] are the vectors of cluster prototypes. The condition [condition] in Step 7 means that the prototypes obtained in the current iteration are the same as those generated in the previous iteration.
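Step 4 can be sketched as follows, assuming it uses the standard fuzzy c-means membership formula with Euclidean distance, u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)). That assumption is not confirmed by this section, and the name `memberships` is hypothetical.

```python
import math

def memberships(points, prototypes, m=2.0):
    """Return u[i][j], the membership of object j in cluster i (standard FCM)."""
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in prototypes]
    for j, x in enumerate(points):
        dists = [math.dist(x, v) for v in prototypes]
        zero = [i for i, d in enumerate(dists) if d == 0.0]
        if zero:
            # The object coincides with a prototype: share full membership
            # among the coinciding clusters and leave the rest at zero.
            for i in zero:
                u[i][j] = 1.0 / len(zero)
            continue
        for i, d_i in enumerate(dists):
            u[i][j] = 1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
    return u
```

Each column of the returned matrix sums to 1, so every object distributes one unit of membership over the c clusters.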

To obtain the final clustering result, each object is assigned to a cluster according to its fuzzy membership values. In general, the maximum-membership procedure is adopted, which can be summarized as follows:

- For [equation], i ≠ k, if [condition], then [symbol] is assigned to cluster i.

This procedure assigns object j to the cluster i in which it has the highest membership value. In this way, a defuzzification step converts the fuzzy partition matrix into a final crisp partition.
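The maximum-membership rule amounts to taking, for each object, the index of the cluster with the largest membership. A minimal sketch follows; the name `defuzzify` is hypothetical, and breaking ties in favour of the lowest cluster index is an assumption, since the text does not say how ties are handled.

```python
def defuzzify(u):
    """Map a fuzzy partition matrix u[i][j] to one crisp label per object."""
    n_objects = len(u[0])
    # For object j, pick the cluster index i with the largest u[i][j].
    return [max(range(len(u)), key=lambda i: u[i][j]) for j in range(n_objects)]
```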

FIG. 12 is a flowchart of an object clustering method for image segmentation according to an embodiment of the present invention.

Referring to FIG. 12, in step S100, the number of cluster prototypes (c), the fuzzification parameter (m), and the stopping condition ([symbol]) are set.

In step S102, the cluster prototypes are initialized.

In step S104, the fuzzy membership values of each object with respect to the clusters are calculated.

In step S106, each object is assigned to the corresponding approximation region of the rough set based on the fuzzy membership values.

In step S108, the prototype for each cluster is calculated based on a weighting parameter that depends on the distribution characteristics of each cluster.

In step S110, it is determined whether [condition] holds.

In the above, [symbol] are the vectors of cluster prototypes, and [condition] means that the prototypes obtained in the current iteration are the same as the prototypes generated in the previous iteration. Here, [symbol] are the prototypes obtained in the current iteration, and [symbol] are the prototypes obtained in the previous iteration.

If [condition] holds, the prototypes determined in the current iteration are adopted in step S112 as the prototype for each cluster.

If [condition] is not satisfied, steps S104 to S108 are repeated until the prototype for each cluster is determined.

In step S114, each object is assigned to the corresponding cluster according to its fuzzy membership values, based on the determined prototype for each cluster.

The distribution characteristics of each cluster may include the compactness of each cluster.
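The control flow of steps S104 to S112 can be sketched as a generic loop that repeats an update until the prototypes stop changing. The names `iterate_until_stable`, `update`, `tol`, and `max_iter` are hypothetical; `update` stands in for steps S104 to S108, and the iteration cap is a safety measure added here, not something stated in the source.

```python
import math

def iterate_until_stable(prototypes, update, tol=0.0, max_iter=100):
    """Repeat `update` until the prototypes stop moving.

    update   : callable mapping the current prototypes to new ones
    tol      : largest allowed prototype shift; 0.0 demands exact equality
    max_iter : safety cap (an assumption, not part of the source)
    Returns the final prototypes and the number of iterations performed.
    """
    for iteration in range(1, max_iter + 1):
        new = update(prototypes)
        # Largest Euclidean shift of any prototype in this iteration.
        shift = max(math.dist(a, b) for a, b in zip(new, prototypes))
        prototypes = new
        if shift <= tol:
            return prototypes, iteration
    return prototypes, max_iter
```

With `tol=0.0` the loop stops only when the current prototypes equal those of the previous iteration, which matches the condition described for step S110; a small positive tolerance is a common practical relaxation.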

The prototype for each cluster in step S108 is computed by

[equation]

where [equation], [equation], and [equation]. Here, [symbol] is the weighting parameter of cluster i and is computed by [equation], with [equation]. In the above, [symbol] denotes cluster i, with [equation]; [symbol] is the lower approximation of cluster i; [symbol] is the boundary region of cluster i; [symbol] denotes object j, with [equation], where n is a positive integer; and [symbol] may be the number of objects in the lower approximation of cluster i.

Experimental Results

The designed parameter selection strategy is introduced into the hybrid algorithm, yielding the adaptive rough-fuzzy c-means clustering algorithm, abbreviated ARFCM. ARFCM is compared with three rough-set-based algorithms: RCM, RFCM I, and RFCM II. The experimental study consists of three parts: four synthetic datasets, eight real-world datasets, and six image segmentation problems. All compared algorithms are randomly initialized and run 30 times to compensate for possibly poor starting points. The statistical results are shown as box plots (McGill, 1978). The middle line of each box represents the average result of the statistical samples, and the symbol '+' describes the stability of the sample distribution.

Summaries of performance evaluation metrics have already been given in several papers: Pakhira (2004); Wang (2007); Tasdemir (2011); Xu (2012). Three evaluation indices, CA, ARI, and MS, are adopted to assess the effectiveness of the proposed algorithm. These three metrics can be used only when the true labels are known; however, ground truth is not easily obtained in the field of image segmentation. In that case, metrics that need no reference set, such as PC (Trauwaert, 1988), PE (Bezdek, 1974), and the Xie-Beni index (XB) (Xie, 1991), are chosen to evaluate the segmentation results.
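As an illustration of a label-based index, the sketch below computes the adjusted Rand index from a contingency table, assuming that ARI in the text denotes that standard measure. The function name is hypothetical, and returning 1.0 when both partitions are trivial is a convention chosen here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))   # contingency table
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)      # chance agreement
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)
```

The index does not depend on how the clusters are numbered, so a clustering that matches the true classes up to a relabeling scores 1.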

The parameter settings of the compared algorithms follow those of the original references, Mitra (2006); Peters (2006); Maji (2007a), with a few minor adjustments. To simplify notation, the Arabic numerals 1, 2, 3, and 4 denote RCM, RFCM I, RFCM II, and ARFCM, respectively. The specific parameter settings are listed in Table 3.

Results on Synthetic Datasets

Four synthetic datasets drawn from three three-dimensional Gaussian classes are designed. FIG. 5 shows scatter plots of the four synthetic datasets. Each dataset clearly has three distinct classes. The coordinates of the three prototypes are (2.5, 4.5), (0.5, 1), and (3.5, 1), respectively. The size and compactness of the synthetic datasets vary.
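A dataset of this kind could be generated along the following lines. The three centres are the prototype coordinates listed in the text, which have two components each, so the sketch draws two-dimensional points; the class sizes, the standard deviations, the isotropic Gaussian shape, and the name `make_dataset` are all hypothetical, because the source does not state them.

```python
import random

# Prototype coordinates as listed in the text.
CENTERS = [(2.5, 4.5), (0.5, 1.0), (3.5, 1.0)]

def make_dataset(sizes, spreads, seed=0):
    """Draw points around each centre from an isotropic Gaussian.

    sizes   : number of objects per class (hypothetical values)
    spreads : standard deviation per class (hypothetical values)
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label, (centre, size, sd) in enumerate(zip(CENTERS, sizes, spreads)):
        for _ in range(size):
            points.append(tuple(rng.gauss(c, sd) for c in centre))
            labels.append(label)
    return points, labels
```

Varying `sizes` and `spreads` between calls gives datasets whose classes differ in size and compactness, which is the property the four synthetic datasets are meant to exercise.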

FIG. 6 shows the statistics of CA, ARI, and MS achieved by each algorithm. As shown in FIG. 6A, the object clustering method for image segmentation according to an embodiment of the present invention reaches a CA value of 1 in almost all of the 30 independent runs on the four synthetic datasets. For synthetic dataset 3, the box plots of CA, ARI, and MS obtained by RFCM I and RFCM II are very wide, which indicates that these algorithms lack stability on this problem, because their solutions easily fall into local optima. Although RFCM II also obtains a large ARI value on synthetic dataset 2, its results are not as stable as those obtained by ARFCM, as shown in FIG. 6B. In general, the results of RFCM I and RFCM II are better than those of RCM, which confirms that hybridizing fuzzy memberships with rough approximations can lead to better clustering results. The statistical results obtained by ARFCM are more stable and better in terms of ARI and MS than those of the other three algorithms.

In the object clustering method for image segmentation according to an embodiment of the present invention, the weighting parameters are adjusted adaptively during the clustering process. FIG. 7 shows how the weighting parameters change with each iteration when solving synthetic dataset 2. The weighting parameters are distinct for each cluster and change from iteration to iteration until a stable state is reached.

Results on UCI Datasets

A comparative test of the four algorithms on eight UCI datasets is provided. The datasets can be downloaded from Asuncion (2007). The statistical results in terms of CA, ARI, and MS are shown as box plots in FIG. 8.

Considering the statistical results, the following conclusions can be drawn:

1. On New-thyroid, WBC, and Pima, the object clustering method for image segmentation according to an embodiment of the present invention provides the best results, with the lowest MS values and the highest ARI and CA values among the four algorithms.

2. On the intricate problem called Balance, none of the four algorithms obtains stable clustering results (the box plots are wide). Even so, the object clustering method for image segmentation according to an embodiment of the present invention obtains the best results in terms of the three indices.

3. On Wine, Breast, Weather, and Vote, although the other three algorithms obtain results similar to those of the object clustering method for image segmentation according to an embodiment of the present invention, ARFCM, the object clustering method for image segmentation according to an embodiment of the present invention, provides the most stable solutions of the four algorithms.

The statistical results shown in FIG. 8 demonstrate that the solutions obtained by ARFCM are more stable and better in terms of CA, ARI, and MS than those of the other three rough-set-based algorithms.

Image Segmentation Results

Recently, soft computing techniques have been studied extensively for image segmentation, because they retain more information than hard clustering algorithms. Image segmentation can be regarded as a process of clustering pixel intensities, with the pixel intensities used as the input samples. When the gray levels are scaled to lie in the range [0, 1], the regions and the relationships among them can likewise be regarded as fuzzy subsets of the image. Pixels in the same region share more similarity with one another than with pixels in different regions. For this preliminary experiment, only gray-level information is considered, without other types of image information such as spatial or texture information.
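Treating segmentation as intensity clustering means flattening the image into one-dimensional samples, clustering them, and reshaping the labels back into the image grid. The sketch below shows only the two conversions; the function names and the 8-bit maximum gray level of 255 are assumptions made for illustration.

```python
def intensity_samples(image, max_level=255):
    """Flatten a gray-level image (list of rows) into samples scaled to [0, 1]."""
    return [(pixel / max_level,) for row in image for pixel in row]

def labels_to_image(labels, width):
    """Reshape a flat list of cluster labels back into image rows."""
    return [labels[k:k + width] for k in range(0, len(labels), width)]
```

Because only the gray level is used, two pixels with the same intensity always receive the same label regardless of where they lie in the image.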

The experiments are performed on six images used in the field of image segmentation, to compare the segmentation performance of the different rough-set-based clustering algorithms. To assess the performance of each algorithm, the partition coefficient (PC), the partition entropy (PE), and XB are adopted as indices for evaluating the segmentation results. A large PC value and small PE and XB values indicate a better segmentation result. A quantitative description of the segmentation results achieved by the compared algorithms is given in Table 4.
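The three label-free indices can be computed directly from the membership matrix. The sketch below uses the textbook definitions of the partition coefficient, the partition entropy, and the Xie-Beni index, assuming these are the forms used in the experiments; the function names are hypothetical, and the exponent `m` in `xie_beni` generalizes the squared memberships of the original index.

```python
import math

def partition_coefficient(u):
    """PC: mean of squared memberships; 1 for a crisp partition."""
    n = len(u[0])
    return sum(x * x for row in u for x in row) / n

def partition_entropy(u):
    """PE: mean membership entropy; 0 for a crisp partition."""
    n = len(u[0])
    return -sum(x * math.log(x) for row in u for x in row if x > 0.0) / n

def xie_beni(u, points, prototypes, m=2.0):
    """XB: compactness divided by separation (needs at least two clusters)."""
    n = len(points)
    compact = sum(u[i][j] ** m * math.dist(points[j], v) ** 2
                  for i, v in enumerate(prototypes) for j in range(n))
    separation = min(math.dist(a, b) ** 2
                     for i, a in enumerate(prototypes)
                     for b in prototypes[i + 1:])
    return compact / (n * separation)
```

A confident, well-separated partition pushes PC toward 1 and PE and XB toward 0, which is the direction the text describes as better.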

Rough-fuzzy c-means clustering algorithms integrate the advantages of both fuzzy sets and rough sets. As shown in Table 4, RFCM I and RFCM II achieve better segmentation results than RCM in terms of the three indices on most images. The object clustering method for image segmentation according to an embodiment of the present invention achieves considerably higher PC values and lower PE and XB values than RFCM I and RFCM II. In general, the average results obtained by ARFCM are the best among the four algorithms.

The segmentation results obtained by the compared algorithms are also examined visually. FIG. 9 shows a visual comparison of the segmentation results obtained by each algorithm. The loss of image detail caused by RCM is more severe, because RCM misclassifies several locations in the brain MR images. The two variants of RFCM, although better than RCM, still lack robustness. ARFCM preserves image detail better, as can be seen at the cross on the photographer's right and in the area under the left leg.

Running Time Study

The average running time (in seconds) of the four algorithms on synthetic datasets of different sizes is investigated. In this experiment, the compared algorithms are coded in MATLAB 9.0.1, and the machine is an HP Workstation xw9300 (2.19 GHz, 16 GB RAM; Hewlett-Packard, Palo Alto, CA). The operating system is Microsoft XP Professional x64 Edition. The synthetic datasets of different sizes are shown in FIG. 10.

FIG. 11 shows that RCM is the best of the four algorithms in terms of computation time, because RCM uses no membership values in its implementation. The difference between RFCM I and RFCM II is not pronounced. The computational cost of ARFCM is lower than that of RFCM I and RFCM II. Table 5 lists the average number of iterations that the different algorithms need to reach the convergence condition. Table 5 and FIG. 11 show that ARFCM needs fewer iterations than RFCM I and RFCM II to reach the convergence condition. With the help of the adaptive weighting parameter, the prototypes move quickly toward the desired positions.

Conclusion

In rough-set-based clustering algorithms, the distribution characteristics vary from cluster to cluster, so adopting a constant parameter to describe the lower and upper approximations of different clusters is undesirable. In the object clustering method for image segmentation according to an embodiment of the present invention, a parameter selection strategy is designed that adaptively adjusts the weighting parameter according to the distribution characteristics of each cluster, instead of selecting a constant parameter manually. With this approach, the computed weighting parameters differ from cluster to cluster and are updated at each iteration. Because the parameter selection strategy proposed in the object clustering method for image segmentation according to an embodiment of the present invention uses knowledge of the global relationships among objects, the two resulting approximations come closer to describing the actual distribution characteristics of the clusters. By combining several soft computing techniques, including fuzzy sets and rough sets, the designed pattern recognition method gains the advantages of each.

In many modified versions of the c-means clustering algorithm, the Euclidean distance is widely used as the similarity measure. Performance degrades on datasets with non-uniform density or non-hyperspherical shape. The choice of similarity measure depends strongly on the given problem and has a significant effect on the clustering results.

Although the present invention has been described in detail through specific embodiments, these serve only to explain the invention concretely; the present invention is not limited to them, and it is evident that those of ordinary skill in the art can modify or improve it within the technical spirit of the invention.

All simple modifications and changes of the present invention fall within its scope, and the specific scope of protection will be made clear by the appended claims.

S100: initial value setting step
S102: prototype initialization step
S104: fuzzy membership value calculation step
S106: step of assigning objects to approximation regions
S108: step of computing the prototype for each cluster based on a weighting parameter that depends on the distribution characteristics of the cluster
S110: iteration condition determination step
S112: prototype determination step for each cluster
S114: step of assigning each object to the corresponding cluster

Claims

(a) setting the number of cluster prototypes (c), the fuzzification parameter (m), and the stopping condition ([symbol]);
(b) initializing cluster prototypes;
(c) calculating fuzzy membership values for each object for the clusters;
(d) assigning each object to a corresponding approximate region of the rough set based on the fuzzy membership value;
(e) computing a prototype for each cluster based on a weighting parameter that is dependent on a distribution characteristic of each cluster;
(f) if [condition] holds, determining the prototypes obtained in the current iteration as the prototype for each cluster, and if not, repeating steps (c) through (e) until the prototype for each cluster is determined; and
(g) assigning each object to a corresponding cluster according to a fuzzy membership value based on the determined prototype for each cluster,
In the above, [symbol] are vectors of cluster prototypes, and [condition] means that the prototypes obtained in the current iteration are the same as the prototypes generated in the previous iteration,
The distribution characteristic of each cluster includes the compactness of each cluster,
The prototype for each cluster in step (e)