KR100895481B1

KR100895481B1 - Method for Region Based on Image Retrieval Using Multi-Class Support Vector Machine

Info

Publication number: KR100895481B1
Application number: KR1020070043007A
Authority: KR
Inventors: 김덕환; 송재원; 이주홍; 유승훈
Original assignee: 인하대학교 산학협력단
Priority date: 2007-05-03
Filing date: 2007-05-03
Publication date: 2009-05-06
Also published as: KR20080097753A

Abstract

The present invention relates to a region-based image retrieval method using a multi-class support vector machine (SVM). The method takes relevance feedback from the user and assigns weights to the images judged relevant, so that the user's intent is reflected more strongly. Because it uses a multi-class SVM classifier trained in the initial stage, the regions contained in the relevant images do not have to be reclassified, which effectively reduces the redundant work in each iteration, lowers the retrieval cost, and gives good accuracy owing to the high classification accuracy. The method comprises: a first step of matching a query image entered by the user against the images in a database, outputting similar images, and generating a set of relevant images from the user's relevance evaluation of those images; a second step of, in the initial stage, generating optimal region classes from the set of relevant images by a hierarchical clustering algorithm and region merging, and training a multi-class SVM classifier; a third step of classifying, region by region, the relevant images obtained by repeating the first step with the trained multi-class SVM classifier, and merging highly similar region classes to generate refined region classes; and a fourth step of training the multi-class SVM classifier with the refined region classes and returning to the first step with the new query points computed in the third step.

RBIR, region-based image retrieval, SVM, relevance feedback, RF, class

Description

Method for Region Based on Image Retrieval Using Multi-Class Support Vector Machine

FIG. 1 is a conceptual diagram schematically illustrating the region-based image retrieval method using a multi-class SVM according to the present invention.

FIG. 2 is a flowchart schematically illustrating the region-based image retrieval method using a multi-class SVM according to the present invention.

FIG. 3 is a flowchart illustrating the retrieval phase of the region-based image retrieval method using a multi-class SVM according to the present invention.

FIG. 4 is a flowchart illustrating the user feedback phase of the region-based image retrieval method using a multi-class SVM according to the present invention.

FIG. 5 is a conceptual diagram illustrating the SVM training phase of the region-based image retrieval method using a multi-class SVM according to the present invention.

FIG. 6 is a flowchart illustrating the SVM modeling phase of the region-based image retrieval method using a multi-class SVM according to the present invention.

FIG. 7 is a flowchart illustrating the SVM modeling phase of the region-based image retrieval method using a multi-class SVM according to the present invention.

FIG. 8 is a graph comparing the average recall and retrieval time of the region-based image retrieval method using a multi-class SVM according to the present invention with those of a hierarchical clustering method.

FIG. 9 is a diagram showing the output of the retrieval phase and the output of the user feedback phase in the region-based image retrieval method using a multi-class SVM according to the present invention.

FIG. 10 is a conceptual diagram comparing the region-based image retrieval method using a multi-class SVM according to the present invention with a hierarchical clustering method.

The present invention relates to a region-based image retrieval method using a multi-class support vector machine (SVM). More specifically, it relates to a region-based image retrieval method that takes feedback based on the user's relevance judgments and weights the relevant images so that the user's intent is reflected more strongly, and that uses a multi-class SVM classifier trained in the initial stage so that the regions contained in the relevant images need not be reclassified. This effectively reduces the redundant work that arises in each iteration, lowers the retrieval cost, and gives good accuracy owing to the high classification accuracy.

In general, in region-based image retrieval, when a user uploads a scanned or stored image as a query image, the query image is segmented into regions, the regions are grouped into classes by clustering, and the distances between the feature points of the query image and those of the images in the database are measured so that similar images can be output to the user.

To reflect the user's needs, relevance feedback from the user is fed into the algorithm: the user takes part in the region-based retrieval process and judges the relevance of the output images, so that the errors that can arise when the system interprets the query image are reduced through interaction between the system and the user.

When the user tells the system whether each retrieved result is relevant or non-relevant, the system learns from these judgments the content properties of the images the user wants, and retrieval is repeated until the user is satisfied with the output.

However, because clustering of the segmented regions is carried out again at every iteration, the regions contained in the relevant images are reclassified each time. The reclassification is therefore duplicated across iterations, and the retrieval cost increases.

The present invention was devised to solve the above problems. Its object is to provide a region-based image retrieval method using a multi-class SVM that takes feedback based on the user's relevance judgments and weights the relevant images so that the user's intent is reflected more strongly, and that uses a multi-class SVM classifier trained in the initial stage so that the regions contained in the relevant images need not be reclassified, thereby effectively reducing the redundant work in each iteration, lowering the retrieval cost, and giving good accuracy owing to the high classification accuracy.

To achieve the above object, the present invention comprises: a first step of matching a query image entered by the user against the images in a database, outputting similar images, and generating a set of relevant images from the user's relevance evaluation of those images; a second step of, in the initial stage, generating optimal region classes from the set of relevant images by a hierarchical clustering algorithm and region merging, and training a multi-class SVM classifier; a third step of classifying, region by region, the relevant images obtained by repeating the first step with the trained multi-class SVM classifier, and merging highly similar region classes to generate refined region classes; and a fourth step of training the multi-class SVM classifier with the refined region classes and returning to the first step with the new query points computed in the third step.

The first step includes a retrieval phase consisting of: parsing the input query image; generating a multiple representative value from the multiple query points in the feature space, the distance function, the weights of the query points, and the number of result images; comparing the query with the images in the image database by EMD matching; and outputting the images with high similarity.

The first step also includes a user feedback phase consisting of: comparing the output images with the query image entered by the user; performing binary feedback that classifies each image as relevant or non-relevant; and generating the set of relevant images by taking a union in which the relevant images are weighted or a decay factor is applied to the relevant images of the previous stage.

The binary feedback may be replaced with multi-level feedback.

The second step includes an SVM training phase consisting of: segmenting each relevant image in the set of relevant images into regions; forming initial classes by placing each segmented region in a class; applying a hierarchical clustering algorithm to the initial classes and merging regions; taking the classes at the level where no further merging is possible as the optimal region classes; and training the multi-class SVM classifier with all the regions in the optimal region classes.

The third step includes an SVM modeling phase consisting of: segmenting each relevant image in the set of relevant images into regions; classifying each segmented region with the trained multi-class SVM classifier so that it is assigned to a class; merging similar classes at the same level; generating new query points from the representative values of the region clusters; and producing a modified query image.

The fourth step is repeated until a value set by the user or an optimal value is reached.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram schematically illustrating the region-based image retrieval method using a multi-class SVM according to the present invention, and FIG. 2 is a flowchart schematically illustrating the same method.

As shown in the drawings, the method consists of a retrieval phase, a user feedback phase, a phase in which, in the initial stage, a hierarchical clustering algorithm is applied and the multi-class SVM classifier is trained, and an SVM modeling phase in which, in the subsequent iterations, refined region classes are generated with the trained multi-class SVM classifier.

In a preprocessing phase, image segmentation is applied to every image in the database to partition it into multiple regions, and a feature vector is extracted for each region and stored in the database.

When the user enters a query image (S10), it is analyzed to generate the initial multiple representative value of the query, Q = (q, d, w, k). Because q is expressed by the feature values of the regions that make up the query image, it consists of multiple query points in the feature space. k is the number of images included in the query result output by the system, and d is the distance function; the query points are compared with the images in the database by applying the distance function.

That is, EMD matching is performed between the query image and every image in the database analyzed in the preprocessing phase (S15).

Here, EMD stands for Earth Mover's Distance.

When the matching outputs k retrieved images (S19), the user evaluates relevance by assigning a relevance score to each image in the result set Result(Q) (S20), and the set of relevant images is generated on the basis of these scores (S26).

If this is not the stage at which the initial query image was entered but the relevance evaluation of a later iteration, the relevant images of the previous stage (relevant images_previous) are united with those of the current stage. Either the current relevant images are weighted or a decay factor is applied to the previous ones, so that the user's most recent relevance evaluation stands out.

In other words, the set of relevant images is formed from the newly added relevant images and the previous set of relevant images. Because the newly added images reflect the user's query concept more clearly, they are given a larger weight than the one applied to the previous set.
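The weighted union described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `update_relevant_set`, the parameters `new_weight` and `decay`, and the choice to add weights when an image is selected again are all hypothetical and are not specified in the patent text.

```python
from typing import Dict, Iterable


def update_relevant_set(
    previous: Dict[str, float],
    newly_relevant: Iterable[str],
    new_weight: float = 1.0,
    decay: float = 0.5,
) -> Dict[str, float]:
    """Union the previous relevant set with the newly selected images.

    Assumption: weights are plain floats keyed by an image identifier.
    Previous weights are multiplied by `decay` (use 1.0 to disable decay),
    and each newly selected image receives `new_weight`.
    """
    # Decay every image carried over from the previous iteration.
    merged = {image: weight * decay for image, weight in previous.items()}
    # Newly selected images get the larger, fresh weight (added if re-selected).
    for image in newly_relevant:
        merged[image] = merged.get(image, 0.0) + new_weight
    return merged
```

With `decay=1.0` and `new_weight` greater than 1 the sketch models the "weight the new images" option; with `new_weight=1.0` and `decay` below 1 it models the "decay the previous images" option.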

In the initial stage no previous set of relevant images exists, so the k images output by the retrieval phase are weighted and form the set of relevant images as they are. This set then serves as the previous set of relevant images for the next stage.

It is then determined whether this is the initial stage (S30). If so, the relevant images in the set are segmented, initial classes each containing one segmented region are formed and clustered hierarchically (S44), and regions are merged (S46) to obtain the region classes at the optimal level (S47).

The multi-class SVM classifier is then trained with the optimal-level region classes, and it is checked whether the result is close to the optimal value the user wants (S60).

If the optimal value has not been reached in step S60, the query is replaced with the modified query image Q' = (q', d', w', k'), which is entered as the query image, and the process returns to step S15.

EMD matching is performed between the new query image and the database images, the retrieved images are output, and a set of relevant images is generated through user feedback, after which it is again asked whether this is the initial stage. Because the process has already been run once for the first query image, this is no longer the initial stage, and the regions are therefore classified with the multi-class SVM classifier (S53).

The hierarchical clustering algorithm is used only in the initial stage. In the subsequent iterations, the trained multi-class SVM classifier assigns the regions segmented from each image to classes, so that they are grouped as in clustering (S53), and region merging is then applied so that similar classes are combined (S55).

Once the refined region classes have been formed (S56), the multi-class SVM classifier training step (S48) is carried out and it is asked whether the result is close to the optimal value (S60). If not, the input is updated with the modified query image Q' and the process returns to step S15. Steps S15, S19, S20, S26, S30, S53, S55, S56, S60 and S59 are repeated until the result is close to the optimal value.
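The control flow of FIG. 2 can be summarized as a small driver loop. The sketch below is not the patented implementation: every callable passed in (`retrieve`, `get_feedback`, `cluster_and_merge`, `classify_and_merge`, `train`, `build_query`, `is_satisfied`) is a hypothetical placeholder for the corresponding phase, and `max_rounds` is an added safety limit. The step numbers in the comments follow the description.

```python
def run_feedback_loop(query, retrieve, get_feedback, cluster_and_merge,
                      classify_and_merge, train, build_query, is_satisfied,
                      max_rounds=10):
    """Run retrieval rounds until the user is satisfied or max_rounds is hit."""
    classifier = None   # no multi-class SVM exists before the first round
    relevant = []       # accumulated set of relevant images
    results = []
    for round_no in range(max_rounds):
        results = retrieve(query)                        # S15, S19
        relevant = get_feedback(results, relevant)       # S20, S26
        if classifier is None:                           # S30: initial stage
            classes = cluster_and_merge(relevant)        # S44, S46, S47
        else:                                            # later iterations
            classes = classify_and_merge(classifier, relevant)  # S53, S55, S56
        classifier = train(classes)                      # S48
        if is_satisfied(results):                        # S60
            return results, round_no + 1
        query = build_query(query, classes)              # modified query Q'
    return results, max_rounds
```

The point of the structure is that the expensive hierarchical clustering branch runs once; every later round reuses the trained classifier.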

FIG. 3 is a flowchart illustrating the retrieval phase of the region-based image retrieval method using a multi-class SVM according to the present invention. As shown in the drawing, the retrieval phase proceeds as follows.

When the user enters a query image (S10), it is parsed to produce the multiple query points (q) in the feature space of the query image, the distance function (d), the weights of the query points (w), and the number of result images (k) (S12).

The initial query Q = (q, d, w, k) is thus generated (S13).

The EMD (Earth Mover's Distance) function d is used to measure the distance between the query image and an image in the database. The query points are compared with each image in the database using the distance function d (S15).

The EMD, which measures the distance between two images using the feature information of all their regions, is given by Equation 1.
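Equation 1 is an image in the original publication and does not survive in this text. As a hedged reconstruction, the standard Earth Mover's Distance over the flows f_ij, ground distances d(p_i, q_j), and region weights that the description defines would read as follows. The constraint set is the usual one from the EMD literature and is an assumption here, not a quotation from the patent.

```latex
\mathrm{EMD}(P, Q) =
  \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} d(p_i, q_j)\, f_{ij}}
       {\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}},
\qquad
\begin{aligned}
  &\text{where the flows } f_{ij} \ge 0 \text{ minimize }
    \sum_{i=1}^{m} \sum_{j=1}^{n} d(p_i, q_j)\, f_{ij} \\
  &\text{subject to }
    \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \quad
    \sum_{i=1}^{m} f_{ij} \le w_{q_j}, \quad
    \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}
      = \min\Bigl( \sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j} \Bigr).
\end{aligned}
```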

Here the m regions of the query image I_P are written P = {(p_1, w_p1), ..., (p_m, w_pm)} and the n regions of an image I_Q are written Q = {(q_1, w_q1), ..., (q_n, w_qn)}, where w_pi and w_qj denote the weights of the i-th region of I_P and the j-th region of I_Q, respectively.

d(p_i, q_j) is the ground distance between regions p_i and q_j, and f_ij is the flow between region p_i and region q_j. The region weights w_pi and w_qj are used to reflect the region importance of the two images being compared.

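The EMD function can also be used in a modified form. The condition that permits this and the modified expression appear only as equation images in the original publication and are not reproduced in this text.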

Using the distance function d, the result set Result(Q), consisting of the top k images closest to the query points q, is returned to the user (S19).
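As a rough illustration of how Q = (q, d, w, k) drives the top-k selection, the sketch below keeps the distance function pluggable. The names `Query`, `result_set`, and `mean_nearest_distance` are hypothetical, and `mean_nearest_distance` is a deliberately simple stand-in, not EMD; a real system following the description would pass an EMD implementation as `d`.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Region = Tuple[float, ...]  # feature vector of one segmented region


@dataclass
class Query:
    q: List[Region]   # multiple query points in feature space
    d: Callable[[Sequence[Region], Sequence[float], Sequence[Region]], float]
    w: List[float]    # one weight per query point
    k: int            # number of images to return


def mean_nearest_distance(points, weights, regions):
    """Stand-in distance (NOT EMD): weighted mean of the Euclidean distance
    from each query point to its nearest region in the candidate image."""
    acc = 0.0
    for point, weight in zip(points, weights):
        nearest = min(
            sum((a - b) ** 2 for a, b in zip(point, region)) ** 0.5
            for region in regions
        )
        acc += weight * nearest
    return acc / sum(weights)


def result_set(query: Query, database: Dict[str, List[Region]]) -> List[str]:
    """Return the identifiers of the k database images closest to the query."""
    scored = ((query.d(query.q, query.w, regions), name)
              for name, regions in database.items())
    return [name for _, name in heapq.nsmallest(query.k, scored)]
```

Keeping `d` inside the query object mirrors the description, where the modified query Q' may carry a modified distance function d'.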

FIG. 4 is a flowchart illustrating the user feedback phase of the region-based image retrieval method using a multi-class SVM according to the present invention. As shown in the drawing, the user feedback phase proceeds as follows.

The user checks the k output images (S21) and judges their relevance independently of the EMD matching between the multiple representative value Q = (q, d, w, k), obtained by analyzing the query image, and the images in the database. This is feedback based on the user's own judgment: relevance is decided according to the user's satisfaction and subjective assessment (S23).

For example, when the retrieval phase compares the query with the images in the database by EMD matching and outputs the top k images, the user checks them against the query image. If the user judges an image to be similar to the query image, the user clicks it as a relevant image so that it is included in the set of relevant images (S24). If the user judges it not to be similar, the user clicks it as a non-relevant image so that it is excluded from the set (S25).

Only binary feedback, a yes or no from the user, is used here, but multi-level feedback such as highly relevant, neutral, and highly non-relevant may also be used.

The relevant images of step S24 are added to the set of relevant images (S26), and either the newly added relevant images are weighted or a decay factor is applied to the relevant images of the previous stage, so that the user's current satisfaction is reflected as fully as possible (S27).

FIG. 5 is a conceptual diagram illustrating the SVM training phase of the region-based image retrieval method using a multi-class SVM according to the present invention. As shown in the drawing, the SVM training phase proceeds as follows.

The relevant images in the set of relevant images, which were included and weighted according to the user's relevance judgments in the user feedback phase of FIG. 4, are each segmented into regions (S41).

For example, an image of an animal such as a squirrel is separated into the region occupied by the squirrel and the upper and lower regions that form its background.

Each segmented region is placed in a class on a one-to-one basis (S42) to form the initial classes (S43).

That is, an initial class is defined as a class containing exactly one segmented region.

Once the initial classes have been formed, the hierarchical clustering algorithm arranges them hierarchically and merges them into a smaller number of region classes (S44).

It is then checked whether no further merging is possible (S46). If so, the classes are judged to be the optimal region classes (S47), and the multi-class SVM classifier is trained with all the regions in the optimal region classes (S48).
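Steps S42 to S47 amount to agglomerative merging that stops as soon as the closest pair of classes is no longer judged similar. The sketch below illustrates that loop only. The similarity test is passed in as `similar`, a hypothetical parameter; the patent uses Hotelling's T² for this decision, whereas the usage example further down substitutes a plain centroid-distance threshold as a labeled stand-in. Choosing the candidate pair by centroid distance is also an assumption.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Region = Tuple[float, ...]
RegionClass = List[Region]


def centroid(cls: RegionClass) -> Region:
    """Mean feature vector of the regions in one class."""
    n = len(cls)
    return tuple(sum(r[i] for r in cls) / n for i in range(len(cls[0])))


def centroid_gap(a: RegionClass, b: RegionClass) -> float:
    """Euclidean distance between two class centroids."""
    return sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b))) ** 0.5


def merge_until_stable(
    regions: List[Region],
    similar: Callable[[RegionClass, RegionClass], bool],
) -> List[RegionClass]:
    classes = [[r] for r in regions]              # S42, S43: one region per class
    while len(classes) > 1:
        # Candidate pair: the two classes whose centroids are closest.
        i, j = min(combinations(range(len(classes)), 2),
                   key=lambda ij: centroid_gap(classes[ij[0]], classes[ij[1]]))
        if not similar(classes[i], classes[j]):   # S46: no merge possible
            break
        merged = classes[i] + classes[j]          # S44: merge the pair
        classes = [c for idx, c in enumerate(classes) if idx not in (i, j)]
        classes.append(merged)
    return classes                                # S47: optimal region classes
```

For example, `merge_until_stable(regions, lambda a, b: centroid_gap(a, b) < 1.0)` groups regions whose centroids lie within one unit of each other; the threshold 1.0 is arbitrary.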

This is explained again below with the formulas.

In the initial stage after user feedback, the images in the set of relevant images are segmented into regions, each region is placed in a class to form the initial classes, and these are clustered to form a hierarchy.

The hierarchical clustering algorithm is applied to merge the initial classes, each containing a single region, into a smaller number of region classes. Each class is a set of similar regions clustered together and corresponds to a pseudo-region that makes up the query of the next stage.

In the class hierarchy formed by the hierarchical clustering algorithm, the g-th level corresponds to g classes; that is, g classes are listed at the g-th level, and the number of regions in a class grows as classes are merged.

Region merging is applied to find the appropriate number of classes. Hotelling's T² tests whether any two classes at the same level, that is, any two adjacent classes, are similar. The two classes must be distinct, and for classes C_i and C_j, T² is defined as follows.
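The defining equation is an image in the original publication and is missing from this text. As a hedged reconstruction, the textbook two-sample Hotelling statistic, written with the class mean vectors, covariance matrices S_i and S_j, and region counts n_i and n_j that the description names, is shown below. The mean-vector symbol and the use of a pooled covariance are assumptions.

```latex
T^2 = \frac{n_i\, n_j}{n_i + n_j}\,
      (\bar{x}_i - \bar{x}_j)^{\mathsf{T}}\,
      S_{\mathrm{pooled}}^{-1}\,
      (\bar{x}_i - \bar{x}_j),
\qquad
S_{\mathrm{pooled}} = \frac{(n_i - 1)\, S_i + (n_j - 1)\, S_j}{n_i + n_j - 2}.
```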

Here the two classes C_i and C_j are described by their p-dimensional mean vectors, their covariance matrices S_i and S_j, and the numbers of regions n_i and n_j that belong to them. To compute the optimal number of classes, the optimal level must be chosen among the levels that make up the hierarchy. Therefore, to decide which pair of classes is to be merged at the g-th level, Hotelling's T² is evaluated for the class pairs at that level. (The symbols for the mean vectors and for the number of class pairs appear as images in the original and are not reproduced here.)

If no merge occurs, the g-th level is more optimal than the (g-1)-th level; if a merge occurs, the (g-1)-th level is more optimal than the g-th level.

The hierarchy consists of levels, each level consists of classes, and each class consists of regions.

Once the optimal region classes have been obtained by region merging, the multi-class SVM classifiers can be trained with all the regions in the optimal-level region classes. The one-versus-one method is used for this training.

Other SVM methods are equally applicable.

D는 학습에 사용되는 m개의 영역 데이터의 집합으로서, 다음과 같이 나타낼 수 있다.D is a set of m area data used for learning and can be expressed as follows.

D = {(x₁, y₁), …, (x_m, y_m)}

Here, the input patterns are x_i ∈ R^p, i = 1, …, m, and the output y_i ∈ {1, …, g} is the class of x_i. For g classes, g(g−1)/2 binary SVM classifiers are trained, one for each pair of classes. By training on the data of the i-th and j-th classes, the decision function of the (i,j)-th pair is obtained as follows.

Here, minimizing (1/2)(w_i,j)^T w_i,j means maximizing 2/||w_i,j||, the margin between the two classes C_i and C_j. When the data of the classes cannot be separated linearly, a penalty term on the slack variables, weighted by C, is used to reduce the number of training errors.

In that term, the summed quantity is the slack variable, and C is the parameter that controls the trade-off between training error and function complexity. φ(·) is the nonlinear mapping from the input space R^p to the high-dimensional space F, b_i,j is the bias, and w_i,j is the vector, obtained by training, that is normal to the hyperplane of the (i,j)-th pair's decision function.
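As a point of reference, the textbook soft-margin formulation of the (i,j)-th one-versus-one problem is sketched below. It agrees with the terms described above, but the sample index t, the slack symbol \xi_t^{i,j}, and the name f_{i,j} for the decision function are notation assumed for this illustration.

```latex
\[
\begin{aligned}
\min_{w_{i,j},\, b_{i,j},\, \xi^{i,j}} \quad
  & \tfrac{1}{2}\,(w_{i,j})^{\mathsf T} w_{i,j} + C \sum_{t} \xi_t^{i,j} \\
\text{subject to} \quad
  & (w_{i,j})^{\mathsf T} \phi(x_t) + b_{i,j} \ge 1 - \xi_t^{i,j}
    && \text{if } y_t = i, \\
  & (w_{i,j})^{\mathsf T} \phi(x_t) + b_{i,j} \le -1 + \xi_t^{i,j}
    && \text{if } y_t = j, \\
  & \xi_t^{i,j} \ge 0,
\end{aligned}
\qquad
f_{i,j}(x) = (w_{i,j})^{\mathsf T} \phi(x) + b_{i,j}
\]
```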

FIG. 6 is a flowchart illustrating the SVM modeling step of the region-based image retrieval method using a multi-class SVM according to the present invention, and FIG. 7 is likewise a flowchart illustrating the SVM modeling step of that method.

As shown in the figures, the SVM modeling step of the region-based image retrieval method using a multi-class SVM according to the present invention proceeds as follows.

After the initial stage, the multi-class SVM classifier is trained with the region classes of the optimal level. The method then returns to the search step, where a new query image is input, and each relevant image in the relevant image set produced by the search step and the user feedback step is segmented into regions (S51).

Each segmented region is then classified by the trained multi-class SVM classifier so that it is assigned to a class (S53), similar classes at the same level are merged (S55), new query points are generated from the representative values of the region clusters (S57), and these are input as the modified query image (S59).

This procedure is repeated until a result that satisfies the user is reached.

In each iteration stage that follows the initial stage, the SVM modeling step is executed, which consists of a region-based classification process and a region merging process.

As the iterations proceed, more relevant images become available, so the number of regions contained in the relevant images grows. Because the query is composed from the regions of the relevant images, the time required to compute the EMD distance between the query image and an image in the database is proportional to the number of regions that make up the query.

The trained SVM classifier is used to reduce the retrieval time: the multi-class SVM classifier classifies each region of the relevant images into one of the g classes.

The (i,j)-th SVM decision function can thus be used to determine whether a new region belongs to class C_i or class C_j. If the (i,j)-th decision function classifies the region data x_l into the i-th class, one vote is added to the i-th class; otherwise, one vote is added to the j-th class. After this voting, x_l is assigned to the class that received the most votes.
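The voting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `ovo_predict`, the sign convention (a positive value means "class i"), the tie-breaking rule, and the toy decision functions are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, decision_functions, num_classes):
    """Assign x to the class with the most pairwise votes.

    decision_functions maps a pair (i, j) with i < j to a callable f; a
    value f(x) > 0 is taken to mean "class i", anything else "class j".
    """
    votes = Counter()
    for i, j in combinations(range(1, num_classes + 1), 2):
        winner = i if decision_functions[(i, j)](x) > 0 else j
        votes[winner] += 1
    # Ties are broken in favour of the smallest class label (an assumption).
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


# Toy 1-D stand-ins for trained SVMs: class k is centred at 10 * k, and each
# pairwise function is positive when x is closer to the centre of class i.
centres = {1: 10.0, 2: 20.0, 3: 30.0}
toy_functions = {
    (i, j): (lambda x, i=i, j=j: abs(x - centres[j]) - abs(x - centres[i]))
    for i, j in combinations(centres, 2)
}
```

With g = 3 classes the loop evaluates g(g−1)/2 = 3 pairwise functions per region, which is the fixed cost paid instead of re-clustering.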

After the classification step, region classes can be merged so that each class contains more regions. For example, given g classes, the region merging process merges similar classes at the same level in order to reduce the number of query points in the next step. The g classes constitute the g-th level, so when a merge occurs they are reduced to g−1 classes, which constitute the (g−1)-th level. The similarity between classes is measured with Hotelling's T² statistic.
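To make the statistic concrete, the following sketch computes a two-sample Hotelling's T² for two region classes given as lists of feature vectors. It assumes the standard pooled-covariance form of the statistic, which may differ in detail from the expression intended in this document, and all function names are hypothetical. Only the Python standard library is used, so the small linear solver is written out by hand.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of the deviations from the mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [x - m for x, m in zip(r, mean)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hotelling_t2(class_i, class_j):
    """Two-sample T^2 with pooled covariance (assumed standard form)."""
    n_i, n_j = len(class_i), len(class_j)
    mean_i, mean_j = mean_vector(class_i), mean_vector(class_j)
    s_i = scatter_matrix(class_i, mean_i)
    s_j = scatter_matrix(class_j, mean_j)
    p = len(mean_i)
    pooled = [[(s_i[a][b] + s_j[a][b]) / (n_i + n_j - 2) for b in range(p)]
              for a in range(p)]
    diff = [a - b for a, b in zip(mean_i, mean_j)]
    z = solve(pooled, diff)  # z = pooled^{-1} (mean_i - mean_j)
    return n_i * n_j / (n_i + n_j) * sum(d * v for d, v in zip(diff, z))
```

A large value indicates well-separated classes; a small value marks the pair as a candidate for merging.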

T₀ is the resulting value of Hotelling's T²_ij, which compares C_i and C_j, any two consecutive classes at the g-th level, and the corresponding probability, the p-value p_ij, is defined as follows.

Here, the reference distribution is the F distribution with p and n_i+n_j−p−1 degrees of freedom, and p_ij indicates how well the two classes are separated. In particular, if the maximum of p_ij over all pairs i ≠ j (1 ≤ i, j ≤ g) is smaller than the given significance level α, all g classes are separated and the g-th level is more optimal than the (g−1)-th level. The algorithm for determining the appropriate number of classes is as follows.
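For orientation, the usual way to turn a two-sample T² value into such a p-value is sketched below. The scaling factor is the standard one that maps T² onto an F distribution with the degrees of freedom stated above; treating it as the expression intended here is an assumption.

```latex
\[
p_{ij} = \Pr\!\left( F_{p,\; n_i + n_j - p - 1} \;\ge\;
  \frac{n_i + n_j - p - 1}{p\,(n_i + n_j - 2)}\; T_0 \right)
\]
```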

Step 1. Set the initial number of classes to g.

Step 2. Cluster the given data into g classes.

Step 3. Compare each of the g(g−1)/2 pairs of classes.

Step 4. If all g(g−1)/2 p-values at the current clustering level are smaller than the given significance level α, accept g as the number of classes.

Step 5. Otherwise, replace the number of classes with g−1 and repeat Steps 2 through 4.
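Steps 1 to 5 amount to a simple downward search over g, which can be sketched as follows. The clustering routine and the pairwise p-value test are passed in as callables because this text does not fix their implementation; `cluster`, `p_value`, the default `alpha=0.05`, and the toy stand-ins below are all hypothetical.

```python
from itertools import combinations


def choose_num_classes(data, g_init, cluster, p_value, alpha=0.05):
    """Return (g, classes) for the first g at which every pair is separated.

    cluster(data, g) -> list of g classes        (hypothetical callable)
    p_value(class_a, class_b) -> pairwise p-value (hypothetical callable)
    """
    g = g_init                                    # Step 1
    while g > 1:
        classes = cluster(data, g)                # Step 2
        p_values = [p_value(a, b)                 # Step 3
                    for a, b in combinations(classes, 2)]
        if all(p < alpha for p in p_values):      # Step 4
            return g, classes
        g -= 1                                    # Step 5
    return 1, cluster(data, 1)


# Toy stand-ins, for illustration only.
def toy_cluster(data, g):
    """Split the sorted 1-D data into g contiguous chunks."""
    ordered = sorted(data)
    n = len(ordered)
    return [ordered[k * n // g:(k + 1) * n // g] for k in range(g)]


def toy_p_value(a, b):
    """Pretend p-value: small when the chunk means are far apart."""
    gap = abs(sum(a) / len(a) - sum(b) / len(b))
    return 0.01 if gap > 3.0 else 0.5


data = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
```

On this toy data, starting from g = 4, the search rejects 4 and 3 classes and stops at 2, the two visibly separate groups.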

The representative values of the region clusters obtained in the SVM modeling step form the new query points q', and the weights w' of the new representative values are computed accordingly. A new query image Q' = (q', d', w', k) is then generated and used as the input to the next iteration; it comprises the new query points (q'), the new weights (w'), and a distance function (d') that reflects the newly adjusted weights.
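A rough sketch of how such a query could be assembled from the merged region clusters is given below. This passage does not state how the representative values or weights are computed, so two assumptions are made purely for illustration: each representative is the centroid of its cluster, and each weight is the cluster's share of all regions. The distance function d' is left out, and `build_query` is a hypothetical name.

```python
def build_query(clusters, k):
    """Build (points, weights, k) from region clusters.

    clusters is a list of clusters, each a list of feature vectors.
    Assumed for this sketch: representative = centroid, weight = share of
    all regions that fall in the cluster.
    """
    total = sum(len(c) for c in clusters)
    points = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    weights = [len(c) / total for c in clusters]
    return points, weights, k


example_clusters = [
    [[1.0, 3.0]],
    [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]],
]
```

Fewer clusters after merging means fewer query points, which is what keeps the EMD matching cost of the next iteration down.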

FIG. 8 is a graph comparing the average recall and retrieval time of the region-based image retrieval method using a multi-class SVM according to the present invention with those of the hierarchical clustering method. The comparison shown in the figure is as follows.

The experiments evaluate the relevance feedback technique for region-based image retrieval (RBIR), using binary feedback in the user feedback step, and compare the performance of region-based image retrieval according to the present invention with that of region-based image retrieval using the hierarchical clustering method.

The experiments use 10,000 general-purpose images from the COREL image database, and 100 random initial queries are generated from 10 selected categories: sunset, coast, animal, airplane, bird, tree, flower, car, person, and fruit.

High-level category information is used to obtain the relevance feedback, because users search for images on the basis of the semantic concept implied by the query image at a high level, rather than for images with prominent low-level global features.

That is, images in the same category as the initial query image are regarded as relevant. For each query image, five additional feedback iterations are performed after the initial query, and all reported measurements are averages over the 100 queries.
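The averaging described above can be illustrated with the usual definition of recall (relevant images retrieved divided by all relevant images). The text does not spell out the formula, so this definition, like the function names, is an assumption of the sketch.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant images that appear in the retrieved list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant.intersection(retrieved)) / len(relevant)


def average_recall(runs):
    """Mean recall over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall(r, rel) for r, rel in runs) / len(runs)
```

In the experimental setting described here, `relevant` would be the set of database images sharing the query's category, and `runs` would hold one entry per query at a given feedback iteration.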

Region-based image retrieval according to the present invention performs better after the initial stage, and after the fifth iteration its average recall is about 16% higher than that of region-based image retrieval using the hierarchical clustering method.

The retrieval time of region-based image retrieval according to the present invention is one half of that of region-based image retrieval using the hierarchical clustering method, because the latter spends a large amount of time determining the optimal class level.

When about 8% of the training data are selected as support vectors, the average accuracy on the test set is 92%.

FIG. 9 illustrates the output of the search step and the output of the user feedback step in the region-based image retrieval method using a multi-class SVM according to the present invention. As shown in the figure, it compares the top k images output by EMD matching against the database images before any user feedback with the images that are weighted by the user's feedback and included in the relevant image set.

It can be seen that passing through the user feedback step reflects the user's intention, as expressed by the query image, more accurately.

FIG. 10 is a conceptual diagram comparing the region-based image retrieval method using a multi-class SVM according to the present invention with the hierarchical clustering method. As shown in the figure, the hierarchical clustering method performs a region clustering process at every iteration in order to find new query points from the relevant image set.

This region clustering process is costly in time. The region-based image retrieval method using a multi-class SVM according to the present invention includes the SVM modeling step and uses learning to resolve the time cost incurred by hierarchical clustering, and it classifies the clusters so that the user's intention is reflected accurately.

The hierarchical clustering method clusters regions with similar characteristics, drawn from the regions of the relevant images, in order to find new query points from the relevant image set. Because this process is executed again at every iteration, the retrieval cost increases. In contrast, the region-based image retrieval method using a multi-class SVM according to the present invention applies the SVM modeling step to the regions classified from the relevant image set and uses the result for region classification in the next step, which reduces the retrieval cost.

In the cluster merging process, the hierarchical clustering method applies Hotelling's T² at every iteration, which incurs the additional cost of computing T² for n(n−1)/2 cluster pairs. The region-based image retrieval method using a multi-class SVM according to the present invention applies the SVM modeling step to the already merged clusters, so T² between cluster pairs can be computed at low cost in the next step.

Thus, the region-based image retrieval method using a multi-class SVM according to the present invention effectively reduces redundant steps, lowers the retrieval cost, and, owing to its high classification accuracy, achieves good retrieval accuracy.

Although preferred embodiments of the present invention have been described above by way of example, the scope of the present invention is not limited to these specific embodiments, and a person of ordinary skill in the art may make appropriate modifications within the scope set forth in the claims.

As described above, the present invention, configured as set out above, receives feedback based on the user's relevance judgments and weights the relevant images so that the user's intention is reflected more faithfully. By using the multi-class SVM classifier trained in the initial stage, the regions contained in the relevant images need not be reclassified, which effectively reduces the redundant steps that occur at each iteration and therefore the retrieval cost, and the high classification accuracy yields good retrieval accuracy.

Claims

A first step of matching a query image input by a user against the images in a database, outputting similar images, and generating a relevant image set from those images according to the user's relevance evaluation;

A second step of segmenting the relevant images in the relevant image set into regions, forming initial classes that each contain one of the segmented regions, performing hierarchical clustering with a hierarchical clustering algorithm, merging regions to generate the classes of the optimal level, training a multi-class SVM classifier with them, and checking whether the optimum value desired by the user has been approximated to within a preset value;

A third step of, if the optimum value has not been reached, outputting relevant images by matching a new query image, composed of query points that represent each cluster, against the database images, classifying the regions with the trained multi-class SVM classifier, and then merging the regions of similar classes to obtain refined region classes; and

A fourth step of training the multi-class SVM classifier with the refined region classes of the third step and, if the optimum value has still not been approximated to within the preset value, returning to the third step; a region-based image retrieval method using a multi-class SVM comprising the above steps.

The region-based image retrieval method using a multi-class SVM of claim 1,

wherein the first step comprises a search step consisting of:

parsing the input query image;

generating multiple representative values from the multiple query points and the distance function in the feature space, the weights of the query points, and the number of query result images;

comparing them with the images in the image database by EMD; and

outputting the images with high similarity.

The region-based image retrieval method using a multi-class SVM of claim 1,

wherein the first step comprises a user feedback step consisting of:

comparing the output images with the query image input by the user;

performing binary feedback that classifies each image as relevant or non-relevant; and

generating the relevant image set by weighting the relevant images, or by applying a decay factor to the relevant images of the previous step and combining them.

The region-based image retrieval method using a multi-class SVM of claim 3,

wherein the binary feedback can be replaced with multiple feedback.

delete