KR101559457B1

KR101559457B1 - Method and apparatus of class-specific weighting based for image retrieval based on bag-of-word model

Info

Publication number: KR101559457B1
Application number: KR1020140006892A
Authority: KR
Inventors: 권인소; 유동근
Original assignee: Korea Advanced Institute of Science and Technology (KAIST)
Priority date: 2014-01-20
Filing date: 2014-01-20
Publication date: 2015-10-13
Also published as: KR20150086812A

Abstract

An apparatus for learning class-specific weights in visual-word-based image retrieval includes: an image storage unit that stores each of a plurality of images as a visual word vector and manages each image by classifying it into at least one image class; a weight assignment function learning unit that generates a weight assignment function mapping an importance index of a visual word to a weight; and an image class weight calculation unit that, based on the weight assignment function, extracts for each image class the weight value corresponding to the importance index of each visual word and calculates a visual word weight vector for each image class.

Description

METHOD AND APPARATUS OF CLASS-SPECIFIC WEIGHTING BASED FOR IMAGE RETRIEVAL BASED ON BAG-OF-WORD MODEL

The present invention relates to a weight learning method and apparatus.

Document retrieval can measure the similarity between documents using the bag-of-words methodology. Bag of words treats a document as a set of words and vectorizes it by representing the occurrence frequency of each word as a histogram. The similarity between documents is computed by measuring the distance between their histograms.
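As a minimal sketch of the bag-of-words idea described above, the following Python snippet builds word-frequency histograms over a small fixed vocabulary and compares them. The vocabulary, the sample sentences, and the choice of L1 distance are illustrative assumptions; the text does not specify which distance is used.

```python
from collections import Counter


def bow_histogram(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def l1_distance(h1, h2):
    """Sum of absolute differences between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical toy vocabulary and documents.
vocabulary = ["image", "word", "search", "cat"]
hist_a = bow_histogram("image search by visual word and image word", vocabulary)
hist_b = bow_histogram("image search with word", vocabulary)
hist_c = bow_histogram("cat cat cat", vocabulary)

# Documents a and b share vocabulary, so they end up closer than a and c.
print(l1_distance(hist_a, hist_b), l1_distance(hist_a, hist_c))
```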

The bag-of-words methodology can also be applied to image retrieval in computer vision. Unlike documents, however, the image domain has no inherent concept of a word. Methods for generating words from images have therefore been proposed.

The bag-of-words method for images collects the local descriptors of images and applies k-means clustering to quantize the local descriptor space. The center of each quantized cluster is regarded as a visual word. As with documents, a single image is then represented as a histogram of visual word occurrence frequencies, and the similarity between images is quantified by measuring the distance between these histograms. To search a very large number of database images, an inverted file structure, which is a data retrieval structure, is used for the search.
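The inverted file structure mentioned above can be sketched as a mapping from each visual word to the images that contain it, so that a query only touches images sharing at least one word with it. The sparse-histogram layout, the image identifiers, and the function names below are assumptions made for illustration, not details taken from the patent.

```python
from collections import defaultdict


def build_inverted_file(histograms):
    """Map each visual-word id to a list of (image id, count) postings."""
    index = defaultdict(list)
    for image_id, hist in histograms.items():
        for word_id, count in hist.items():
            if count > 0:
                index[word_id].append((image_id, count))
    return index


def candidate_images(index, query_hist):
    """Collect the images that share at least one visual word with the query."""
    candidates = set()
    for word_id in query_hist:
        for image_id, _count in index.get(word_id, []):
            candidates.add(image_id)
    return candidates


# Hypothetical sparse histograms: visual-word id -> occurrence count.
database = {
    "img0": {3: 2, 17: 1},
    "img1": {17: 4, 42: 1},
    "img2": {99: 5},
}
index = build_inverted_file(database)
query = {17: 1, 3: 1}
print(sorted(candidate_images(index, query)))
```

With a vocabulary of around one million words each histogram is extremely sparse, which is why only the posting lists of the query's own words need to be visited.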

Representing an image through visual words (bag-of-words representation) loses information, because the local descriptor space is quantized to generate the visual words. Since this loss reduces the discriminability between images, it is important to keep the quantization error as small as possible. The error is minimized by quantizing the local descriptor space very finely, typically into about one million cells. However, this means that the histogram has about one million dimensions, which makes it very difficult to learn a weight vector that improves the discriminability between histograms.

Therefore, a new method is needed that makes it possible to learn weight vectors for improving the discriminability between histograms.

The problem to be solved by the present invention is to provide a method and apparatus that learn a weight assignment function based on statistical information about the visual words of sample image classes, and that calculate the distance between images using the weight vector of an image class calculated from the weight assignment function.

A weight learning apparatus according to an embodiment of the present invention includes: an image storage unit that stores each of a plurality of images as a visual word vector and manages each image by classifying it into at least one image class; a weight assignment function learning unit that generates a weight assignment function mapping an importance index of a visual word to a weight; and an image class weight calculation unit that, based on the weight assignment function, extracts for each image class the weight value corresponding to the importance index of each visual word and calculates a visual word weight vector for each image class.

The weight assignment function may be shaped so that it assigns a higher weight value the more important a visual word is.

The weight assignment function learning unit may determine an arbitrary number of image classes as learning classes and, based on the images included in the learning classes, determine the function that maximizes an objective function as the weight assignment function. The objective function may be a function that draws vectors of the same class together and pushes vectors of different classes apart.

The importance index of a visual word may be a statistical index related to the occurrence frequency of that visual word among the images included in each image class.

The statistical index may include either the standard deviation or the variance, and the statistical index may be inversely proportional to the importance of the visual word.

The image class weight calculation unit may compare the visual word vectors of the images included in a given image class and calculate the importance index of each visual word based on its occurrence frequency within that image class.

The weight learning apparatus may further include a search unit that calculates the distance between a query image and each stored image in the image storage unit and outputs the stored image with the minimum distance to the query image as the search result. The search unit may calculate the distance between two images using the visual word weight vector of the image class into which each stored image is classified.

A method by which an apparatus learns the weights of image classes according to another embodiment of the present invention includes: storing each of a plurality of images as a visual word vector and managing each image by classifying it into at least one image class; generating a weight assignment function mapping an importance index of a visual word to a weight; extracting, based on the weight assignment function, the weight value corresponding to the importance index of each visual word for each image class and calculating a visual word weight vector for each image class; and calculating the distance between a query image and each of the plurality of images based on the visual word weight vector of each image class.

The step of generating the weight assignment function may generate the function so that it assigns a higher weight value the more important a visual word is.

The statistical index may include either the standard deviation or the variance.

When obtaining a first distance between a first image among the plurality of images and the query image, the step of calculating the distance may calculate the first distance using the visual word weight vector of the image class into which the first image is classified.

The weight learning method may further include outputting, as the search result, the image with the minimum distance to the query image among the plurality of images.

According to an embodiment of the present invention, the weight assignment function has only a small number of parameters, so learning it is simple. The embodiment can therefore resolve the learning difficulty caused by the ultra-high-dimensional representation used in existing image retrieval.

According to an embodiment of the present invention, different weights are learned for each image class. The embodiment can therefore resolve the problem that existing image retrieval could not learn class-specific weights because of the learning difficulty.

According to an embodiment of the present invention, once simple statistics have been computed within each class of database images, the weight vector can be calculated using the learned weight model. The embodiment can therefore calculate the weight vector with very little computation and can significantly improve accuracy.

FIG. 1 is a diagram illustrating a method of representing an image with visual words according to an embodiment of the present invention.
FIG. 2 is a block diagram of an image search apparatus according to an embodiment of the present invention.
FIG. 3 is an example graph of a weight assignment function according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a method of obtaining a weight assignment function according to an embodiment of the present invention.
FIG. 5 is a diagram schematically illustrating a method of obtaining the visual word weight vector of an image class according to an embodiment of the present invention.
FIG. 6 is a flowchart of an image search method according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram illustrating a method of representing an image with visual words according to an embodiment of the present invention.

Referring to FIG. 1, the bag-of-words model treats a document as a set of words and vectorizes it by representing the occurrence frequency of each word in the document as a histogram. The similarity between documents is calculated by measuring the distance between their histograms.

Image classification and retrieval use a bag-of-words model built on visual words instead of text words. That is, as shown in FIG. 1, an image is vectorized by representing the occurrence frequency of each visual word as a histogram.

A visual word is an image patch represented as a 128-dimensional vector using a descriptor, for example a SIFT descriptor. A single image is represented by multiple visual words (128-dimensional vectors).

To represent an image as a single vector of fixed dimension using the bag-of-words model, a codebook, which can be thought of as a visual word dictionary, is used. The codebook is built by placing the visual words extracted from many training images in the 128-dimensional space and applying K-means clustering based on the Euclidean distance between them, which produces K visual vocabulary entries. Based on the visual word dictionary with K entries, a single image is represented as a histogram vector indicating how many times each visual vocabulary entry occurred.
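The quantization step described above can be sketched as follows, assuming the codebook (the K cluster centers) has already been produced by K-means. For readability the sketch uses 2-dimensional toy descriptors and K = 3 rather than 128-dimensional SIFT descriptors and a large K; the codebook values and function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook center closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def visual_word_histogram(descriptors, codebook):
    """Count how many descriptors of one image fall on each codebook entry."""
    hist = [0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, codebook)] += 1
    return hist


# Hypothetical codebook with K = 3 centers and four descriptors from one image.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.1), (0.9, 0.2), (1.1, -0.1), (0.2, 0.8)]
histogram = visual_word_histogram(descriptors, codebook)
print(histogram)
```

Whatever the number of descriptors in an image, the resulting histogram always has length K, which is what makes images directly comparable.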

To find the same object as the one contained in a query image, image retrieval requires an enormous number of visual vocabulary entries. For example, image retrieval is performed with a visual word dictionary containing one million entries (K = 1,000,000).

FIG. 2 is a block diagram of an image search apparatus according to an embodiment of the present invention, FIG. 3 is an example graph of a weight assignment function according to an embodiment of the present invention, and FIG. 4 is a diagram illustrating a method of obtaining a weight assignment function according to an embodiment of the present invention.

Referring to FIG. 2, the image search apparatus 100 generates a visual word weight vector for each image class based on a weight assignment function (Weighting Mapping Function, WMF). When a query image is input, the image search apparatus 100 calculates the distance between the query image and each stored image by applying the visual word weight vector of the stored image.

The image search apparatus 100 includes an image storage unit 110, a weight assignment function learning unit 130, an image class weight calculation unit 150, and a search unit 170.

The image storage unit 110 stores a plurality of images. The image storage unit 110 can classify and manage the stored images according to a given criterion. Each image is classified into at least one image class, and each image class includes a plurality of images. Each image is represented as a visual word vector that indicates the occurrence frequencies of the K visual words in the visual word dictionary.

The weight assignment function learning unit 130 generates a weight assignment function that maps an importance index of a visual word to a weight. The weight assignment function is a function for assigning high weights to visual words of high importance.

The weight assignment function may be an increasing function or, as in FIG. 3, a decreasing function. A decreasing weight assignment function gives a larger weight value the smaller the value on the horizontal axis. Accordingly, for a decreasing weight assignment function, the index on the horizontal axis is one for which a larger value means a less important visual word. For example, the horizontal-axis index may be the standard deviation or the variance of a visual word's occurrence frequency. The within-class standard deviation of a visual word's occurrence frequency can be calculated after normalizing the frequencies, by dividing each of them by the highest frequency with which that visual word appears among the images of the class. A visual word whose occurrence frequency varies little appears with similar frequency across the images of the class, so it can be regarded as an important visual word that carries the characteristics of that image class. The weight assignment function learning unit 130 therefore generates a weight assignment function that assigns high weights to visual words with a small occurrence-frequency deviation.
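The normalized within-class standard deviation described above can be sketched as follows. The use of the population standard deviation and the handling of a visual word that never occurs in the class are assumptions, since the text does not specify either; the toy histograms are hypothetical.

```python
import statistics


def normalized_std(frequencies):
    """Std of one visual word's frequencies across a class, after dividing
    every frequency by the highest frequency observed in that class."""
    peak = max(frequencies)
    if peak == 0:
        # Assumption: the source does not say how to treat a word that is
        # absent from every image of the class; 0.0 is returned here.
        return 0.0
    scaled = [f / peak for f in frequencies]
    return statistics.pstdev(scaled)  # population std is an assumption


def class_stds(histograms):
    """One normalized std per visual word, given the histograms of a class."""
    return [normalized_std(column) for column in zip(*histograms)]


# Hypothetical class with three images and a three-word vocabulary.
class_histograms = [
    [4, 0, 2],
    [4, 3, 2],
    [4, 6, 1],
]
stds = class_stds(class_histograms)
print(stds)
```

In this toy class the first visual word occurs equally often in every image, so its deviation is zero and it would be treated as the most characteristic word of the class.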

The weight assignment function can be created in various ways. For example, the weight assignment function learning unit 130 determines an arbitrary number of image classes as learning classes (sample classes). Referring to FIG. 4, the weight assignment function learning unit 130 determines, based on the images included in the learning classes, the function that maximizes an objective function as the weight assignment function. The objective function draws vectors of the same class together and pushes vectors of different classes apart. The objective function is defined as in Equation 1.

In Equation 1, the quantity being optimized is the set of parameters of the weight assignment function, D^tr denotes the learning classes, c denotes a single class, q denotes a query image sampled from class c, and i denotes an arbitrary image in class c. WG quantifies how tightly the images within a class gather as the parameters change, and BC quantifies how far the images of different classes scatter from one another as the parameters change. The more the images within a class gather, the larger WG becomes, and the more the images of different classes scatter, the larger BC becomes. Maximizing the objective function yields the weight assignment function with the optimal parameters.
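Because the weight assignment function has only a few parameters, even a simple search can fit it. The sketch below is an illustration under strong assumptions, not the patent's Equation 1: it uses a hypothetical one-parameter exponential function, weights normalized to sum to 1, a weighted L1 distance, and a stand-in objective equal to the mean between-class distance minus the mean within-class distance, maximized by grid search.

```python
import math
import statistics


def class_stds(histograms):
    """Normalized occurrence-frequency std per visual word within one class."""
    stds = []
    for column in zip(*histograms):
        peak = max(column)
        scaled = [f / peak for f in column] if peak else [0.0] * len(column)
        stds.append(statistics.pstdev(scaled))
    return stds


def weights(stds, a):
    """Hypothetical decreasing WMF exp(-a * std), normalized to sum to 1."""
    raw = [math.exp(-a * s) for s in stds]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_l1(w, x, y):
    return sum(wk * abs(xk - yk) for wk, xk, yk in zip(w, x, y))


def objective(a, classes):
    """Assumed stand-in objective: larger when same-class images are close
    and different-class images are far apart."""
    w = {c: weights(class_stds(h), a) for c, h in classes.items()}
    within, between = [], []
    for c, images in classes.items():
        for qi, q in enumerate(images):
            for other, stored in classes.items():
                for si, s in enumerate(stored):
                    if other == c and si == qi:
                        continue  # do not compare an image with itself
                    d = weighted_l1(w[other], q, s)  # stored image's class weights
                    (within if other == c else between).append(d)
    return statistics.mean(between) - statistics.mean(within)


# Hypothetical learning classes with three images each.
classes = {
    "A": [[5, 0, 1], [5, 9, 2], [5, 3, 1]],
    "B": [[0, 4, 7], [6, 4, 8], [2, 4, 7]],
}
grid = [0.0, 1.0, 2.0, 4.0, 8.0]
best_a = max(grid, key=lambda a: objective(a, classes))
print(best_a, objective(best_a, classes))
```

The point of the sketch is the size of the search space: a single scalar is tuned here, regardless of how many dimensions the histograms have.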

The image class weight calculation unit 150 calculates the visual word weight vector of each image class based on the weight assignment function. As in Equation 2, the visual word weight vector is a vector containing the weight values that correspond to the elements of the visual word vector. A visual word weight vector is calculated for each image class.

The image class weight calculation unit 150 compares the visual word vectors of the images included in an image class and calculates a statistical index related to the occurrence frequency of each visual word. Various statistical indices can be used; the standard deviation is used as the example in the description that follows.

The image class weight calculation unit 150 maps the occurrence-frequency standard deviation of each visual word through the weight assignment function and extracts the weight value corresponding to each visual word. As in Equation 2, the image class weight calculation unit 150 assembles the weight values corresponding to the K visual words into a weight vector. In this way, the image class weight calculation unit 150 can easily determine the important visual words of each image class from the occurrence-frequency statistics of the visual words.
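Mapping per-word standard deviations through the weight assignment function to obtain a class weight vector can be sketched as below. The exponential form of the function, its `decay` parameter, and the sample deviations are hypothetical; the patent only requires that a lower deviation maps to a higher weight when a decreasing function is used.

```python
import math


def weight_mapping(std, decay=4.0):
    """Hypothetical decreasing WMF: a smaller std yields a larger weight."""
    return math.exp(-decay * std)


def class_weight_vector(stds, wmf=weight_mapping):
    """Apply the WMF to each visual word's std to get the class weight vector."""
    return [wmf(s) for s in stds]


# Hypothetical occurrence-frequency stds of K = 3 visual words in one class.
stds_class1 = [0.0, 0.41, 0.24]
weights_class1 = class_weight_vector(stds_class1)
print(weights_class1)
```

The same function is reused for every class; only the statistics fed into it change, which is why no per-class training is needed.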

When a query image is input, the search unit 170 searches the stored images in the image storage unit 110 for the image corresponding to the query image. The query image can be represented as a visual word vector.

The search unit 170 calculates the distance between the visual word vector of the query image and the visual word vector of each stored image. In doing so, the search unit 170 weights each visual word according to the visual word weight vector of the image class that contains the stored image. Here, the visual word weight vector of a stored image is the visual word weight vector of the class to which the stored image belongs.

The search unit 170 calculates the distance between two images using the visual word weight vector, as in Equation 3.

The search unit 170 compares the distances between the query image and the stored images in the image storage unit 110 to find the image corresponding to the query image. The search unit 170 outputs the stored image with the smallest distance.
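The search step can be sketched as a weighted nearest-neighbor lookup in which each stored image is compared using the weight vector of its own class. The weighted L1 form of the distance is an assumption standing in for Equation 3, and the class names, weight values, and histograms are hypothetical.

```python
def weighted_distance(query, stored, weights):
    """Weighted L1 distance (assumed form) between two visual word vectors."""
    return sum(w * abs(q - s) for w, q, s in zip(weights, query, stored))


def retrieve(query, database, class_weights):
    """Return the id of the stored image closest to the query.

    database: list of (image_id, class_id, histogram) tuples.
    class_weights: class_id -> visual word weight vector of that class.
    """
    best = min(
        database,
        key=lambda item: weighted_distance(query, item[2], class_weights[item[1]]),
    )
    return best[0]


# Hypothetical class weight vectors and stored images.
class_weights = {"tower": [1.0, 0.1, 0.4], "bridge": [0.2, 1.0, 0.5]}
database = [
    ("tower_1", "tower", [5, 9, 2]),
    ("bridge_1", "bridge", [6, 4, 8]),
]
query = [5, 0, 6]
print(retrieve(query, database, class_weights))
```

With uniform weights the query above would be closer to `bridge_1`; with the class-specific weights it matches `tower_1`, because the query agrees with the tower class on the word that class weights most heavily.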

Conventionally, methods such as metric learning have been used to assign weights to the visual words that are important for each class. However, in image retrieval based on high-dimensional representations, an image is represented as a vector with a very large number of dimensions, on the order of one million, so existing learning frameworks face many practical difficulties. In particular, generating a million-dimensional weight vector for each class is infeasible because of the high computational cost, is prone to overfitting unless sufficient training data is available, and requires each weight vector to be learned separately for each class at high cost.

In contrast, the image search apparatus 100 uses a statistical index related to occurrence frequency (for example, the standard deviation), so it can easily calculate weights expressing the importance of the visual words for each image class, even for million-dimensional vectors. In addition, the image search apparatus 100 only needs to generate a single weight assignment function; it obtains the visual word weight vector of each image class by mapping that class's statistical indices through the weight assignment function.

FIG. 5 is a diagram schematically illustrating a method of obtaining the visual word weight vector of an image class according to an embodiment of the present invention.

Referring to FIG. 5, suppose for example that image class 1 contains M images. Each of the M images is represented as a visual word vector indicating the occurrence frequency of each visual word.

For each visual word, the image search apparatus 100 calculates a statistical index of its occurrence frequency across the M images. This statistical index indicates the importance of each visual word in the image class, and may be, for example, the standard deviation or the variance. If a visual word occurs with similar frequency across the images of an image class, it is an important visual word that carries the characteristics of that class. Conversely, if the occurrence frequency of a visual word differs from image to image within an image class, that visual word may be unimportant for the class.

The image search apparatus 100 calculates the standard deviation for each visual word in image class 1. That is, the image search apparatus 100 calculates the occurrence-frequency standard deviation of every visual word, from visual word 1 through visual word K.

Based on the weight assignment function, the image search apparatus 100 extracts the weight value corresponding to the standard deviation of each visual word. The image search apparatus 100 then calculates the visual word weight vector of image class 1 from the weight values of the visual words.

In this way, the image search apparatus 100 calculates the visual word weight vector of each image class based on the statistical index of each visual word.

FIG. 6 is a flowchart of an image search method according to an embodiment of the present invention.

Referring to FIG. 6, the image search apparatus 100 generates, based on an objective function, a weight assignment function that maps an importance index of a visual word to a weight (S110). The objective function draws vectors of the same class together and pushes vectors of different classes apart. The image search apparatus 100 determines the function that maximizes the objective function as the weight assignment function. The importance index of a visual word is a statistical index related to its occurrence frequency, for example the standard deviation or the variance of the occurrence frequency.

The image search apparatus 100 calculates the visual word weight vector of each image class based on the weight assignment function (S120). The image search apparatus 100 calculates, for each visual word of an image class, a statistical index related to its occurrence frequency. The image search apparatus 100 then uses the weight assignment function to extract the weight value corresponding to each statistical index.

The image search apparatus 100 receives a query image (S130).

The image search apparatus 100 calculates the distance between the query image and each stored image (S140). In doing so, the image search apparatus 100 applies weights to the per-visual-word distances using the visual word weight vector of the image class that contains the stored image. That is, a larger weight value is applied to the distance for an important visual word than to the distance for an unimportant one.

The image search apparatus 100 outputs the stored image with the smallest distance as the search result (S150).

As described above, according to an embodiment of the present invention, the weight assignment function has only a small number of parameters, so learning it is simple. The embodiment can therefore resolve the learning difficulty caused by the ultra-high-dimensional representation used in existing image retrieval.

The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes the functions corresponding to the configuration of the embodiments, or through a recording medium on which that program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

Claims

An apparatus for learning weights of image classes based on visual words, the apparatus comprising:
an image storage unit that stores each of a plurality of images as a visual word vector and manages each image by classifying it into at least one image class;
a weight assignment function learning unit that generates a weight assignment function mapping an importance index of a visual word to a weight; and
an image class weight calculation unit that, based on the weight assignment function, extracts for each image class the weight value corresponding to the importance index of each visual word and calculates a visual word weight vector of each image class,
wherein the image class weight calculation unit compares the visual word vectors of the images included in a given image class and calculates the importance index of each visual word based on the occurrence frequency of each visual word within that image class.

(deleted)

The weight learning apparatus of claim 1,
wherein the weight assignment function learning unit determines an arbitrary number of image classes as learning classes and, based on the images included in the learning classes, determines the function that maximizes an objective function as the weight assignment function, and
wherein the objective function is a function that draws vectors of the same class together and pushes vectors of different classes apart.

The weight learning apparatus of claim 1,
wherein the importance index of a visual word is a statistical index related to the occurrence frequency of that visual word among the images included in each image class.

The weight learning apparatus of claim 4,
wherein the statistical index includes either the standard deviation or the variance, and
wherein the statistical index is inversely proportional to the importance of the visual word.

(deleted)

The weight learning apparatus of claim 1,
further comprising a search unit that calculates the distance between a query image and each stored image in the image storage unit and outputs the stored image with the minimum distance to the query image as a search result,
wherein the search unit calculates the distance between two images using the visual word weight vector of the image class into which each stored image is classified.

(deleted)