KR101279484B1

KR101279484B1 - Apparatus and method for processing image

Info

Publication number: KR101279484B1
Application number: KR1020120040191A
Authority: KR
Inventors: 전혁준; 이동한; 서두천; 이선구
Original assignee: 한국항공우주연구원
Priority date: 2012-04-18
Filing date: 2012-04-18
Publication date: 2013-06-27

Abstract

PURPOSE: An image processing apparatus and method are provided that extend a Euclidean-model-based interest point extraction technique to a similarity-model-based one without significant modification. CONSTITUTION: An image processing unit (130) generates, for each of two images, an L-scale space in which the scale is enlarged or reduced, and generates a normalized local patch by resampling a local patch of the L-scale space at a predetermined ratio. The image processing unit then generates an M-scale space by applying an interest point function to the normalized local patch and extracts interest points from it. [Reference numerals] (110) Input unit; (120) Database; (130) Image processing unit; (140) Output unit

Description

Image processing apparatus and method {APPARATUS AND METHOD FOR PROCESSING IMAGE}

The present invention relates to an image processing apparatus and method, and more particularly, to an image processing apparatus and method for extracting interest points from an image.

An image interest point technique extracts corner points, symmetry points, and the like from an image. Interest points are used for image retrieval, image registration, and image panoramas. Conventional techniques include Euclidean-model-based interest point extraction (Harris corner detector, KLT corner detector, GST), which is robust to rotation, translation, and noise, and similarity-model-based interest point extraction (LoG, Harris-Laplacian, Hessian-Laplacian, SIFT, SURF), which is robust to rotation, translation, scale, and noise. Other models include interest point extraction that is robust to affine transformation (Affine-point, ASIFT).

Of these, the Euclidean-model-based techniques are the easiest to develop. For the other models, development is difficult, so few kinds of interest point extraction techniques have been developed. In particular, similarity-model-based interest point extraction is more useful and efficient than other models for interest point techniques that deal with small sizes. The core of an interest point technique is the design of the interest point function, which computes the strength of an interest point. In the similarity model, conventional techniques mostly extend an existing Euclidean-model-based interest point extraction technique by adding a scale normalization term to a scale space method that simulates scale.

However, the biggest problem in extending an existing Euclidean-model-based interest point extraction technique to the similarity transformation model is handling the normalization term. When interest points are extracted with a scale space technique, the size of the local region that is input to the interest point function changes, and the strength value returned by the interest point function depends on that size.

Accordingly, an object of the present invention is to provide an image processing apparatus and method capable of extracting more diverse types of interest points by extending a Euclidean-model-based interest point extraction technique to a similarity-model-based interest point extraction technique without significant modification.

To achieve this object, an image processing apparatus for matching interest points from two images according to an embodiment of the present invention includes an image processing unit that generates, for each of the two images, an L-scale space in which the scale is enlarged or reduced, generates a normalized local patch by resampling a local patch of the L-scale space using a predetermined magnification, and extracts interest points by applying an interest point function to the normalized local patch to generate an M-scale space.

The image processing unit may extract a plurality of extrema from the M-scale space, group the extracted extrema, and extract the interest points of the two images by comparing the grouped extrema.

The image processing unit may group the extracted extrema while traversing them from an upper level to a lower level of the M-scale space, obtain a representative position, a representative strength, and a scale persistence for the grouped extrema, and set these as the representative values of the grouped extrema.

The image processing unit may extract a local descriptor, which is a feature vector of the local patch around an extremum, measure the distance between the local descriptors of the two images, and generate correspondences between the grouped extrema of the two images according to the measured distance.

The image processing unit may match the interest points of the two images according to the number of correspondences between the grouped extrema.

The image processing unit may generate a similarity table according to the correspondences between the grouped extrema of the two images and match the interest points of the two images based on the similarity table.

The image processing unit may compare the degree of correspondence of the grouped extrema with a threshold and match, as interest points of the two images, the grouped extrema whose degree of correspondence is greater than the threshold.

In the L-scale space, a two-fold reduction and/or enlargement may occur between octaves each time a scale space is generated.

The interest point function computes a strength for the normalized local patch and may be any one of a Harris corner detector, a KLT (Kanade-Lucas-Tomasi) detector, and GST (generalized symmetric transform).

A normalized local patch normalized to P×P may be generated by normalizing the local patch of the L-scale space in proportion to the scale.

An image processing method for matching interest points from two images according to another embodiment of the present invention includes: generating, for each of the two images, an L-scale space in which the scale is enlarged or reduced; generating a normalized local patch by resampling a local patch of the L-scale space using a predetermined magnification; and extracting interest points by applying an interest point function to the normalized local patch to generate an M-scale space.

The interest point extraction step may include extracting a plurality of extrema from the M-scale space and grouping the extracted extrema, and extracting the interest points of the two images by comparing the grouped extrema.

The extremum grouping step may include grouping the extracted extrema while traversing them from an upper level to a lower level of the M-scale space, obtaining a representative position, a representative strength, and a scale persistence for the grouped extrema, and setting these as the representative values of the grouped extrema.

The method may further include extracting a local descriptor, which is a feature vector of the local patch around an extremum, and measuring the distance between the local descriptors of the two images to generate correspondences between the grouped extrema of the two images according to the measured distance.

The method may further include matching the interest points of the two images according to the number of correspondences between the grouped extrema.

The method may further include generating a similarity table according to the correspondences between the grouped extrema of the two images, and matching the interest points of the two images based on the similarity table.

The method may further include comparing the degree of correspondence of the grouped extrema with a threshold and matching, as interest points of the two images, the grouped extrema whose degree of correspondence is greater than the threshold.

A computer-readable medium according to another embodiment of the present invention records a program for executing any one of the above methods.

According to the present invention, more diverse types of interest points can be extracted easily by extending a Euclidean-model-based interest point extraction technique to a similarity-model-based interest point extraction technique without significant modification.

In addition, the more diverse interest points widen the range of applications of interest point image matching, and the normalization of the scale factor, which is the fundamental problem in extending the Euclidean transformation model, can be overcome.

FIG. 1 is a block diagram illustrating an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a schematic diagram illustrating an image processing process performed by an image processing apparatus according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a pyramid L-scale space according to an embodiment of the present invention.
FIG. 4 is a schematic diagram illustrating a process of generating an M-scale space according to an embodiment of the present invention.
FIG. 5 is a schematic diagram illustrating an extremum grouping process according to an embodiment of the present invention.
FIG. 6 is an EPR model diagram according to an embodiment of the present invention.
FIG. 7 is a schematic diagram illustrating an EPR voting process according to an embodiment of the present invention.

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. Unless the context indicates otherwise, like reference numerals in the drawings generally denote like elements. The illustrative embodiments described in the detailed description, drawings, and claims are not intended to be limiting. Other embodiments may be used, and other changes may be made, without departing from the spirit or scope of what is presented here. It will be readily understood that the elements of this disclosure, as generally described here and shown in the drawings, may be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated and form part of this disclosure.

An image processing apparatus according to an embodiment of the present invention will now be described in detail with reference to the drawings.

FIG. 1 is a block diagram illustrating an image processing apparatus according to an embodiment of the present invention.

Referring to FIG. 1, an image processing apparatus (100) according to an embodiment of the present invention includes an input unit (110), a database (120), an image processing unit (130), and an output unit (140).

The input unit (110) receives image data from an external device (not shown), transfers the data into the image processing apparatus (100), and performs whatever processing is needed for the image processing apparatus (100) to handle the image data. The input unit (110) also receives commands from user input and passes them to the image processing apparatus (100) for execution.

The database (120) stores image data related to the image processing performed by the image processing apparatus (100). This image data is either transferred from the input unit (110) or produced by processing in the image processing apparatus (100).

The image processing unit (130) reads image data from the input unit (110) or the database (120) and extracts interest points (also called feature points) from the image. The image processing unit (130) may also use the extracted interest points to perform image retrieval, image registration, and image panoramas.

The output unit (140) outputs the results processed by the image processing apparatus (100) to the outside.

The image interest point extraction method performed by the image processing unit (130) will now be described in more detail.

FIG. 2 is a schematic diagram illustrating an image processing process performed by an image processing apparatus according to an embodiment of the present invention.

As shown in FIG. 2, the image processing unit (130) processes two images (images A and B) through a feature extraction process and a feature matching process. The feature extraction process extracts local features from each image, and the feature matching process finds the correspondence between the two images based on the local features. Local features include interest points, local descriptors, and the like. From the viewpoint of a hierarchical processing structure, subsequent processing such as model parameter estimation, image retrieval, image registration, and three-dimensional reconstruction proceeds on the basis of the correspondence information (interest point positions, local descriptors, and so on) that results from interest point matching.

An interest point generally means a point in an image that is worth attention. Points that can characterize a shape, such as the edges or corners of an object, can be interest points, and an interest point can also be a line or point extracted from the contour of a closed region. Examples are corner points, symmetry points, and inflection points, and lines or points taken from a circular or rectangular contour may also be used as interest points.

The feature extraction process includes: generating, from a two-dimensional image, a pyramid L-scale space (pyramid linear scale space, L-scale space) that simulates scale change; generating an M-scale space (measurement scale space, M-scale space) by applying an interest point function to the L-scale space; extracting extrema (or interest points) from the M-scale space; and clustering the extrema to generate extreme point set interest points (Extreme Points and Route, hereinafter 'EPR') and then extracting the local descriptors of the extrema in each EPR.

The feature matching process includes: a local descriptor matching process that computes the distances between all extrema based on their local descriptors; a voting process that produces a voting table expressing numerically the degree of correspondence between EPRs; and a process that applies a threshold to the voting table to extract the EPRs with strong correspondence and generate the link relation between the EPRs of the two images. Feature matching is a kind of image matching that generates a link relation between images (a correspondence between interest points, such as a transformation matrix between the two images), and in particular it generates the relation between images based on the extracted local descriptors (feature vectors).
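The three matching stages described above can be sketched as follows. This is a minimal illustration and not the patent's procedure: the function names, the Euclidean descriptor distance, and the mutual-nearest-neighbour rule used to cast a vote are all assumptions made for the example. Each extremum carries a descriptor and the index of the EPR it belongs to.

```python
import numpy as np

def descriptor_distances(desc_a, desc_b):
    """Euclidean distance between every descriptor in A (n x d) and B (m x d)."""
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def voting_table(desc_a, epr_a, desc_b, epr_b, n_epr_a, n_epr_b, max_dist):
    """Count, per EPR pair, the extremum pairs that are mutual nearest
    neighbours with distance <= max_dist (assumed voting rule)."""
    d = descriptor_distances(desc_a, desc_b)
    table = np.zeros((n_epr_a, n_epr_b), dtype=int)
    nearest_b = d.argmin(axis=1)  # best B extremum for each A extremum
    nearest_a = d.argmin(axis=0)  # best A extremum for each B extremum
    for i, j in enumerate(nearest_b):
        if nearest_a[j] == i and d[i, j] <= max_dist:
            table[epr_a[i], epr_b[j]] += 1
    return table

def match_eprs(table, min_votes):
    """Return (a, b) EPR index pairs whose vote count is greater than min_votes."""
    rows, cols = np.nonzero(table > min_votes)
    return [(int(a), int(b)) for a, b in zip(rows, cols)]
```

With three extrema per image, where the first two belong to EPR 0 and the third to EPR 1, nearly identical descriptors give two votes for the pair (0, 0) and one vote for (1, 1), so a threshold of 1 keeps only (0, 0).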

Each process will now be described in more detail with reference to the drawings.

FIG. 3 is a schematic diagram of a pyramid L-scale space according to an embodiment of the present invention.

The scale space method is a framework for multi-scale signal representation developed in computer vision, image processing, and signal processing. It blurs a two-dimensional image so that the image can be represented at different scales (or resolutions). When a scale space is generated from a two-dimensional image, the pyramid scale space shrinks the image by a factor of 2 each time the scale reaches a multiple of 2, to improve performance. That is, the scales run x1.0, x1.3, x1.5, x1.7, x1.9, x2 (at which point the image is downsampled), and then x1.0, x1.3, x1.5, x1.7, x1.9 again on the downsampled image. Repeating this process produces a pyramid-shaped scale space as shown in FIG. 3.

To extract interest points that are robust to resolution change, the pyramid L-scale space according to an embodiment of the present invention uses a scale space, which is a three-dimensional image obtained by extending a two-dimensional image (spatial information) with a scale axis. The L-scale space is built by applying a low-pass filter of gradually increasing strength to the image using a convolution kernel function, generating the scale space from the lower scale level (fine scale level) to the upper scale level (coarse scale level). This low-pass filter has the effect of simulating a scale change.

A specific method for generating the L-scale space is as follows. Starting from an initial Gaussian scale and gradually increasing it by a constant factor, the image is progressively blurred by applying the convolution operation of [Equation 1]. The initial Gaussian scale is usually chosen in the range 0.5 to 1.2. In addition, to speed up the scale space computation, the image is digitally enlarged/reduced at the point where the scale doubles. This digital processing produces the pyramid L-scale space.

[Equation 1]

Here (x, y) are the two-dimensional image coordinates, I is the image brightness at those coordinates, and G is a two-dimensional spatial Gaussian kernel.
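For orientation, the convolution described here is conventionally written in Gaussian scale-space notation as shown below. The symbols L (the blurred image) and σ (the Gaussian standard deviation) are assumptions taken from the standard formulation, not symbols confirmed by the text, so this should be read as a sketch of the usual form rather than as the patent's own equation.

```latex
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),
\qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}
  \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)
```

Here $*$ denotes two-dimensional convolution over $(x, y)$, and increasing $\sigma$ corresponds to moving from a fine scale level to a coarse one.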

As shown in FIG. 3, an octave is a factor-of-two division of the full scale range being simulated, and in the pyramid structure the resolution is reduced (or enlarged) by a factor of 2 each time the scale reaches a factor of 2 (or 1/2). Here, an octave is the set of images at levels with the same image size when the pyramid scale space is built.
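The pyramid construction can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the function names, the geometric scale step `2 ** (1 / levels_per_octave)`, the 3-sigma kernel truncation, and the default parameters are hypothetical choices, and the sketch does not track the exact effective blur carried over between octaves.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled, normalized 1-D Gaussian truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(image, dtype=float), r, mode="edge")
    # Filter along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def build_l_scale_space(image, sigma0=1.0, levels_per_octave=3, num_octaves=3):
    """Return a list of octaves; each octave is a list of (sigma, blurred image)."""
    step = 2.0 ** (1.0 / levels_per_octave)  # assumed: sigma doubles once per octave
    octaves = []
    base = np.asarray(image, dtype=float)
    for _ in range(num_octaves):
        octave = []
        for i in range(levels_per_octave):
            sigma = sigma0 * step ** i
            octave.append((sigma, gaussian_blur(base, sigma)))
        octaves.append(octave)
        # The scale has doubled: blur at 2 * sigma0, then halve the resolution.
        base = gaussian_blur(base, 2.0 * sigma0)[::2, ::2]
        if min(base.shape) < 8:
            break
    return octaves
```

All images inside one octave share the same size, which matches the definition of an octave given above, and each new octave has half the resolution of the previous one.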

Next, the process of applying an interest point function to generate the M-scale space is described with reference to FIG. 4. An interest point function measures a strength over a local patch, and usually the larger the value, the more likely the point is an interest point.

FIG. 4 is a schematic diagram illustrating a process of generating an M-scale space according to an embodiment of the present invention.

The M-scale space according to an embodiment of the present invention is a strength scale space obtained by evaluating an interest point function on the local patches of the points in the L-scale space. That is, the M-scale space is the space computed by applying the interest point function (μ: R×R→R) to each pixel of the L-scale space of the image that may be a corner, edge, or blob. To generate the M-scale space, the interest point function (μ) satisfies the following axioms of the similarity interest point function.

1) Translation invariance:

2) Rotation invariance:

3) Scale invariance:

Here the relation symbol means that two sets A and B have the same similarity when viewed with the naked eye, and card() is the size of a set.

4) Ordered group:

or

iff B is closer to an interest point than A.

In discrete space, many heuristic Euclidean interest point functions do not perform decimation when the scales differ and the scale ratio is not an integer multiple. To process an interest point at the size that fits its scale, the size of the local patch (a small region of the image, usually a rectangle) must therefore be increased. The changing local patch size in turn requires a normalization factor. For interest point functions whose Euclidean transformation model is mathematically well designed, a normalization factor is generally easy to derive. For heuristic interest point functions such as GST (generalized symmetric transform) and FRST (fast radial symmetric transform), however, a normalization factor is difficult to find. That is, axiom 3 of the similarity interest point function axioms is violated, so strengths cannot be compared on equal terms (the scale normalization problem). Existing Euclidean interest point functions are nevertheless generally designed so that comparison within the same scale is possible. Based on this design, the basic idea for solving the scale normalization problem is as follows. Neighboring points have the interest point function applied over the same number of pixels. So even when the resolutions are at different scales, if the differing numbers of pixels are resampled to the same number of pixels before the strength is measured and the Euclidean interest point function is then applied, the scale normalization problem can be avoided. The local patch is therefore resampled in advance to obtain a local patch whose resolution has been normalized (normalized local patch), and the strength interest point function (μ) is then applied.

Referring to FIG. 4, the interest point function application process computes strengths over the L-scale space to generate an M-scale space holding the interest point strength at each scale. Here, the interest point function takes a local region as input and outputs a strength value, and may be any conventional Euclidean-model-based interest point function. In the present invention, to obtain an interest point function that is robust to scale, the local region is normalized in advance in proportion to the scale. A local patch whose size is normalized to P×P in accordance with this scale proportion is called a normalized local patch.

In FIG. 4, the two points in the L-scale space on the left are both derived from the same point (x, y, t₀), and the hatched regions around the two points represent local patches. The two normalized local patches in the middle are obtained by resampling to P×P using magnifications of 1/s₁ and 1/s₂, respectively. The two points in the M-scale space on the right of FIG. 4 are the positions whose strength was measured by applying the interest point function μ to the normalized local patches. [Equation 2] below formalizes the generation of the M-scale space by applying the P×P normalized local patch function (normalization local patch function, ω: R×R×R→R×R) and the interest point function (μ) to the L-scale space.

[Equation 2]

Here, P is the normalized local patch size, a constant, and t is the scale axis, an auxiliary variable for forming three-dimensional virtual coordinates. The interest point function (μ) may be a Harris corner detector, a KLT (Kanade-Lucas-Tomasi) detector, GST (generalized symmetric transform), or the like.
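The normalization step can be illustrated with a short NumPy sketch: a patch covering roughly s·P × s·P pixels is resampled to P × P, and a Euclidean interest point function is then applied to the fixed-size result. The function names, the bilinear interpolation, the default P = 9, and the use of the standard Harris response det(M) − κ·trace(M)² with κ = 0.04 are assumptions chosen for the example and are not taken from the patent.

```python
import numpy as np

def normalized_local_patch(image, cx, cy, s, P=9):
    """Resample the patch centred at (cx, cy) with sample spacing s to P x P
    using bilinear interpolation (magnification 1/s)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # P sample positions per axis, spaced s pixels apart, centred on (cx, cy).
    offsets = (np.arange(P) - (P - 1) / 2.0) * s
    xs = np.clip(cx + offsets, 0, w - 1)
    ys = np.clip(cy + offsets, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = (xs - x0)[None, :]
    fy = (ys - y0)[:, None]
    top = img[np.ix_(y0, x0)] * (1 - fx) + img[np.ix_(y0, x1)] * fx
    bottom = img[np.ix_(y1, x0)] * (1 - fx) + img[np.ix_(y1, x1)] * fx
    return top * (1 - fy) + bottom * fy

def harris_strength(patch, kappa=0.04):
    """Standard Harris response computed over the whole patch (assumed form)."""
    dy, dx = np.gradient(np.asarray(patch, dtype=float))  # axis 0 is y, axis 1 is x
    a = (dx * dx).sum()
    b = (dy * dy).sum()
    c = (dx * dy).sum()
    return a * b - c * c - kappa * (a + b) ** 2

def m_value(l_image, cx, cy, s, P=9):
    """Strength at one point: resample first, then apply the interest function."""
    return harris_strength(normalized_local_patch(l_image, cx, cy, s, P))
```

Because the interest point function always sees P × P samples regardless of s, the same unmodified function can be evaluated at every scale level, which is the point of the normalization.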

그러면 도 5를 참고하여 M-스케일 스페이스로부터 앞서 설명한 극점 집합 관심점(EPR)을 추출하는 방법에 대하여 설명한다. 도 5는 본 발명의 실시예에 따른 극점 그룹화 과정을 도시한 개략도이다. 극점은 영상에서 주변보다 강도가 강한 점으로서 일반적으로 왜곡에 강인한 점이이고, EPR은 이러한 극점들을 모아 놓은 집합체이다. 일반적으로 극점을 추출하기 위해서는 M-스케일 스페이스에 32/8 이웃점 극점을 추출하고 각각의 극점을 하나의 관심점으로 정의하지만, 본 발명에서는 하나의 점으로부터 시뮬레이션된 극점들을 하나의 관심점으로 그룹화하여 EPR을 추출한다.Next, a method of extracting the aforementioned pole set interest point (EPR) from the M-scale space will be described with reference to FIG. 5. 5 is a schematic diagram illustrating a pole grouping process according to an embodiment of the present invention. The pole is a point that is stronger than the surroundings in the image and is generally robust to distortion, and EPR is a collection of these poles. Generally, in order to extract poles, 32/8 neighbor poles are extracted in M-scale space and each pole is defined as one point of interest, but in the present invention, the poles simulated from one point are grouped into one point of interest. Extract the EPR.

Extreme points are generally extracted by either a 28-neighbor or an 8-neighbor extraction technique; the present invention applies the 8-neighbor technique. As shown in FIG. 5, the extreme points extracted at each scale level are grouped while traversing from the upper level (coarsest scale level, minimum scale level) to the lower level (finest scale level, maximum scale level). Because the structure is a pyramid, coordinates are doubled between different octaves during this top-down grouping so that the grouping is not interrupted. The extreme points grouped in this way constitute an EPR (Extreme Points and Routs). The representative values of an EPR consist of a representative position, a representative strength (Rep), and a scale persistence (nTrack). The representative position is that of the extreme point closest to the lowest level. The representative strength may be the maximum, minimum, median, or mean strength of the grouped extreme points. The scale persistence may be defined as the number of extreme points or as the difference between the maximum and minimum scales among them.

First, the strength of an interest point is calculated for the M-scale space as follows. In this embodiment, a Harris corner detector is used as the interest point function by way of example. An x-axis partial derivative image (dx) and a y-axis partial derivative image (dy) are generated for the normalized local patch (W), and the Harris corner strength of the normalized local patch is calculated from these two images using [Equation 3] below.

[Equation 3]

Here, M_c is the matrix used in [Equation 3], Det() is the determinant, Tr() is the sum of the diagonal elements of the matrix (the trace), and the remaining coefficient is a user input variable that is generally set to a value between 0.04 and 0.15.
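As a hedged illustration of a Harris corner strength computed on a normalized local patch, the sketch below uses the widely known textbook form Det(M_c) − k·Tr(M_c)², with M_c built from sums of dx², dy², and dx·dy over the patch. The symbol `k`, the function name `harris_strength`, the central-difference derivatives, and the unweighted sums are assumptions; the patent's own equation may differ in notation or windowing.

```python
def harris_strength(W, k=0.04):
    """Textbook Harris response of a patch W (list of rows); k is assumed in 0.04..0.15."""
    h, w = len(W), len(W[0])
    sxx = syy = sxy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = (W[y][x + 1] - W[y][x - 1]) / 2.0  # x-axis partial derivative
            dy = (W[y + 1][x] - W[y - 1][x]) / 2.0  # y-axis partial derivative
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy
    det = sxx * syy - sxy * sxy   # Det(M_c), M_c = [[sxx, sxy], [sxy, syy]]
    tr = sxx + syy                # Tr(M_c)
    return det - k * tr * tr
```

With this form a flat patch scores zero, a straight edge scores negative, and a corner scores positive, which is why the response can serve as a strength value in the M-scale space.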

If an interest point is observed persistently at each resolution of a multi-resolution scale space, it will be robust to changes in resolution. That is, as the persistence of an interest point in the scale space increases, the overlap between two images of different resolutions increases, so that when a similar object is present, the interest points of the object detected in one image are likely to be found again as interest points in the other image. Consequently, higher persistence leads to higher repeatability of interest points across multiple resolutions.

Referring to FIG. 5, EPR extraction proceeds through local extreme point detection, tracking of extreme point trajectories between adjacent scales, and clustering of the extreme points on the same trajectory. The representative position, representative strength (Rep), and scale persistence (nTrack) of the EPR are then calculated. Because positions generally become less stable toward the uppermost scale, the representative position is taken from the coordinates of the extreme point closest to the lowermost scale. The representative strength is the minimum, maximum, mean, or the like of the M-scale space strengths of the tracked extreme points. Finally, the scale persistence is defined as the range between the minimum and maximum scales of the extreme points.
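The three representative values can be sketched as follows. The tuple layout `(x, y, scale, strength)`, the function name `summarize_epr`, and the convention that a smaller scale value means a finer level are assumptions for illustration only.

```python
from statistics import mean

def summarize_epr(points, rep="max"):
    """Summarize one EPR given as a list of (x, y, scale, strength) tuples."""
    # Assumption: the smallest scale value marks the finest (lowermost) level.
    finest = min(points, key=lambda p: p[2])
    strengths = [p[3] for p in points]
    scales = [p[2] for p in points]
    rep_value = {"max": max, "min": min, "mean": mean}[rep](strengths)
    return {
        "position": (finest[0], finest[1]),   # representative position
        "Rep": rep_value,                     # representative strength
        "nTrack": max(scales) - min(scales),  # scale persistence as a scale range
    }
```

Counting the extreme points (`len(points)`) would be the alternative definition of scale persistence mentioned in the description.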

In extreme point detection, each point is tested against its eight neighboring points at the resolution of each t level. An extreme point detected in this way simultaneously determines the position of an interest point and the size of its local patch. For convenience, the extreme points are re-denoted with a subscript (o, s) and a superscript i, where o is the octave number in the scale space, s is the scale number within an octave, and i is the index of the extreme point among those present at (o, s).
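A minimal sketch of the 8-neighbor test on a single scale level is given below. The function name `local_maxima_8`, the strict inequality, and the skipping of border pixels are assumptions; the description only states that each point is compared with its eight neighbors.

```python
def local_maxima_8(M):
    """Return (y, x) positions of one M-scale-space level that exceed all 8 neighbours."""
    h, w = len(M), len(M[0])
    found = []
    for y in range(1, h - 1):          # border pixels are skipped (assumption)
        for x in range(1, w - 1):
            v = M[y][x]
            neighbours = [M[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(v > n for n in neighbours):
                found.append((y, x))
    return found
```

Running this once per (o, s) level yields the per-level lists of extreme points that the clustering step consumes.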

FIG. 6 is an EPR model diagram according to an embodiment of the present invention. Referring to FIGS. 5 and 6, in order to group the extreme points extracted at each resolution, clustering is performed between adjacent extreme points in a top-down manner from the uppermost level (coarsest scale level) to the lowermost level (finest scale level) of the scale space. Extreme points clustering based on the M-scale space is performed as follows.

(1) Extreme point clustering

for each oi = [C-1:0] (where oi is the octave number and C is the number of octaves)

i) Clustering of extreme points at different levels within one octave

for each si = [S-1:1] (where si is the scale number and S is the number of scales)

① Assign a unique group number to each unassigned extreme point at level (oi, si).

② Link the extreme points of levels (oi, si) and (oi, si-1): for every extreme point at level (oi, si), assign its group number to the extreme points at level (oi, si-1) whose distance from it is k or less.

ii) Clustering of extreme points between octaves of different levels

① Assign a unique group number to each unassigned extreme point at level (oi, 0).

② Link the extreme points of levels (oi, 0) and (oi-1, S-1): for every extreme point at level (oi, 0), assign its group number to the extreme points at level (oi-1, S-1) whose distance from it is 2×k or less.

(2) Generation of extreme point trajectory (EPR) information

The extreme points assigned the same (gid-th) group number are formed into a single EPR set, and the representative value and the tracking count are calculated within that set. The circles in the figure indicate the window sizes of the extreme points, and a continuous chain of circles depicts the EPR model. To extract robust EPRs, thresholds may be set on the representative strength (Rep) and the scale persistence (nTrack) of the EPR. The larger the threshold on Rep, the more likely a surviving EPR is a true interest point; the larger the threshold on nTrack, the greater the scale persistence of the surviving EPRs.
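The top-down clustering of steps (1) and (2) can be sketched as below. The data layout (`levels[o][s]` as a list of (x, y) points), the function name `cluster_extreme_points`, the Euclidean distance, and the rule that a point keeps the first group number it receives are assumptions; the description specifies only the traversal order, the radius k within an octave, and the doubled coordinates with radius 2×k between octaves.

```python
import math
from itertools import count

def cluster_extreme_points(levels, k):
    """levels[o][s] is a list of (x, y) extreme points; returns {(o, s, i): gid}."""
    C, S = len(levels), len(levels[0])
    gid_of = {}
    fresh = count()  # source of unique group numbers

    def link(src, dst, coord_scale, radius):
        (o1, s1), (o2, s2) = src, dst
        for i, (x, y) in enumerate(levels[o1][s1]):
            key = (o1, s1, i)
            if key not in gid_of:              # step 1: number unassigned points
                gid_of[key] = next(fresh)
            for j, (u, v) in enumerate(levels[o2][s2]):
                near = math.hypot(u - coord_scale * x, v - coord_scale * y) <= radius
                if near and (o2, s2, j) not in gid_of:  # first assignment wins (assumption)
                    gid_of[(o2, s2, j)] = gid_of[key]   # step 2: propagate the group number

    for o in range(C - 1, -1, -1):             # coarsest octave first
        for s in range(S - 1, 0, -1):
            link((o, s), (o, s - 1), 1, k)     # same octave: radius k
        if o > 0:
            link((o, 0), (o - 1, S - 1), 2, 2 * k)  # next octave: doubled coordinates, radius 2k
    for i in range(len(levels[0][0])):         # leftovers at the finest level
        gid_of.setdefault((0, 0, i), next(fresh))
    return gid_of
```

Collecting the keys that share a gid gives one EPR set per group, from which the representative value and tracking count can then be computed.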

The local descriptor extraction process will now be described in detail.

A local descriptor is a feature vector of a local patch that characterizes an interest point using the image information surrounding an extreme point; the vector typically has 64, 128, or 256 dimensions. Local descriptors may be extracted using, for example, SIFT (scale invariant feature transform), which can extract interest points from an image. First, a normalized local patch for local descriptor extraction (hereinafter denoted L) is obtained for SIFT local descriptor extraction; its size is M×M. L is partially differentiated along the x axis and the y axis. To find the dominant directions of L from the partial derivatives, a gradient strength histogram with 36 directions is generated. The histogram is smoothed over six iterations, its local extrema are found, and extrema smaller than 0.8 times the largest extremum are discarded. If E extrema survive, L has E dominant directions. For L and each dominant direction, the SIFT local descriptor is extracted by the following procedure.

function LocalDescriptors = ExtractDescriptor(L)
    for i = 0:(E-1)
        % rotate L by its i-th dominant direction theta_i
        R = imrotate(L, theta_i)
        % dxR, dyR: x-axis and y-axis partial derivative images of R
        LocalDescriptors[i] = Extract_SIFT(dxR, dyR, NumOfSubBlock, NumOfAngle)
    end
end

This procedure rotates L to remove the rotation attribute, so that the patch shares a common reference with other images, and then calls the Extract_SIFT function. Here, imrotate is a function that rotates the input image L clockwise about its center by the dominant direction angle; Extract_SIFT is the well-known SIFT local descriptor extraction function; NumOfSubBlock is the number of sub-blocks into which R is divided along each of the x and y axes; and NumOfAngle is the number of angles for which a gradient histogram is generated within each sub-block.
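The dominant direction search that precedes this rotation (a 36-direction gradient strength histogram, six smoothing passes, and a 0.8 ratio test) can be sketched as follows. The function name `dominant_directions`, the circular three-tap averaging kernel, and the choice to report bin centers in degrees are assumptions; the description gives only the bin count, the number of passes, and the ratio.

```python
import math

def dominant_directions(dx, dy, bins=36, passes=6, ratio=0.8):
    """Return dominant gradient directions (bin centres, degrees) of a patch."""
    hist = [0.0] * bins
    for row_dx, row_dy in zip(dx, dy):
        for gx, gy in zip(row_dx, row_dy):
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle * bins / 360.0) % bins] += math.hypot(gx, gy)
    for _ in range(passes):
        # Circular 3-tap mean; the kernel shape is an assumption.
        hist = [(hist[i - 1] + hist[i] + hist[(i + 1) % bins]) / 3.0
                for i in range(bins)]
    peak = max(hist)
    if peak <= 0.0:
        return []
    return [(i + 0.5) * 360.0 / bins
            for i in range(bins)
            if hist[i] > hist[i - 1] and hist[i] > hist[(i + 1) % bins]
            and hist[i] >= ratio * peak]
```

Each returned direction would produce one rotated copy of the patch and therefore one descriptor, so a patch with E surviving peaks yields E descriptors.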

The Extract_SIFT function may be written as follows.

function LocalDescriptor = Extract_SIFT(dxR, dyR, NumOfSubBlock, NumOfAngle)
    LD = zeros(NumOfSubBlock, NumOfSubBlock, NumOfAngle)
    width = size(dxR, 2)
    height = size(dxR, 1)
    for yi = 0:(height-1)
        % fractional sub-block coordinate of row yi
        sbyi = NumOfSubBlock*yi/height
        for xi = 0:(width-1)
            % fractional sub-block coordinate of column xi
            sbxi = NumOfSubBlock*xi/width
            % gradient direction (in angle-bin units) and strength at (yi, xi)
            Theta = NumOfAngle*angle(dxR(yi,xi), dyR(yi,xi))/360
            Mag = magnitude(dxR(yi,xi), dyR(yi,xi))
            % distribute the gradient strength over the histogram cells
            for byi = 0:(NumOfSubBlock-1)
                for bxi = 0:(NumOfSubBlock-1)
                    for ti = 0:(NumOfAngle-1)
                        weight = trilinear_weight(byi, bxi, ti, sbyi, sbxi, Theta)
                        LD(byi,bxi,ti) += Mag*weight
                    end
                end
            end
        end
    end
    LocalDescriptor <- enumerator_1D(LD)
end

Here, LD is a three-dimensional matrix (indices start from 0); zeros generates a three-dimensional matrix with all elements set to 0; trilinear_weight is a linear weighting function that returns 1 when (sbyi, sbxi, Theta) is close to (byi, bxi, ti) in the three-dimensional space and approaches 0 as it moves away; angle is the angle of the gradient; magnitude is the strength of the gradient; enumerator_1D is a function that lays out the input matrix as a single line in column-major order; and LocalDescriptor is the local descriptor of L.
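One plausible reading of trilinear_weight is a product of three tent functions, one per axis, with wrap-around on the angle axis. The sketch below follows that reading; the extra `num_angle` parameter and the wrap-around handling are assumptions that do not appear in the listing above.

```python
def trilinear_weight(byi, bxi, ti, sbyi, sbxi, theta, num_angle):
    """Weight in [0, 1]: 1 at the cell (byi, bxi, ti), falling linearly to 0 one cell away."""
    def tent(d):
        return max(0.0, 1.0 - abs(d))

    d_theta = (theta - ti) % num_angle
    d_theta = min(d_theta, num_angle - d_theta)  # angle bins are circular (assumption)
    return tent(sbyi - byi) * tent(sbxi - bxi) * tent(d_theta)
```

With this weighting each gradient sample contributes to at most eight neighboring cells of LD, so most iterations of the three inner loops add zero.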

The feature matching process consists of interest point matching, voting, and thresholding, and finally yields corresponding points that are robust to scale. Interest point matching is performed on individual extreme points regardless of EPR: a distance measure suited to the local descriptor is applied and the single nearest extreme point is selected, producing primary correspondences at the extreme point level. FIG. 7 is a schematic diagram illustrating an EPR voting process according to an embodiment of the present invention. The arrows between extreme points in FIG. 7 are the correspondences generated from the local descriptors. Each interest point is paired with one local descriptor, and the degree of similarity between two interest points is calculated from their local descriptors. If the similarity exceeds a threshold, a link between the two interest points is created; otherwise, no link is created.
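The primary, extreme-point-level matching can be sketched as a nearest-neighbor search over descriptors. The function name `match_extreme_points`, the Euclidean distance, and the expression of the similarity threshold as a maximum distance `max_dist` are assumptions made for this illustration.

```python
import math

def match_extreme_points(desc_a, desc_b, max_dist):
    """Pair each descriptor in desc_a with its nearest descriptor in desc_b."""
    pairs = []
    for i, a in enumerate(desc_a):
        best_j, best_d = None, float("inf")
        for j, b in enumerate(desc_b):
            d = math.dist(a, b)        # Euclidean distance between descriptors
            if d < best_d:
                best_j, best_d = j, d
        # A small distance plays the role of a high similarity (assumption).
        if best_j is not None and best_d <= max_dist:
            pairs.append((i, best_j))
    return pairs
```

The resulting index pairs correspond to the arrows of FIG. 7 and are the input to the voting step.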

The voting process generates an EPR similarity table (voting table) that records the extreme point correspondences between an EPR of one image and an EPR of the other image. In the present invention, an entry of the EPR similarity table may be the number of extreme point correspondences, or the weighted sum, mean, minimum, or maximum of the local descriptor distances of the extreme points.

To find the correspondence between EPRs extracted from two images taken from different viewpoints, the local descriptors (SIFT local descriptors are used) of the extreme points that make up the EPRs are compared by distance, and the correspondence between the EPRs is finally determined. The EPR correspondence is established through the following EPR voting process.

Function name: EPR_VOTING(EPR set of the first image, EPR set of the second image)

1) Extreme point matching

Corresponding pairs are found among all extreme points of the two images on the basis of local-descriptor distance. Since SIFT is used as the local descriptor, SIFT matching is used as the distance measure.

2) Voting

Each unique corresponding pair casts a vote for the group numbers (superscript gid, the unique number of an EPR) of its two extreme points, which finally yields a numerical degree of correspondence between two EPRs. In general, a weight of 1 is applied to the corresponding EPRs during EPR voting to generate the EPR similarity table that expresses the similarity.

3) Thresholding

A threshold (thVoting) is applied to the degree of correspondence between two EPRs so that only EPRs with a strong correspondence are extracted; that is, the EPR similarity table is thresholded with thVoting. By applying a threshold suited to the environment to the EPR similarity table, the thresholding process finally produces the correspondences between EPRs. The threshold test may require a value to be either greater than or less than the given threshold.
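Steps 2) and 3) can be sketched together as follows. The function name `epr_voting`, the lookup lists `gid_a` and `gid_b` (mapping an extreme point index to its EPR group number), and the "greater than or equal" comparison are assumptions; the weight of 1 per matched pair follows the description.

```python
from collections import Counter

def epr_voting(pairs, gid_a, gid_b, th_voting):
    """Build the EPR similarity table and keep the EPR pairs that reach th_voting."""
    table = Counter()
    for i, j in pairs:
        # Each matched pair of extreme points adds a weight of 1
        # to the cell of the two EPRs they belong to.
        table[(gid_a[i], gid_b[j])] += 1
    matches = sorted(cell for cell, votes in table.items() if votes >= th_voting)
    return table, matches
```

For example, with four matched pairs of which two fall into the same EPR pair, a threshold of 2 keeps only that EPR pair. Replacing the vote count with a distance-based weight would give the other table variants mentioned in the description.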

As described above, the present invention normalizes the size of the input local region indirectly rather than inserting a scale normalization term directly into the interest point function. The interest point functions of existing Euclidean-model techniques can therefore be extended to scale-robust models without special modification. As a result, more diverse types of interest points can be extracted, broadening the range of applications of interest point extraction.

Another embodiment of the present invention includes a computer-readable medium containing program instructions for performing various computer-implemented operations. The medium records a program for executing the image processing method described above. The medium may include program instructions, data files, data structures, and the like, alone or in combination. Examples of such media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CDs and DVDs; magneto-optical media such as floptical disks; and hardware devices configured to store and execute program instructions, such as ROM, RAM, and flash memory. The medium may also be a transmission medium, such as an optical or metal line or a waveguide, including a carrier wave that transmits signals specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although preferred embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

100: image processing apparatus 110: input unit
120: database 130: image processing unit
140: output unit

Claims

An image processing apparatus for matching interest points between two images, comprising:
an image processing unit that generates, for each of the two images, an L-scale space in which the scale is enlarged or reduced, generates a normalized local patch by resampling a local patch of the L-scale space at a predetermined magnification, and extracts interest points by applying an interest point function to the normalized local patch to generate an M-scale space.

The image processing apparatus of claim 1,
wherein the image processing unit extracts a plurality of extreme points from the M-scale space, groups the extracted extreme points, and extracts the interest points of the two images by comparing the grouped extreme points.

The image processing apparatus of claim 2,
wherein the image processing unit groups the extracted extreme points while traversing them from an upper level to a lower level of the M-scale space, obtains a representative position, a representative strength, and a scale persistence of the grouped extreme points, and sets them as the representative values of the extreme points.

The image processing apparatus of claim 2,
wherein the image processing unit extracts a local descriptor that is a feature vector of a local patch around each extreme point, measures distances between the local descriptors of the two images, and generates correspondences between the grouped extreme points of the two images according to the measured distances.

The image processing apparatus of claim 4,
wherein the image processing unit matches the interest points of the two images based on the number of correspondences between the grouped extreme points.

The image processing apparatus of claim 4,
wherein the image processing unit generates a similarity table according to the correspondences between the grouped extreme points of the two images and matches the interest points of the two images based on the similarity table.

The image processing apparatus of claim 4,
wherein the image processing unit compares the degree of correspondence between the grouped extreme points with a threshold and matches, as the interest points of the two images, the grouped extreme points whose degree of correspondence is greater than the threshold.

The image processing apparatus of claim 1,
wherein the L-scale space is reduced and/or enlarged by a factor of two between octaves each time a scale space is generated.

The image processing apparatus of claim 1,
wherein the interest point function calculates a strength for the normalized local patch and is any one of a Harris corner detector, a Kanade-Lucas-Tomasi (KLT) detector, and a generalized symmetric transform (GST).

The image processing apparatus of claim 1,
wherein the normalized local patch is normalized to a size of P×P by applying a normalization process to the local patch of the L-scale space in proportion to its scale.

An image processing method for matching interest points between two images, comprising:
generating, for each of the two images, an L-scale space in which the scale is enlarged or reduced;
generating a normalized local patch by resampling a local patch of the L-scale space at a predetermined magnification; and
extracting interest points by applying an interest point function to the normalized local patch to generate an M-scale space.

The method of claim 11,
wherein the extracting of the interest points comprises:
extracting a plurality of extreme points from the M-scale space and grouping the extracted extreme points; and
extracting the interest points of the two images by comparing the grouped extreme points.

The method of claim 12,
wherein the grouping of the extreme points comprises grouping the extracted extreme points while traversing them from an upper level to a lower level of the M-scale space, obtaining a representative position, a representative strength, and a scale persistence of the grouped extreme points, and setting them as the representative values of the extreme points.

The method of claim 12, further comprising:
extracting a local descriptor that is a feature vector of a local patch around each extreme point; and
measuring distances between the local descriptors of the two images and generating correspondences between the grouped extreme points of the two images according to the measured distances.

The method of claim 14, further comprising:
matching the interest points of the two images according to the number of correspondences between the grouped extreme points.

The method of claim 14, further comprising:
generating a similarity table according to the correspondences between the grouped extreme points of the two images; and
matching the interest points of the two images based on the similarity table.

The method of claim 14, further comprising:
comparing the degree of correspondence between the grouped extreme points with a threshold and matching, as the interest points of the two images, the grouped extreme points whose degree of correspondence is greater than the threshold.

The method of claim 11,
wherein the L-scale space is reduced and/or enlarged by a factor of two between octaves each time a scale space is generated.

The method of claim 11,
wherein the interest point function calculates a strength for the normalized local patch and is any one of a Harris corner detector, a Kanade-Lucas-Tomasi (KLT) detector, and a generalized symmetric transform (GST).

The method of claim 11,
wherein the normalized local patch is normalized to a size of P×P by applying a normalization process to the local patch of the L-scale space in proportion to its scale.

A computer readable medium having recorded thereon a program for executing the method of claim 11.