KR101478709B1

KR101478709B1 - Method and apparatus for extracting feature points and generating feature descriptors from RGB-D images

Info

Publication number: KR101478709B1
Application number: KR20130074403A
Authority: KR
Inventors: 우운택; 박노영; 장영균
Original assignee: 한국과학기술원
Priority date: 2012-06-27
Filing date: 2013-06-27
Publication date: 2015-01-05
Also published as: KR20140001168A

Abstract

The present invention comprises: generating a depth gradient based on per-pixel position from the per-pixel depth image function of a three-dimensional image that includes depth information acquired through a camera; calculating, based on the generated depth gradient, three adjacent vertices with respect to a predetermined vertex of the depth image; calculating predetermined vectors from the three calculated vertices and extracting a normalized surface normal vector from the cross product of those vectors; performing rendering with the depth image vector information formed by the extracted surface normal vectors so as to turn the three-dimensional vectors into a two-dimensional image; and calculating feature descriptors from the two-dimensional image by applying the Scale Invariant Feature Transform (SIFT) algorithm.

Description

TECHNICAL FIELD [0001] The present invention relates to a method and apparatus for extracting feature points and generating feature descriptors from RGB-D images.

The present invention relates to generating image feature points and feature descriptors from a depth image obtained from an RGB-D image sensor.

In computer vision, robotics, and augmented reality, technologies for detecting and recognizing three-dimensional spaces and three-dimensional objects are becoming increasingly important. In particular, much research is under way on making object detection and recognition robust for targets with low-detailed texture and under varying lighting conditions.

Image sensors based on the Microsoft Kinect approach make it possible to acquire RGB images and depth images in real time, which has brought many changes to research on object detection, tracking, and recognition [1, 2].

When only conventional RGB images are used, objects can be represented with feature point detection and descriptor generation methods such as SIFT [3] and SURF [4].

However, when texture is lacking or illumination changes are severe, feature points cannot be extracted from RGB images and descriptors cannot be matched, which is a major cause of reduced object recognition rates [5][6].

Accordingly, to solve this problem, the present invention proposes a method that uses a Kinect-type image sensor to learn an object with RGB feature descriptors and depth-image-based feature descriptors together, thereby raising the object recognition rate across a variety of environmental changes.

According to one aspect of the present invention, a method comprises: generating a depth gradient based on per-pixel position from the per-pixel depth image function of a three-dimensional image that includes depth information acquired through a camera; calculating, based on the generated depth gradient, three adjacent vertices with respect to a predetermined vertex of the depth image; calculating predetermined vectors from the three calculated vertices and extracting a normalized surface normal vector from the cross product of those vectors; performing rendering with the depth image vector information formed by the extracted surface normal vectors so as to turn the three-dimensional vectors into a two-dimensional image; and calculating feature descriptors from the two-dimensional image by applying the Scale Invariant Feature Transform (SIFT) algorithm.

According to another aspect of the present invention, an apparatus comprises: a gradient generation unit that, under the control of a control unit governing the overall operation of the RGB-D image feature point extraction and feature descriptor generation apparatus, generates a depth gradient based on per-pixel position from the per-pixel depth image function of a three-dimensional image; the control unit, which acquires a three-dimensional image including depth information and an RGB image from a photographing unit through mode switching, calculates three adjacent vertices with respect to a predetermined vertex of the depth image based on the depth gradient generated by the gradient generation unit, calculates predetermined vectors from the three calculated vertices, and extracts feature points and feature descriptors of the depth image from the image generated through conversion to surface normal vectors; a surface normal vector extraction unit that extracts a normalized surface normal vector from the cross product of the vectors calculated by the control unit; and a rendering unit that, under the control of the control unit, performs rendering using the depth image vector information formed by the surface normal vectors.

The present invention makes object detection and recognition robust to camera rotation and translation, to targets lacking texture, and to varying lighting conditions.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an overall flowchart of a method for extracting RGB-D image feature points and generating feature descriptors according to an embodiment of the present invention.
FIG. 2 shows example screens for RGB-D image feature point extraction and feature descriptor generation according to an embodiment of the present invention.
FIG. 3 shows example screens of the surface normal vector mesh rendering result and the gray scale conversion result obtained with the method according to an embodiment of the present invention.
FIG. 4 shows a test environment to which the method according to an embodiment of the present invention is applied.
FIG. 5 is a block diagram of an apparatus for extracting RGB-D image feature points and generating feature descriptors according to an embodiment of the present invention.
FIG. 6 is a graph showing per-object recognition rates by detection method for the method according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. The following description includes specific details such as particular components. These are provided only to aid overall understanding of the invention, and it will be apparent to those of ordinary skill in the art that such details may be modified or changed within the scope of the present invention.

The present invention relates to generating image feature points and feature descriptors from a depth image obtained from a Microsoft Kinect-type RGB-D image sensor. More specifically, it extracts from the acquired depth image surface normal vectors that represent the geometric information of a three-dimensional object and renders the result as an image, thereby providing a technique that improves three-dimensional object recognition performance and is robust to environmental changes such as the presence or absence of texture and camera rotation and translation.

Hereinafter, a method for extracting RGB-D image feature points and generating feature descriptors according to an embodiment of the present invention is described in detail with reference to FIGS. 1 and 4.

First, FIG. 1 is an overall flowchart of the method for extracting RGB-D image feature points and generating feature descriptors according to an embodiment of the present invention.

Referring to FIG. 1, in step 110 an image is acquired for the configured mode under the control of the control unit. The modes are a three-dimensional image acquisition mode, in which the image includes depth information, and an RGB image acquisition mode.

In step 112, it is checked whether the acquired image is a three-dimensional image. If it is, the process moves to step 116, where a depth gradient based on per-pixel position is generated from the per-pixel depth image function of the three-dimensional image.

More specifically, the depth image function D(x) for pixel position x of the three-dimensional image takes the form of the following equation.

[equation image missing]

From this, an arbitrary offset dx relative to pixel position x of the depth image can be expressed as

[equation image missing]

and the gradient can be estimated by the least-squares method.
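The least-squares step can be sketched as follows. This is a minimal illustration, not the patent's own formula, which is not reproduced in this text. It assumes the common first-order model D(x + dx) - D(x) ≈ dx · ∇D, with one equation per offset dx inside a small patch. The function name `estimate_depth_gradient` and the `radius` parameter are hypothetical.

```python
import numpy as np

def estimate_depth_gradient(depth, x, y, radius=2):
    """Least-squares estimate of the depth gradient at pixel (x, y).

    Assumed model: D(x + dx) - D(x) ~= dx . grad, one equation per offset dx
    in a (2*radius + 1)^2 patch. Non-finite or non-positive depths are skipped.
    """
    h, w = depth.shape
    d0 = depth[y, x]
    if not np.isfinite(d0) or d0 <= 0:
        return np.zeros(2)
    offsets, diffs = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            u, v = x + dx, y + dy
            if (dx == 0 and dy == 0) or not (0 <= u < w and 0 <= v < h):
                continue
            d = depth[v, u]
            if not np.isfinite(d) or d <= 0:
                continue
            offsets.append((dx, dy))
            diffs.append(d - d0)
    if len(offsets) < 2:
        return np.zeros(2)  # not enough constraints for a 2D gradient
    A = np.asarray(offsets, dtype=float)   # one row per offset dx
    b = np.asarray(diffs, dtype=float)     # depth differences
    grad, *_ = np.linalg.lstsq(A, b, rcond=None)
    return grad  # [dD/dx, dD/dy]
```

On an exactly planar depth image the estimate recovers the plane's slopes, which is a quick sanity check for the model.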

In step 120, three adjacent vertices are calculated with respect to a predetermined vertex of the depth image, based on the depth gradient generated in step 116. In step 122, predetermined vectors are calculated from the three calculated vertices.

In step 124, a normalized surface normal vector is extracted from the cross product of the calculated vectors.

The present invention uses a surface normal vector transformation that represents the three-dimensional surface information of the target object consistently as three-dimensional vectors even when the state of the depth camera changes. This is because defining image features that express three-dimensional geometric information from a depth image requires an image transformation that keeps the depth image in a consistent form under camera rotation, translation, and affine changes.

That is, the three adjacent vertices in three-dimensional space can each be expressed by the following equations.

[equation images missing]

The vector that appears in these equations runs from the principal point of the depth sensor camera through pixel position x toward the three-dimensional point X, so it can be calculated from the internal parameter K of the depth sensor camera using the following equation.

[equation image missing]
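The back-projection described here can be sketched as below. The patent's own equations are not reproduced in this text, so the sketch assumes a standard pinhole model in which the line-of-sight vector is K⁻¹[x, y, 1]ᵀ. It also assumes the two neighbouring vertices take their depths from the estimated gradient. The names `line_of_sight` and `three_vertices` are hypothetical.

```python
import numpy as np

def line_of_sight(K, x, y):
    """Ray through pixel (x, y) under a pinhole model, scaled so that z = 1."""
    ray = np.linalg.solve(np.asarray(K, dtype=float), np.array([x, y, 1.0]))
    return ray / ray[2]

def three_vertices(K, depth_value, grad, x, y):
    """Back-project pixel (x, y) and its right and lower neighbours.

    Assumption: neighbour depths are predicted from the depth gradient
    (depth + dD/dx, depth + dD/dy) rather than read from the image.
    """
    X = line_of_sight(K, x, y) * depth_value
    X1 = line_of_sight(K, x + 1, y) * (depth_value + grad[0])
    X2 = line_of_sight(K, x, y + 1) * (depth_value + grad[1])
    return X, X1, X2
```

With a focal length of 500 pixels and a flat surface 2 units away, neighbouring vertices end up 2/500 units apart, as expected for a pinhole camera.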

Using the three points in three-dimensional space calculated in this way, two vectors can be formed that start from one of the vertices, each being the difference between one of the other two vertices and that starting vertex. A normalized surface normal vector can then be extracted from the cross product of these two vectors.
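The cross-product step can be sketched as follows. The vertex names X, X1, and X2 and the function name `surface_normal` are assumptions made for illustration, since the original symbols are not reproduced in this text.

```python
import numpy as np

def surface_normal(X, X1, X2):
    """Unit normal of the plane through X, X1, X2.

    The two edge vectors both start at X; their cross product is
    perpendicular to the local surface and is scaled to unit length.
    """
    X, X1, X2 = (np.asarray(p, dtype=float) for p in (X, X1, X2))
    n = np.cross(X1 - X, X2 - X)
    length = np.linalg.norm(n)
    if length == 0.0:
        return np.zeros(3)  # degenerate: the three points are collinear
    return n / length
```

The sign of the normal depends on the order of the two edge vectors, so a consistent vertex ordering matters when normals from neighbouring pixels are compared.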

Referring to FIG. 2, which shows example screens for RGB-D image feature point extraction and feature descriptor generation according to an embodiment of the present invention, the top of FIG. 2 shows a visualization of the depth-image-based gradient and the bottom shows a visualization of the depth-image-based surface normal vectors.

Next, in step 126, a polygon mesh is rendered that connects one vertex of the depth image with the three vertices adjacent to it in the counterclockwise (CCW) direction.

The direction of the normal vector of the polygon mesh generated in this step is set to the average of the surface normal vectors of its four vertices.

Through step 126, the three-dimensional vectors with x, y, and z components are turned into a two-dimensional image.
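As a rough stand-in for the mesh rendering step, the sketch below averages the normals of the four corner vertices of each grid cell and writes the result into an 8-bit three-channel image. It does not rasterize an actual polygon mesh. The mapping of each component from [-1, 1] to [0, 255] is an assumption, since the text does not state how normals are colored. The name `render_normal_image` is hypothetical.

```python
import numpy as np

def render_normal_image(normals):
    """Turn per-vertex unit normals (H, W, 3) into an (H-1, W-1, 3) uint8 image."""
    normals = np.asarray(normals, dtype=float)
    # Average the normals of the four corner vertices of each grid cell.
    quad = (normals[:-1, :-1] + normals[:-1, 1:]
            + normals[1:, :-1] + normals[1:, 1:]) / 4.0
    # Re-normalise the averaged vectors, leaving zero vectors untouched.
    length = np.linalg.norm(quad, axis=2, keepdims=True)
    quad = np.divide(quad, length, out=np.zeros_like(quad), where=length > 0)
    # Assumed encoding: map each component from [-1, 1] to [0, 255].
    return np.round((quad + 1.0) * 127.5).astype(np.uint8)
```

Averaging over four vertices smooths sensor noise in the normals before feature extraction, at the cost of one pixel of resolution along each axis.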

Referring to FIG. 3, which shows example screens to which the method according to an embodiment of the present invention is applied, the left side of FIG. 3 shows the surface normal vector mesh rendering result and the right side shows the gray scale conversion result.

In step 128, the SIFT (Scale Invariant Feature Transform) algorithm is applied, and in step 130 feature descriptors are calculated from the two-dimensional image.

Meanwhile, if the check in step 112 finds that the image is not a three-dimensional image, the process moves to step 114, the RGB image is converted to a gray scale image in step 118, and the process moves to step 128, where the SIFT algorithm is applied to the converted gray scale image to calculate feature descriptors.
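The gray scale conversion of step 118 can be sketched as a weighted sum of the color channels. The text does not say which weights are used, so the ITU-R BT.601 luma weights below are an assumption, as is the function name `to_gray`.

```python
import numpy as np

def to_gray(rgb):
    """Convert an (H, W, 3) RGB image with values in 0..255 to (H, W) uint8 gray."""
    rgb = np.asarray(rgb, dtype=float)
    weights = np.array([0.299, 0.587, 0.114])  # assumed BT.601 luma weights
    gray = rgb @ weights                        # weighted sum over the channel axis
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)
```

The same conversion could be applied to the rendered normal image before feature extraction. With OpenCV installed, SIFT could then be run on the result along the lines of `cv2.SIFT_create().detectAndCompute(gray, None)`, though the experiments reported here used a GPU implementation instead.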

As described above, the technique for RGB-D image feature point extraction and feature descriptor generation according to the present invention acquires an RGB image and a depth image from a Kinect-type camera, extracts SIFT feature points from each acquired image, and generates feature descriptors.

The descriptors generated in this way are bagged into a histogram through codebook learning based on Random Forest (RF). This bagging uses a codebook trained so that the proposed feature descriptors, which all have the same dimension, are best separated from one another when they reach the leaf nodes of the codebook [6, 7].

Each feature point detected in the image shown in FIG. 2 passes through the codebook, and when it reaches a leaf node it is accumulated at the histogram index corresponding to that leaf node. In this way, all feature descriptors generated from the image pass through the codebook and are recast as a new histogram that represents the object itself.
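The accumulation of descriptors into a leaf-index histogram can be sketched as below. The trained Random Forest codebook is replaced by a toy function, `toy_leaf_of`, that maps a descriptor to one of four leaves with two threshold tests. That function, the name `bag_descriptors`, and the normalization of the histogram to unit sum are all assumptions for illustration.

```python
import numpy as np

def bag_descriptors(descriptors, leaf_of, n_leaves):
    """Accumulate descriptors into a histogram indexed by codebook leaf.

    `leaf_of` is any callable mapping a descriptor to a leaf index in
    [0, n_leaves). The histogram is normalised to sum to 1 (an assumption).
    """
    hist = np.zeros(n_leaves, dtype=float)
    for d in descriptors:
        hist[leaf_of(d)] += 1.0
    total = hist.sum()
    return hist / total if total > 0 else hist

def toy_leaf_of(d):
    """Toy stand-in for a trained tree: two threshold tests give four leaves."""
    return 2 * int(d[0] > 0.5) + int(d[1] > 0.5)
```

Normalizing the histogram keeps images with different numbers of detected feature points comparable when histograms are matched later.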

Each learned reference histogram generated in this way is matched against the histogram generated at matching time using the k-NN (k-nearest neighbor) algorithm.

Using an elaborately trained classifier such as an SVM (Support Vector Machine) or RF at learning time would be expected to perform better, but to emphasize the significance of the descriptor itself, the present invention uses 1-NN matching, which selects the candidate with the highest similarity.
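The 1-NN matching of histograms can be sketched as follows. The text does not name the similarity measure, so Euclidean distance is an assumption here, and the function name `match_1nn` and the example labels are hypothetical.

```python
import numpy as np

def match_1nn(query_hist, reference_hists, labels):
    """Return the label of the closest reference histogram and its distance.

    Assumption: similarity is measured by Euclidean distance, so the highest
    similarity corresponds to the smallest distance.
    """
    refs = np.asarray(reference_hists, dtype=float)
    query = np.asarray(query_hist, dtype=float)
    dists = np.linalg.norm(refs - query, axis=1)
    best = int(np.argmin(dists))
    return labels[best], float(dists[best])
```

For example, `match_1nn([0.2, 0.8], [[1.0, 0.0], [0.0, 1.0]], ["cup", "box"])` picks the second reference because the query lies closer to it.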

Experiments and Conclusion

To demonstrate the performance of the RGB-D image feature point extraction and feature descriptor generation of the present invention, a test environment was set up using 10 different objects, as shown in FIG. 4.

For the target objects, the experimental setup was based on a test set covering the presence or absence of texture and camera rotation and translation, and on a training set of 25 different frames.

The experiments were run on a Core i7 3.40 GHz CPU with a GTX580 graphics processor for GPUSift [8].

To confirm the performance improvement, baseline recognition rates were measured separately for FAST corner detection, SURF feature descriptor detection, RGB-image-based SIFT feature detection, and RGB-D-image-based SIFT feature detection.

As shown in FIG. 6, using the technique to which the present invention is applied improved the recognition rate. Numerically, the recognition rate improved by 12.2% compared with using RGB image features alone, reaching 74.4% in the given test environment.

The foregoing has described the method for extracting RGB-D image feature points and generating feature descriptors according to an embodiment of the present invention.

Hereinafter, an apparatus for extracting RGB-D image feature points and generating feature descriptors according to an embodiment of the present invention is described with reference to FIG. 5.

FIG. 5 is a block diagram of the apparatus for extracting RGB-D image feature points and generating feature descriptors according to an embodiment of the present invention.

Referring to FIG. 5, the RGB-D image feature point extraction and feature descriptor generation apparatus 500 to which the present invention is applied includes a photographing unit 510, a gradient generation unit 512, a control unit 514, a rendering unit 516, and a surface normal vector extraction unit 518.

The gradient generation unit 512, under the control of the control unit 514 that governs the overall operation of the apparatus, generates a depth gradient based on per-pixel position from the per-pixel depth image function of the three-dimensional image.

The control unit 514 acquires a three-dimensional image including depth information and an RGB image from the photographing unit through mode switching, calculates three adjacent vertices with respect to a predetermined vertex of the depth image based on the depth gradient generated by the gradient generation unit 512, calculates predetermined vectors from the three calculated vertices, and extracts feature points and feature descriptors of the depth image from the image generated through conversion to surface normal vectors.

The control unit 514 also calculates feature descriptors from the two-dimensional image by applying the SIFT (Scale Invariant Feature Transform) algorithm.

In addition, the control unit 514 converts an RGB image acquired in a mode other than the three-dimensional image acquisition mode into a gray scale image and applies the SIFT algorithm to the converted gray scale image to calculate feature descriptors.

The surface normal vector extraction unit 518 extracts a normalized surface normal vector from the cross product of the vectors calculated by the control unit 514.

That is, using the three vertices, it extracts a normalized surface normal vector from the cross product of two vectors that start from the predetermined vertex.

The rendering unit 516, under the control of the control unit 514, performs rendering using the depth image vector information formed by the surface normal vectors.

That is, it renders a polygon mesh that connects one vertex of the depth image with the three vertices adjacent to it in the counterclockwise direction, and the direction of the normal vector of the resulting polygon mesh is set to the average of the surface normal vectors of the four vertices.

The operations for RGB-D image feature point extraction and feature descriptor generation according to the present invention can be carried out as described above. Although specific embodiments have been described, various modifications can be made without departing from the scope of the invention. The scope of the invention should therefore be determined not by the described embodiments but by the claims and their equivalents.

510: photographing unit 512: gradient generation unit
514: user interface unit 516: rendering unit
518: surface normal vector extraction unit
[References]

Claims

A method for extracting feature points and generating feature descriptors from an RGB-D image, the method comprising:
generating a depth gradient based on per-pixel position from a per-pixel depth image function of a three-dimensional image including depth information acquired through a camera;
calculating three adjacent vertices with respect to a predetermined vertex of the depth image based on the generated depth gradient;
calculating predetermined vectors using the three calculated vertices and extracting a normalized surface normal vector from a cross-product operation on the calculated vectors;
turning three-dimensional vectors into a two-dimensional image by performing rendering using depth image vector information formed by the extracted surface normal vectors; and
calculating a feature descriptor from the two-dimensional image by applying a Scale Invariant Feature Transform (SIFT) algorithm,
wherein the rendering
renders a polygon mesh connecting one vertex of the depth image with three vertices adjacent to it in the counterclockwise direction, and sets the direction of the normal vector of the generated polygon mesh to the average of the surface normal vectors of the four vertices.

The method according to claim 1, further comprising:
converting an RGB image acquired in a mode other than the three-dimensional image acquisition mode into a gray scale image and applying the SIFT algorithm to the converted gray scale image to calculate a feature descriptor.

The method according to claim 1,
wherein the depth image function D(x) for a predetermined pixel position x takes the form of the following equation, an arbitrary offset dx relative to the pixel position x of the depth image is expressed as

[equation image missing]

and the gradient is estimated by the least-squares method.

The method according to claim 1,
wherein the three adjacent vertices with respect to the predetermined vertex are expressed by the following equations,

[equation images missing]

in which the vector represents a vector from the principal point of a depth sensor camera through pixel position x toward a three-dimensional point X.

The method according to claim 4,
wherein the vector is calculated from an internal parameter K of the depth sensor camera using the following equation.

[equation image missing]

The method according to claim 1, wherein extracting the surface normal vector comprises:
extracting the normalized surface normal vector from a cross-product operation on two vectors that start from the predetermined vertex and are formed using the three vertices.

delete

An apparatus for extracting feature points and generating feature descriptors from an RGB-D image, the apparatus comprising:
a gradient generation unit that, under the control of a control unit governing the overall operation of the apparatus, generates a depth gradient based on per-pixel position from a per-pixel depth image function of a three-dimensional image;
the control unit, which acquires a three-dimensional image including depth information and an RGB image from a photographing unit through mode switching, calculates three adjacent vertices with respect to a predetermined vertex of the depth image based on the depth gradient generated by the gradient generation unit, calculates predetermined vectors using the three calculated vertices, and extracts feature points and feature descriptors of the depth image from an image generated through conversion to surface normal vectors;
a surface normal vector extraction unit that extracts a normalized surface normal vector from a cross-product operation on the vectors calculated by the control unit; and
a rendering unit that, under the control of the control unit, turns three-dimensional vectors into a two-dimensional image by performing rendering using depth image vector information formed by the surface normal vectors,
wherein the rendering unit
renders a polygon mesh connecting one vertex of the depth image with three vertices adjacent to it in the counterclockwise direction, and sets the direction of the normal vector of the generated polygon mesh to the average of the surface normal vectors of the four vertices.

The apparatus according to claim 8,
wherein the control unit calculates a feature descriptor from the two-dimensional image by applying a Scale Invariant Feature Transform (SIFT) algorithm.

The apparatus according to claim 8,
wherein the control unit converts an RGB image acquired in a mode other than the three-dimensional image acquisition mode into a gray scale image and applies a SIFT algorithm to the converted gray scale image to calculate a feature descriptor.

The apparatus according to claim 8, wherein the surface normal vector extraction unit
extracts the normalized surface normal vector from a cross-product operation on two vectors that start from a predetermined vertex and are formed using the three vertices.

delete