KR101511146B1

KR101511146B1 - Smart 3D gesture recognition apparatus and method

Info

Publication number: KR101511146B1
Application number: KR1020140096231A
Authority: KR
Inventors: 이상윤; 최재성; 김광택; 김정현
Original assignee: 연세대학교 산학협력단
Priority date: 2014-07-29
Filing date: 2014-07-29
Publication date: 2015-04-17

Abstract

The present invention relates to a smart three-dimensional gesture recognition apparatus and method. The apparatus comprises: an image acquisition unit that acquires an image composed of a plurality of consecutive frames; a hand tracking unit that detects and tracks the position of the user's hand in the image using both a color-based method and a depth-based method; a gesture determination unit that analyzes the hand position obtained by the hand tracking unit to decide whether the user intended a gesture; a feature extraction unit that extracts features of the hand movement in a preset manner; a user profile storage unit in which a plurality of gesture templates are stored; a recognition classification unit that recognizes the pattern of the gesture from the extracted features, generates a user behavior profile from the pattern, searches the stored templates for a gesture template corresponding to that profile, and recognizes the gesture by applying the recognized pattern to the retrieved template; and a command conversion unit that converts the recognized gesture into the corresponding command.

Description

SMART 3D GESTURE RECOGNITION APPARATUS AND METHOD

The present invention relates to a gesture recognition apparatus and method, and more particularly to a non-wearable, cognition-based smart three-dimensional gesture recognition apparatus and method.

A gesture recognition apparatus detects and recognizes the movement of a user's hand or fingers, that is, a gesture, and generates a command corresponding to the recognized gesture. Because it can maximize user convenience, its range of applications is steadily expanding.

Gesture recognition apparatuses are broadly divided into wearable and non-wearable types according to how they recognize the user's gesture. With a wearable apparatus, the user performs motions while wearing the apparatus, and the motions are sensed using multiple sensors such as an acceleration sensor, a geomagnetic sensor, a gravity sensor, and a gyro sensor. In contrast, most non-wearable apparatuses are image-based gesture recognition apparatuses that detect the user's motion and recognize user commands using an image acquisition means such as a camera.

Because the user does not have to wear anything, a non-wearable gesture recognition apparatus can be applied in far more fields than a wearable one. However, it cannot reflect differences between users' behavioral characteristics in gesture recognition, so its recognition rate depends on the behavioral characteristics of the user. In addition, since a conventional non-wearable apparatus does not directly contact the user's body, it is configured to provide only two-dimensional visual feedback on gesture recognition, which causes the user inconvenience and fatigue.

Korean Patent Laid-Open Publication No. 10-2012-0029738, "Method and interface for recognizing a user's dynamic organ gesture, and electric-using apparatus using the same" (published March 27, 2012), discloses a technique for detecting a hand region so that a hand gesture can be recognized easily even when a skin-colored background other than the hand region exists in the target image input through an imaging device, or when the illumination changes or noise occurs. However, this technique also does not consider the behavioral characteristics of individual users, so it is limited in how far it can improve gesture recognition accuracy.

An object of the present invention is to provide a smart three-dimensional gesture recognition apparatus that can improve gesture recognition accuracy by taking the behavioral characteristics of individual users into account.

Another object of the present invention is to provide a smart three-dimensional gesture recognition method for achieving the above object.

To achieve the above object, a gesture recognition apparatus according to an example of the present invention includes: an image acquisition unit that acquires an image composed of a plurality of consecutive frames; a hand tracking unit that detects and tracks the position of the user's designated hand in the image by simultaneously using a color-based method and a depth-based method; a gesture determination unit that analyzes the position information of the hand tracked by the hand tracking unit and determines, from the analyzed change in hand position, whether the user intended a gesture; a feature extraction unit that extracts features of the change in hand position in a preset manner; a user profile storage unit in which a plurality of gesture templates are stored; a recognition classification unit that recognizes the pattern of the gesture from the extracted features, generates a user behavior profile from the recognized pattern, searches the plurality of gesture templates stored in the user profile storage unit for a gesture template corresponding to the user behavior profile, and recognizes the gesture by applying the recognized pattern to the retrieved gesture template; and a command conversion unit that converts the recognized gesture into the corresponding user command.

In the color-based method, the hand tracking unit detects the hand using a local binary pattern (LBP) applied to color vectors of the Lab color space; in the depth-based method, it uses conditional probability to detect the hand sensed through machine learning; and it tracks the detected hand of the user according to the CAMSHIFT algorithm.

The gesture determination unit calculates a motion gradient for the change in position of the tracked hand and, if the calculated motion gradient is greater than or equal to a preset upper limit or less than or equal to a preset lower limit, determines that the gesture was intended by the user.

The feature extraction unit extracts, as the feature, the velocity of the change in hand position.

The recognition classification unit recognizes the pattern of the gesture using dynamic time warping (DTW).

If no gesture template corresponding to the user behavior profile is found, the recognition classification unit filters the user behavior profile with a Kalman filter to generate a new gesture template and stores the generated gesture template in the user profile storage unit.

The gesture recognition apparatus further includes a feedback control unit that generates feedback the user can perceive through at least one of sight, hearing, and touch when at least one of the following occurs: the hand tracking unit does not detect the user's hand, the gesture determination unit determines that the gesture was not intended by the user, or the recognition classification unit recognizes the gesture.

To achieve the other object, a gesture recognition method according to an example of the present invention is a method for a gesture recognition apparatus that includes an image acquisition unit, a hand tracking unit, a gesture determination unit, a feature extraction unit, a user profile storage unit, a recognition classification unit, and a command conversion unit. The method includes: acquiring, by the image acquisition unit, an image composed of a plurality of consecutive frames; detecting and tracking, by the hand tracking unit, the position of the user's designated hand in the image by simultaneously using a color-based method and a depth-based method; analyzing, by the gesture determination unit, the position information of the tracked hand and determining, from the analyzed change in hand position, whether the user intended a gesture; extracting, by the feature extraction unit, features of the change in hand position in a preset manner; recognizing, by the recognition classification unit, the pattern of the gesture from the extracted features; generating, by the recognition classification unit, a user behavior profile from the recognized pattern, searching the plurality of gesture templates stored in the user profile storage unit for a gesture template corresponding to the user behavior profile, and recognizing the gesture by applying the recognized pattern to the retrieved gesture template; and converting, by the command conversion unit, the recognized gesture into the corresponding user command.

Accordingly, the smart three-dimensional gesture recognition apparatus and method of the present invention analyze the user's gesture intention to decide whether a motion is a gesture and recognize the gesture in a way that reflects each user's motion characteristics, thereby maximizing gesture recognition accuracy. They can therefore be used widely in applications that require natural, immersive interaction, such as immersive media, games, augmented reality, and telepresence. They can also be used for interaction with three-dimensional image content and so contribute to the development of the three-dimensional display industry. In addition, by providing multi-sensory feedback so that the user can easily notice conditions such as an abnormal gesture recognition state, they improve the usability of the gesture recognition apparatus.

FIG. 1 shows the configuration of a smart three-dimensional gesture recognition apparatus according to an embodiment of the present invention.
FIG. 2 shows an application example in which depth information is used as a feature to detect the user's hand.
FIG. 3 shows the difference between general matching, which does not consider time variability, and DTW matching, which does.
FIG. 4 shows an embodiment in which the recognition classification unit recognizes a user gesture pattern using a DTW matching technique that considers time variability.
FIG. 5 shows an example comparing a user-adaptive template derived from an analyzed user profile with a template that does not consider the user's characteristics.
FIG. 6 is a graph showing experimental results on how the gesture recognition rate changes with the feedback method.
FIG. 7 shows a smart three-dimensional gesture recognition method according to an embodiment of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the content described in those drawings.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. The present invention can, however, be implemented in many different forms and is not limited to the embodiments described. To describe the present invention clearly, parts unrelated to the description are omitted, and the same reference numerals in the drawings denote the same members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "... unit", "... device", "module", and "block" used in the specification denote a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 shows the configuration of a smart three-dimensional gesture recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the smart three-dimensional gesture recognition apparatus 100 of the present invention includes an image acquisition unit 110, a hand tracking unit 120, a gesture determination unit 130, a feedback control unit 140, a feature extraction unit 150, a recognition classification unit 160, a user profile storage unit 170, and a command conversion unit 180.

First, the image acquisition unit 110 acquires an image composed of a plurality of consecutive frames. In the present invention, the image acquisition unit 110 is configured to acquire a depth image together with a conventional color image.

Most non-contact gesture recognition apparatuses use image-based gesture recognition, which is broadly divided into color-based recognition and depth-based recognition.

Color-based recognition converts the acquired image into a color space such as RGB, HSV, HSI, YUV, YIQ, or YCbCr and then finds and recognizes the hand. It generally works by detecting the user's skin color, but it is vulnerable to changes in ambient illumination in the image, so its gesture recognition rate is low.

Depth-based recognition, by contrast, is robust to changes in ambient illumination in the image, but the image it can acquire is simpler than a color image. In addition, sensors capable of acquiring depth images were expensive and had low resolution, so the technique was not widely used in the past. Recently, however, structured light, which adds a unique pattern to the light emitted from a light source for object recognition, has been developed, and with it sensors that can acquire color and depth images in real time at low cost. As a result, research on gesture recognition using depth images has become active.

The image acquisition unit 110 therefore uses structured light to acquire a color image together with a depth image and transmits the acquired images to the hand tracking unit 120. The color image is assumed here to be an RGB image, but an image in another color space may also be used.

The hand tracking unit 120 detects and tracks the user's hand or fingers in the image transmitted from the image acquisition unit 110. The hand tracking unit 120 may detect the hand or fingers using any of various published methods; here, as an example, it uses skin color detection based on a Bayesian model together with a depth image constraint.

For a color vector c in the color space, let P(c|skin) and P(c|nonskin) be the class-conditional probability functions for skin and non-skin, respectively. The color vector c can then be classified as in Equation 1.

(Here, c is a color vector in the color space, P(c|skin) and P(c|nonskin) are the class-conditional probability functions for skin and non-skin, respectively, and θ is a threshold.)

The threshold θ in Equation 1 can be obtained according to Equation 2.

(Here, P(skin) and P(nonskin) are the prior probabilities of skin and non-skin, respectively, and λ_fd and λ_fr are the costs of false detection and false rejection, respectively.)
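The images for Equations 1 and 2 are not reproduced in this text. As a minimal sketch, assuming the usual likelihood-ratio form of a Bayesian skin classifier, in which c is labeled skin when P(c|skin) / P(c|nonskin) ≥ θ with θ = λ_fd·P(nonskin) / (λ_fr·P(skin)), the decision could look as follows. The function names and the exact form of θ are assumptions for illustration and are not taken from the patent.

```python
def skin_threshold(p_skin, cost_false_detection, cost_false_rejection):
    """Assumed form of Equation 2: theta from the priors and the two error costs."""
    p_nonskin = 1.0 - p_skin
    return (cost_false_detection * p_nonskin) / (cost_false_rejection * p_skin)


def is_skin(p_c_given_skin, p_c_given_nonskin, theta):
    """Assumed form of Equation 1: likelihood-ratio test against theta."""
    if p_c_given_skin == 0.0:
        # A color never observed on skin is not labeled skin.
        return False
    # Written as a product to avoid dividing by a zero non-skin likelihood.
    return p_c_given_skin >= theta * p_c_given_nonskin


theta = skin_threshold(p_skin=0.5, cost_false_detection=1.0, cost_false_rejection=1.0)
print(is_skin(0.30, 0.10, theta))  # likelihood ratio 3 against theta 1
```

Raising the false-detection cost raises θ and makes the classifier more conservative about labeling a pixel as skin.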

However, hand detection based on the skin color detection described above has a problem: when the image contains a face, a leg, or another region with a color similar to human skin, the hand to be detected may be treated as noise. The present invention therefore uses a semi-naive classifier to detect the user's hand robustly.

Let F denote a feature that distinguishes the hand to be detected, and let C_hand denote the user's hand. When the feature F is extracted from the image, the probability that it belongs to the user's hand can be written as the conditional probability P(C_hand|F). If m features are used to track the user's hand, the features can be written as F_i (i = 1, 2, ..., m), and the m different features must be mutually independent for the semi-naive classifier to be used.

Features that can be used to detect the user's hand include a skin color vector, a motion vector, texture information, and a location vector. Because the present invention pursues a natural user interface (NUI) and must detect the user's hand robustly, it uses features obtained through an RGB-D camera, which acquires an RGB image and a depth image simultaneously: the color vector in the Lab color space, the depth information, and texture information from local binary patterns (LBP) with different radii. The color vector could be taken from other spaces such as RGB, HSV, YCbCr, or LUV, but the present invention uses the color vector of the Lab color space to extract robust features of the hand. LBP is a fast and powerful texture descriptor that is robust to illumination. For the intensity value g_c at a particular pixel (x_c, y_c) of image i, the LBP can be expressed as in Equation 3.

(Here, P is the number of sampling points around the center pixel, and R is the distance between the surrounding pixels and the center pixel.)

Other texture descriptors, such as the census transform or the modified census transform, could be used to detect texture features. The present invention uses LBP to show that the user's hand can be tracked even with the most common and widely used descriptor.
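The image for Equation 3 is not reproduced in this text. The sketch below assumes the standard LBP definition, in which each of the P neighbors contributes bit p when its intensity g_p is at least the center intensity g_c. It fixes P = 8 and R = 1 and samples the square 3x3 neighborhood without interpolation, which is a simplification; the neighbor ordering and function name are hypothetical.

```python
# Offsets (dx, dy) of the 8 neighbors at radius 1, clockwise from top-left.
# The ordering is an arbitrary choice for this sketch.
NEIGHBOURS = [(-1, -1), (0, -1), (1, -1), (1, 0),
              (1, 1), (0, 1), (-1, 1), (-1, 0)]


def lbp_8_1(image, xc, yc):
    """Return the 8-bit LBP code of the interior pixel (xc, yc).

    `image` is a 2D list of intensity values indexed as image[y][x].
    """
    gc = image[yc][xc]
    code = 0
    for p, (dx, dy) in enumerate(NEIGHBOURS):
        gp = image[yc + dy][xc + dx]
        if gp >= gc:          # s(g_p - g_c) = 1 when g_p >= g_c
            code |= 1 << p    # weight 2^p
    return code


patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
print(lbp_8_1(patch, 1, 1))  # only the three top neighbors set their bits
```

A histogram of such codes over a candidate region is the kind of texture feature that could be fed to the classifier described in the text.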

The user's hand can be tracked through the above process alone, but using the depth information obtained from the RGB-D camera as an additional feature allows the hand to be tracked more robustly.

Before the classifier can be used, it must be trained by machine learning: the position of the hand is marked in images in which the user's hand was captured, and P(F_i|C_hand) and P(F_i|C_nonhand) are obtained for i = 1, 2, ..., m. After training, candidates for the user's hand can be detected by Equation 4.

In the present invention only two cases, hand and non-hand, are considered, so i is 2, and F consists of a combination of three or more independent features. The probability that the detected features belong to the user's hand can be obtained by Equation 5.

The probability that the features do not belong to the user's hand can be obtained in the same way as in Equation 5, and the larger of the two probability values is selected to obtain the probability that the user's hand is located at that position in the image. Candidates for the user's hand can be obtained in this way.
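The images for Equations 4 and 5 are not reproduced in this text. As a hedged sketch of the idea, the following assumes plain naive-Bayes combination: the per-feature likelihoods learned for each class are multiplied under the independence assumption stated above, and the class with the larger score wins. The function name, the default prior, and the normalization are assumptions, not the patent's formulas.

```python
def hand_posterior(lik_hand, lik_nonhand, prior_hand=0.5):
    """Combine independent per-feature likelihoods into P(C_hand | F).

    lik_hand[i]    is P(F_i | C_hand)    for the observed value of feature i.
    lik_nonhand[i] is P(F_i | C_nonhand) for the same observation.
    """
    score_hand = prior_hand
    for p in lik_hand:
        score_hand *= p
    score_nonhand = 1.0 - prior_hand
    for p in lik_nonhand:
        score_nonhand *= p
    total = score_hand + score_nonhand
    if total == 0.0:
        return 0.0  # no evidence for either class
    return score_hand / total


# Three features, e.g. Lab color, LBP texture, depth (values are made up).
posterior = hand_posterior([0.8, 0.6, 0.9], [0.2, 0.4, 0.1])
label = "hand" if posterior >= 0.5 else "nonhand"
print(label)
```

Choosing the label by comparing the posterior with 0.5 is equivalent to selecting the larger of the two class scores.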

FIG. 2 shows an application example in which depth information is used as a feature to detect the user's hand.

As shown in FIG. 2, to detect the user's hand using depth information as a feature, the hand tracking unit 120 is first trained by machine learning on a number of images in which the position of the hand has been marked in advance. The hand tracking unit 120 then detects the hand in each input image using conditional probability.

The hand tracking unit 120 also uses a hand and finger tracking algorithm that is computationally efficient and can run in real time, so that feedback can be provided to the user quickly. Any of various published tracking algorithms may be used; here, as an example, the CAMSHIFT algorithm is used.

The CAMSHIFT algorithm consists of the following steps:

1. Setting a region of interest (ROI) of the probability distribution image to the entire image.

2. Selecting the initial location of the mean shift search window. The selected location is the target distribution to be tracked.

3. Calculating the color probability distribution of the region centered on the mean shift search window.

4. Iterating the mean shift algorithm to find the centroid of the probability image, and storing the zeroth moment (distribution area) and the centroid location.

5. For the following frame, setting a search window centered on the mean location found in step 4 and sized as a function of the zeroth moment.

6. Returning to step 3 and repeating.

In the CAMSHIFT algorithm described above, the zeroth, first, and second moments can be calculated as in Equation 6.

(Here, M₀₀ is the zeroth moment, M₁₀ and M₀₁ are the first moments, and M₂₀ and M₀₂ are the second moments. P(x, y) is the color probability distribution of the pixel at position (x, y).)

The size of the search window, which the CAMSHIFT algorithm updates every frame, is calculated by Equation 7.

(Here, width and height are the width and height of the search window, respectively.)
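The images for Equations 6 and 7 are not reproduced in this text. The sketch below illustrates steps 3 and 4 of the list above under the standard definition of image moments (M₀₀ = ΣΣ P(x, y), M₁₀ = ΣΣ x·P(x, y), M₀₁ = ΣΣ y·P(x, y)), with the centroid at (M₁₀/M₀₀, M₀₁/M₀₀). The window resizing of Equation 7 is deliberately left out because its exact form is not given here; function names and the iteration cap are hypothetical.

```python
def window_moments(prob, x0, y0, width, height):
    """Zeroth and first moments of prob[y][x] inside the search window."""
    m00 = m10 = m01 = 0.0
    for y in range(max(y0, 0), min(y0 + height, len(prob))):
        for x in range(max(x0, 0), min(x0 + width, len(prob[0]))):
            p = prob[y][x]
            m00 += p
            m10 += x * p
            m01 += y * p
    return m00, m10, m01


def mean_shift(prob, x0, y0, width, height, max_iter=20):
    """Move a fixed-size window until it is centered on the local centroid."""
    for _ in range(max_iter):
        m00, m10, m01 = window_moments(prob, x0, y0, width, height)
        if m00 == 0.0:
            break  # nothing to track inside the window
        cx, cy = m10 / m00, m01 / m00
        new_x0 = int(round(cx - width / 2.0))
        new_y0 = int(round(cy - height / 2.0))
        if (new_x0, new_y0) == (x0, y0):
            break  # converged
        x0, y0 = new_x0, new_y0
    return x0, y0


# 7x7 probability image with a single bright pixel at x=5, y=5.
prob = [[0.0] * 7 for _ in range(7)]
prob[5][5] = 1.0
print(mean_shift(prob, 2, 2, 4, 4))
```

A full CAMSHIFT loop would additionally rescale the window from M₀₀ after each frame, as described in step 5.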

With current RGB-D cameras, which acquire an RGB image and a depth image simultaneously, noise in the depth image makes the detected and tracked position of the user's hand or fingers inaccurate. Because this noise can lower the recognition rate during gesture recognition, it needs to be removed. The present invention applies a prediction filter to the tracked hand to remove the noise and improve the gesture recognition rate. Prediction filters are a known technique and are not described in detail here.

Once the hand tracking unit 120 tracks the hand or fingers in the image, the gesture determination unit 130 analyzes the position information of the user's hand or fingers tracked in each frame using a technique such as the motion gradient or the moment of inertia. Some of the user's motions in the acquired image are performed intentionally so that the gesture recognition apparatus will recognize them as gestures, but unintended motions can also resemble gestures. The gesture determination unit 130 therefore identifies and excludes the user's unintended gestures, which raises the gesture recognition rate.

In the present invention, as an example, the position information of the hand or fingers is analyzed using the motion gradient to determine the user's gesture intention.

The motion gradient technique calculates the motion gradient of the hand or finger position tracked in each frame by the hand tracking unit 120, as in Equation 8.

(Here, ∇f denotes the motion gradient of the hand or finger.)

Letting G_T denote the gradient of the hand or finger tracked in the t-th frame, the gesture determination unit 130 determines that the gesture was intended by the user if the gradient G_T is greater than or equal to a preset upper limit G_max or less than or equal to a preset lower limit G_min; otherwise it determines that the user's motion was not intended as a gesture. In other words, a motion gradient produced by ordinary motion in which the user is not conscious of making a gesture is judged not to be an intended gesture.
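The image for Equation 8 is not reproduced in this text, so the exact definition of the motion gradient is unknown here. The sketch below assumes, purely for illustration, a signed frame-to-frame difference of one position coordinate and then applies the two-sided threshold test described above. The function names and threshold values are hypothetical.

```python
def motion_gradient(positions):
    """Assumed gradient: signed difference between consecutive frame positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


def is_intended(gradient, g_min, g_max):
    """Intentional if the gradient reaches the upper or the lower limit."""
    return gradient >= g_max or gradient <= g_min


xs = [100, 101, 109, 109, 104]          # tracked x coordinate per frame
flags = [is_intended(g, g_min=-4, g_max=4) for g in motion_gradient(xs)]
print(flags)  # small drifts are rejected, fast moves in either direction pass
```

The two limits let slow, incidental drift fall between G_min and G_max while fast movement in either direction is accepted.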

When the gesture determination unit 130 determines that the user's motion is a gesture, the feature extraction unit 150 uses the position of the hand to extract the features that will be used for gesture recognition. The position, direction, or velocity of the hand can serve as features for gesture recognition; of these, velocity allows features to be extracted quickly. Equation 9 is the formula for extracting features using velocity.

(Here, F is the feature in the t-th frame, and P_t(X, Y, Z) is the position of the hand detected and tracked in the t-th frame.)
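A velocity feature of this kind can be sketched in Python as follows. Equation 9 is not reproduced in this text, so the sketch assumes the feature is the difference between consecutive positions, F_t = P_t - P_(t-1), with a unit time step between frames; the function name is hypothetical.

```python
def velocity_features(positions):
    """Per-frame velocity from a list of (X, Y, Z) hand positions.

    Assumption: velocity is the difference between consecutive
    positions (one frame = one time unit).
    """
    return [
        tuple(c - p for p, c in zip(prev, cur))
        for prev, cur in zip(positions, positions[1:])
    ]
```

A track of N positions yields N-1 feature vectors.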

Once the feature extraction unit 150 has extracted the gesture features, the recognition classifier 160 determines the gesture pattern from the extracted features and recognizes the gesture. The recognition classifier 160 compares the features extracted by the feature extraction unit 150 with the plurality of gesture templates stored in the user profile storage unit 170 and selects the most similar template to recognize the user's gesture. This user-adaptive gesture recognition can improve recognition performance.

Gestures recognized from hand or finger motion fall broadly into static patterns and dynamic patterns, and most conventional general-purpose pattern recognition algorithms cannot recognize dynamic patterns accurately.

A dynamic pattern, however, can be regarded as a sequence of static patterns with time added as a variable. The present invention therefore treats a continuous pattern of change in hand or finger position as a single dynamic gesture and uses a pattern recognition algorithm that accounts for the time variability of dynamic patterns. Several such algorithms have been proposed; the most widely used are hidden Markov models (HMM) and dynamic time warping (DTW). The recognition classifier 160 of the present invention can likewise use various pattern recognition algorithms, but DTW is assumed here as an example. DTW has the advantages of being implementable in real time and being highly accurate.

FIG. 3 shows the difference between ordinary matching, which does not account for time variability, and DTW matching, which does.

In FIG. 3, (a) shows Euclidean matching, which does not account for time variability, and (b) shows DTW matching, which does. As shown in FIG. 3(a), the distance or difference between two matched signals can be measured as the Euclidean distance between the signal sequences along the time axis, but local compression or expansion of the signal cannot then be taken into account. With the DTW matching shown in (b), by contrast, the distance between the two signals is measured along the warping path that gives the minimum distance between sequences of different lengths, so changes caused by compression or expansion of the signal are reflected.

DTW is an algorithm that finds the optimal alignment of two patterns by warping the time axes of two sequential data series and calculates the distance between the two series under that alignment. DTW can be implemented in software through dynamic programming. Let two time-series feature vectors of lengths m and n be A = a₁, a₂, …, a_m and B = b₁, b₂, …, b_n. The distance d(a_i, b_j) between two points a_i and b_j can be calculated as the Euclidean distance. In DTW, the warping path W defines the mapping between A and B and must satisfy three conditions: the boundary condition, continuity, and monotonicity. Among the warping paths W that satisfy these conditions, DTW searches for the one that minimizes the warping cost.

The warping path that minimizes the warping cost in DTW can be obtained from Equation 10.

(Here, W = w₁, w₂, …, w_K, and max(m, n) ≤ K ≤ m+n-1.)
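Equation 10 is not reproduced in this text. A commonly used form of the minimum warping cost that is consistent with the bounds on K given here is shown below as an assumption, where d(w_k) stands for the distance d(a_i, b_j) of the pair (i, j) mapped by the k-th path element:

```latex
DTW(A, B) = \min_{W} \left\{ \frac{1}{K} \sqrt{\sum_{k=1}^{K} d(w_k)} \right\}
```

The factor 1/K compensates for warping paths of different lengths.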

If the k-th element w_k of the warping path is the mapping (i, j), the cumulative distance D(i, j) up to (i, j) is defined using the Euclidean distance d(i, j) as in Equation 11.
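Equation 11 is not reproduced in this text. The standard DTW recurrence for the cumulative distance, which matches the description and is assumed here, is:

```latex
D(i, j) = d(i, j) + \min \{ D(i-1, j-1),\; D(i-1, j),\; D(i, j-1) \}
```

Each cell adds its local distance to the cheapest of the three neighboring cells that satisfy the continuity and monotonicity conditions.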

The cumulative distance D(i, j) of Equation 11 can be readily computed using dynamic programming, as noted above.
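The dynamic-programming computation can be sketched in Python as follows. This is a minimal illustration of the standard DTW recurrence, not the implementation used in the invention; the function names are hypothetical, and the sketch returns the raw cumulative distance D(m, n) without any path-length normalization.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(A, B):
    """Cumulative DTW distance between sequences A (length m) and B (length n)."""
    m, n = len(A), len(B)
    inf = float("inf")
    # D has an extra row and column so that D[0][0] anchors the boundary condition.
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = euclidean(A[i - 1], B[j - 1])
            # Continuity and monotonicity: only three predecessor cells are allowed.
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[m][n]
```

Because the table is filled cell by cell, the cost is proportional to m times n, and sequences of different lengths can be compared directly.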

FIG. 4 shows an embodiment in which the recognition classifier recognizes a user gesture pattern using the DTW matching technique, which accounts for time variability.

In FIG. 4, (a) shows the warping path w_k between two time-series feature vectors (Time Series A and B) as a mapping of (i, j). As described above, the DTW algorithm uses dynamic programming to calculate the similarity between a reference feature pattern and the pattern of the user's hand gesture features acquired in real time.

The similarity can be obtained in the form of a cost matrix, as in (b). If the feature vector of a reference gesture, set in advance for identifying user gestures, has length M, and the test feature vector of a user gesture later acquired from the image has length N, the cost matrix has size M×N. This makes it possible to compare feature vectors of different lengths and so to calculate similarity from a nonlinear correspondence.

The user profile storage unit 170 stores a plurality of gesture templates generated in advance from the user behavior profiles of many users. From the stored templates, it retrieves the gesture template corresponding to the user behavior profile generated by the recognition classifier 160 and sends it to the recognition classifier 160.

When a gesture template corresponding to the user's behavior profile is retrieved from the user profile storage unit 170, the recognition classifier 160 recognizes the gesture according to the retrieved template. That is, it performs user-adaptive gesture recognition, which improves recognition performance by reflecting the user's behavior profile in the gesture template.

A conventional recognition classifier holds only one reference gesture feature vector for each of the gestures to be recognized and recognizes a user gesture by comparing the user's gesture feature vector with that reference vector. In the present invention, by contrast, the recognition classifier 160 compares the user's gesture feature vector with the gesture templates obtained by analyzing the user's behavior profile and identifies the most similar template. This user-adaptive gesture recognition improves recognition performance.
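Selecting the most similar template can be sketched in Python as follows. This is an illustrative sketch only: it assumes one-dimensional feature sequences and a DTW distance with absolute difference as the local cost, and the template names and function names are hypothetical.

```python
def dtw_distance(a, b):
    """Cumulative DTW distance between two 1-D sequences."""
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            D[i][j] = abs(x - y) + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[-1][-1]

def most_similar_template(feature, templates):
    """Return the name of the stored template closest to the feature sequence."""
    return min(templates, key=lambda name: dtw_distance(feature, templates[name]))
```

With several templates stored per gesture (for example, one per user behavior profile), the same lookup picks whichever variant best matches the current user.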

User-adaptive gesture recognition, which reflects the user's behavior profile in the gesture template, is performed because gesture patterns differ from user to user. Even when many users are asked to make the same gesture, their patterns do not match: some perform the gesture quickly and others slowly, and the path traced by the hand or finger also varies by user. The gesture recognition rate can therefore be raised substantially if the recognition classifier 160 analyzes these per-user differences, that is, each user's behavior pattern, and applies the analyzed pattern to the corresponding template among the preset gesture templates.

If, however, no gesture template corresponding to the user behavior profile generated by the recognition classifier 160 is found, the user profile storage unit 170 can store the generated user behavior profile as a new gesture template.

In this case, the recognition classifier 160 does not store the user behavior profile as a gesture template unchanged. It first analyzes the profile with an estimation technique such as a Kalman filter or a particle filter and stores the result as the gesture template.

The Kalman filter in particular is a recursive filter that tracks the state of a noisy linear dynamical system, and it performs very efficiently when applied to the present invention. The Kalman filter operates on a discrete-time linear dynamical system and assumes a Markov chain, in which the state vector at each time depends only on the vector at the previous time. With the state vector at time k defined as x_k and the user input at that time defined as u_k, the Kalman filter assumes the relation in Equation 12.

(Here, F_k is the state transition matrix based on the previous state at that time, B_k is the state transition matrix for the user input, and w_k is a noise variable drawn from a multivariate normal distribution with covariance matrix Q_k, w_k ~ N(0, Q_k).)
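Equation 12 is not reproduced in this text. The standard state-transition model of the Kalman filter, which uses exactly the symbols defined here and is assumed to be the intended relation, is:

```latex
x_k = F_k x_{k-1} + B_k u_k + w_k
```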

The state vector x_k and the vector z_k actually obtained when that state is measured are related as in Equation 13.

(Here, H_k is the matrix relating the state to the measurement at that time, and v_k is a noise variable drawn from a multivariate normal distribution with covariance matrix R_k, v_k ~ N(0, R_k).)
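Equation 13 is not reproduced in this text. The standard measurement model of the Kalman filter, assumed here from the symbol definitions, is:

```latex
z_k = H_k x_k + v_k
```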

The Kalman filter operates recursively. It estimates the current value from the value estimated at the immediately preceding time and uses no measurements or estimates from any earlier time. Each estimate is computed in two steps. The prediction step calculates the state expected when the user input is applied to the state estimated at the previous time. The update step then calculates a more accurate state from the predicted state and the actually measured state. The prediction step is computed a priori; the a priori state prediction and the a priori covariance prediction follow Equations 14 and 15, respectively.

(Here, x̂_{k|k} denotes the state estimate at time k based on the measurement at time k.)

(Here, P_{k|k} denotes the state covariance matrix at time k based on the measurement at time k.)
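Equations 14 and 15 are not reproduced in this text. The standard a priori prediction equations of the Kalman filter, assumed here to be the intended ones, are:

```latex
\hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k
P_{k|k-1} = F_k P_{k-1|k-1} F_k^{T} + Q_k
```

The subscript k|k-1 marks a value for time k that uses measurements only up to time k-1.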

In the update step, the error between the predicted value obtained in the prediction step and the actual measurement is used to correct the previously obtained value a posteriori. This error, ỹ_k, is calculated by Equation 16.

The optimal Kalman gain K_k can then be obtained using Equation 17.
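Equations 16 and 17 are not reproduced in this text. In the standard Kalman filter, the measurement residual and the optimal gain take the following form, assumed here; S_k is the residual covariance, an intermediate symbol introduced only for this illustration:

```latex
\tilde{y}_k = z_k - H_k \hat{x}_{k|k-1}
S_k = H_k P_{k|k-1} H_k^{T} + R_k
K_k = P_{k|k-1} H_k^{T} S_k^{-1}
```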

The Kalman filter can then be implemented with the a posteriori state update and the a posteriori covariance update of Equations 18 and 19.
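The predict and update steps can be sketched in Python for the scalar case. This is a minimal illustration under stated assumptions, not the filter configuration used in the invention: the state is a single number, F = 1, B = 0, and H = 1 by default, the noise values Q and R are chosen only for illustration, and the function names are hypothetical. The update lines follow the standard a posteriori forms x = x_pred + K·y and P = (1 - K·H)·P_pred.

```python
def kalman_step(x, P, z, u=0.0, F=1.0, B=0.0, H=1.0, Q=1e-4, R=1e-2):
    """One predict/update cycle of a scalar Kalman filter."""
    # Prediction step (a priori state and covariance).
    x_pred = F * x + B * u
    P_pred = F * P * F + Q
    # Update step (a posteriori correction from the measurement z).
    y = z - H * x_pred            # measurement residual
    S = H * P_pred * H + R        # residual covariance
    K = P_pred * H / S            # Kalman gain
    x_new = x_pred + K * y
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

def smooth(measurements, x0=0.0, P0=1.0):
    """Run the filter over a sequence and return the filtered estimates."""
    x, P = x0, P0
    estimates = []
    for z in measurements:
        x, P = kalman_step(x, P, z)
        estimates.append(x)
    return estimates
```

Applied to a noisy user behavior profile, a filter of this kind would yield a smoothed sequence that could be stored as the gesture template.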

FIG. 5 shows an example comparing a user-adaptive template, obtained by analyzing the user profile, with a template that does not take the user's characteristics into account.

In FIG. 5, (a) is a graph of the analyzed user behavior profile, and (b) compares the gesture template generated by analyzing the user behavior profile of (a) with a conventional gesture template used generically, without regard to the user behavior profile.

In (b), the red line is the gesture template generated by analyzing the user behavior profile, and the blue line is the conventional gesture template. As shown in FIG. 5(b), the template corresponding to the user behavior profile differs in shape from the conventional template, so recognizing gestures with the template corresponding to the user behavior profile yields very high accuracy.

Referring again to FIG. 1, the command conversion unit 180 converts the gesture recognized by the recognition classifier 160 into the corresponding command and outputs it to an externally connected device. A gesture recognition apparatus is fundamentally a device for recognizing commands that a user inputs as gestures. In most cases it is therefore not used as a standalone device but is configured to pass the user command corresponding to the recognized gesture to a device that receives user commands and acts on them. The command conversion unit 180 thus serves as an interface that converts the recognized gesture into the corresponding user command and transmits it. In some cases, the gesture recognition apparatus is not provided separately but is built into the device that acts on the user commands.
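The conversion itself can be as simple as a lookup table. The Python sketch below is illustrative only; the gesture labels, command strings, and function name are hypothetical and do not come from the patent.

```python
# Hypothetical mapping from recognized gesture labels to user commands.
GESTURE_COMMANDS = {
    "swipe_left": "PREVIOUS",
    "swipe_right": "NEXT",
    "push": "SELECT",
}

def to_command(gesture):
    """Return the command for a recognized gesture, or None if unmapped."""
    return GESTURE_COMMANDS.get(gesture)
```

Keeping the mapping in a table lets the same recognizer drive different external devices by swapping only the table.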

The feedback control unit 140 provides feedback to the user when gesture recognition fails, for example when the hand tracking unit 120 cannot detect the user's hand or finger in the image or when the gesture determination unit 130 cannot determine a gesture, and also when a gesture is recognized correctly. Conventional image-based gesture recognition apparatuses provide only visual feedback, so the user must constantly watch the screen on which the feedback appears. The feedback control unit 140 of the present invention instead provides multi-sensory feedback (visual, auditory, and tactile) so that the user can judge the recognition result immediately.

Letting the user judge the recognition result immediately through such multi-sensory feedback also influences how the user performs gestures.

FIG. 6 is a graph of experimental results showing how the gesture recognition rate changes with the feedback method.

The experiment in FIG. 6 shows the change in gesture recognition rate for four cases: no feedback, visual feedback, auditory feedback, and tactile feedback. As FIG. 6 shows, providing feedback in different ways changes the user's motion characteristics and produces a meaningful change in the actual gesture recognition rate. It also shows that the effect of each type of feedback differs depending on the gesture. The present invention therefore provides multi-sensory visual, auditory, and tactile feedback to improve the gesture recognition rate substantially.

The feedback control unit 140 may be configured to provide multi-sensory feedback to the user directly, but it typically sends a feedback command to a connected external device so that the feedback is provided through that device.

FIG. 7 shows a smart three-dimensional gesture recognition method according to an embodiment of the present invention.

The smart three-dimensional gesture recognition method of FIG. 7 is described with reference to FIG. 1. First, the image acquisition unit 110 acquires an image composed of a plurality of frames (S11). The hand tracking unit 120 receives the acquired image and analyzes it using the color-based and depth-based methods together to determine whether the user's hand or finger is detected (S12). If no hand or finger is detected in the image, multi-sensory feedback is provided through the feedback control unit 140 so that the user is aware that gesture recognition has failed (S13).

If a hand or finger is detected, the hand tracking unit 120 tracks its position (S14). The gesture determination unit 130 then analyzes the position information of the hand or finger tracked in each frame to determine whether the gesture was intended by the user (S15). Because a gesture is an act of bodily motion, the user may sometimes make a gesture-like motion without meaning to input a gesture. The gesture determination unit 130 sets thresholds in advance and compares the change in the tracked hand or finger position with them to determine whether the gesture was intended.

If the gesture is determined not to be intended by the user, multi-sensory feedback can be provided to the user through the feedback control unit 140 (S13). Unlike the case where no hand or finger is detected in the image, however, the apparatus may be configured not to provide feedback on whether the gesture was intended.

If the gesture is determined to be intended, the feature extraction unit 150 extracts the features to be used for gesture recognition from the change in the tracked hand or finger position (S16). The recognition classifier 160 then recognizes the gesture pattern from the extracted features (S17), using a pattern recognition algorithm that accounts for the time variability of dynamic patterns.

Once the pattern is recognized, the recognition classifier 160 generates a user behavior profile from it (S18) and searches the gesture templates stored in the user profile storage unit 170 for the template corresponding to the generated profile (S19). It then determines whether a corresponding template exists in the user profile storage unit 170 (S20). If none exists, it analyzes the generated user behavior profile to create a gesture template and stores the new template in the user profile storage unit 170 (S21). If a corresponding template exists, the recognition classifier 160 receives that template and recognizes the gesture by applying the recognized pattern to it (S22). Setting gesture templates in advance and applying the recognized pattern to them reduces the loss of accuracy caused by per-user gesture differences and so raises the recognition rate of the apparatus. The command conversion unit 180 converts the recognized gesture into the corresponding user command and transmits it to the connected external device. The feedback control unit 140 is notified that recognition succeeded and provides multi-sensory feedback so that the user knows the gesture was recognized.
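The control flow of steps S11 to S22 can be sketched in Python as follows. This is a rough sketch under stated assumptions: every callable in `steps` is a hypothetical stand-in for one of the units 120 to 180, the behavior profile is simplified to a dictionary key, and returning without a command after a new template is stored (S21) is an assumption, since the text does not say what follows that step.

```python
def run_pipeline(frames, steps, templates):
    """Walk one image sequence through steps S12 to S22."""
    track = steps["track"](frames)                  # S12, S14: detect and track the hand
    if track is None:
        steps["feedback"]("no_hand")                # S13: recognition failed
        return None
    if not steps["intended"](track):                # S15: intent check
        steps["feedback"]("not_intended")           # S13 (optional per the text)
        return None
    features = steps["extract"](track)              # S16: feature extraction
    profile = steps["classify"](features)           # S17, S18: pattern and profile
    template = templates.get(profile)               # S19, S20: template lookup
    if template is None:
        templates[profile] = steps["make_template"](features)  # S21: store new template
        return None                                 # assumption: no command this pass
    gesture = steps["match"](features, template)    # S22: recognize with template
    steps["feedback"]("recognized")
    return steps["to_command"](gesture)
```

Passing the units in as callables keeps the flow independent of how each unit is implemented.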

As a result, the gesture recognition apparatus and method according to the present invention detect and track the user's hand and fingers using color-based and depth-based methods together, which improves detection and tracking performance, and determine whether the user's motion was intended as a gesture, which raises the gesture recognition rate substantially. In addition, gesture templates allow recognition to account for per-user gesture differences, and multi-sensory feedback raises the recognition rate further and improves user convenience.

The description above detects and tracks the user's hand or finger as an example, but the apparatus may also be configured to detect and track other parts of the user's body. Any body part with which the user can easily make gestures and which the gesture recognition apparatus can detect may be used for gesture recognition.

The method according to the present invention can be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include all types of recording devices that store data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed across networked computer systems so that the computer-readable code is stored and executed in a distributed manner.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely examples, and those of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible.

Accordingly, the true technical scope of protection of the present invention should be determined by the technical idea of the appended claims.

Claims

A gesture recognition apparatus comprising:
an image acquisition unit that acquires an image composed of a plurality of consecutive frames;
a hand tracking unit that detects and tracks the position of a designated hand of a user by using a color-based method and a depth-based method together on the image;
a gesture determination unit that analyzes the position information of the hand tracked by the hand tracking unit, calculates a motion gradient for the analyzed change in hand position, and determines that the gesture is intended by the user if the calculated motion gradient is equal to or greater than a predetermined upper limit value;
a feature extraction unit that extracts a feature of the change in hand position in a preset manner;
a user profile storage unit that stores a plurality of gesture templates;
a recognition classifier that recognizes a pattern of the gesture from the extracted feature, generates a user behavior profile from the recognized pattern, searches the plurality of gesture templates stored in the user profile storage unit for a gesture template corresponding to the user behavior profile, and recognizes the gesture by applying the recognized pattern to the retrieved gesture template; and
a command conversion unit that converts the recognized gesture into a corresponding user command.

The apparatus of claim 1, wherein the image acquisition unit acquires a color image and a depth image together using an RGB-D camera.

The apparatus of claim 1, wherein the hand tracking unit detects the hand in the color-based method using a local binary pattern (LBP) on color vectors in the Lab color space, detects the hand in the depth-based method using conditional probabilities obtained through machine learning, and tracks the detected hand of the user according to the CAMSHIFT algorithm.

(Deleted)

The apparatus of claim 1, wherein the feature extraction unit extracts the velocity of the change in hand position as the feature.

The apparatus of claim 1, wherein the recognition classifier recognizes the pattern of the gesture using dynamic time warping (DTW).

The apparatus of claim 1, wherein, if no gesture template corresponding to the user behavior profile is found, the recognition classifier generates a new gesture template by filtering the user behavior profile with a Kalman filter and stores the generated gesture template in the user profile storage unit.

The apparatus of claim 1, wherein the gesture recognition device further comprises a feedback control unit that generates feedback that the user can perceive through at least one of the visual, auditory, and tactile senses when at least one of the following occurs: the user's hand is not detected by the hand tracking unit, the gesture determination unit determines that the gesture is not intended by the user, or the recognition classifier recognizes the gesture.

A gesture recognition method of a gesture recognition apparatus including an image acquisition unit, a hand tracking unit, a gesture determination unit, a feature extraction unit, a user profile storage unit, a recognition classifier, and a command conversion unit, the method comprising:
Acquiring an image composed of a plurality of consecutive frames;
Detecting and tracking a position of a user's designated hand by simultaneously using a color-based method and a depth-based method in the image;
Analyzing, by the gesture determination unit, position information of the tracked hand, and determining whether the user intends the gesture using the analyzed hand position change;
Extracting a feature of the hand position change in a predetermined manner;
Recognizing, by the recognition classifier, a pattern of the gesture from the extracted feature;
Generating, by the recognition classifier, a user behavior profile from the recognized pattern, searching the plurality of gesture templates stored in the user profile storage unit for a gesture template corresponding to the user behavior profile, and recognizing the gesture by applying the recognized pattern to the retrieved gesture template; and
Converting the recognized gesture into a corresponding user command,
Wherein the step of determining whether the gesture is intended comprises:
Calculating a motion gradient for a positional change of the tracked hand; and
Determining that the gesture is intended by the user if the calculated motion gradient is greater than or equal to a predetermined upper limit value.

The method of claim 9, wherein detecting and tracking the position of the hand comprises:
Detecting a hand in the color-based method using a local binary pattern (LBP) for the color vector of the Lab color space;
Detecting, in the depth-based method, a hand learned through machine learning using a conditional probability; and
Tracking the detected hand of the user according to the CAMSHIFT algorithm.

delete

The method of claim 9, wherein the pattern of the gesture is recognized using dynamic time warping (DTW).

The method of claim 9, wherein recognizing the gesture comprises:
Generating the recognized pattern as the user behavior profile;
Searching for the gesture template corresponding to the user behavior profile;
Recognizing the gesture by applying the recognized pattern to the retrieved gesture template;
Filtering the user behavior profile with a Kalman filter to generate a new gesture template if no gesture template corresponding to the user behavior profile is found; and
Storing the generated gesture template in the user profile storage unit.

The method of claim 9, wherein the gesture recognition apparatus further comprises a feedback control unit, and the method further comprises generating, by the feedback control unit, feedback that the user can perceive through at least one of the visual, auditory, and tactile senses when at least one of the following occurs: the user's hand is not detected by the hand tracking unit, the gesture determination unit determines that the gesture is not intended by the user, or the recognition classifier recognizes the gesture.