KR20160031873A

KR20160031873A - Gesture recognizer setting method, gesture recognition method and apparatus

Info

Publication number: KR20160031873A
Application number: KR1020140122194A
Authority: KR
Inventors: 김대은; 임현구
Original assignee: 연세대학교 산학협력단
Priority date: 2014-09-15
Filing date: 2014-09-15
Publication date: 2016-03-23
Also published as: KR101627929B1

Abstract

A gesture recognizer setting method, a gesture recognition method, and an apparatus therefor are disclosed. The gesture recognizer setting method includes: (a) converting an input image that contains an object performing a target gesture into a plurality of binary images based on color information about the object, the input image being a plurality of consecutive images; (b) extracting the center of the object from each of the converted binary images to obtain gesture data; (c) clustering the obtained gesture data and using the clustering result to obtain a state sequence in which the class of each cluster is a state; (d) calculating transition probability values between the clusters in the state sequence; and (e) setting a recognizer for the target gesture using the state sequence and the calculated transition probability values.

Description

[0001] The present invention relates to a gesture recognizer setting method, a gesture recognition method, and an apparatus therefor.

The present invention relates to a method and an apparatus capable of recognizing gestures using computer vision.

Research is ongoing into control methods based on hand gestures, which let a person interact naturally with a computer. Most existing methods rely on a Hidden Markov Model (HMM).

HMM-based gesture recognition is widely used because it is generally robust to noise and gives good recognition results. However, an HMM is computationally expensive and needs a large amount of training data. In addition, an HMM can recognize only gestures on which it has been trained in advance.

An object of the present invention is to provide a gesture recognizer setting method, a gesture recognition method, and an apparatus therefor that can obtain excellent gesture recognition results even with a relatively small amount of training data.

Another object of the present invention is to provide a gesture recognizer setting method, a gesture recognition method, and an apparatus therefor that let a user register a gesture of his or her own choosing for recognition by entering it a relatively small number of times.

A further object of the present invention is to provide a gesture recognizer setting method, a gesture recognition method, and an apparatus therefor that can accurately recognize a gesture without a separate training process even when the speed or duration of the gesture differs from user to user.

According to one aspect of the present invention, there is provided a method of setting a recognizer for each gesture so that excellent gesture recognition results can be obtained even with relatively little training data, and of recognizing gestures using the recognizers.

According to an embodiment of the present invention, there may be provided a recognizer setting method for gesture recognition, including: (a) converting an input image that contains an object performing a target gesture into a plurality of binary images based on color information about the object, the input image being a plurality of consecutive images; (b) extracting the center of the object from each of the converted binary images to obtain gesture data; (c) clustering the obtained gesture data and using the clustering result to obtain a state sequence in which the class of each cluster is a state; (d) calculating transition probability values between the clusters in the state sequence; and (e) setting a recognizer for the target gesture using the state sequence and the calculated transition probability values.

According to another embodiment of the present invention, there may be provided a method of recognizing a gesture in a state in which a plurality of recognizers for a plurality of gestures have been set, each recognizer including a state sequence and transition probability values for its gesture, the method including: converting each of a plurality of consecutive input images that contain an object performing a specific gesture into a binary image based on color information about the object; extracting the center of the object from each of the converted binary images to obtain gesture data; identifying a cluster class for each item of gesture data; and recognizing the specific gesture as the gesture whose state sequence matches the cluster classes.

According to another aspect of the present invention, there is provided an apparatus that sets a recognizer for each gesture so that excellent gesture recognition results can be obtained even with relatively little training data, and then recognizes gestures using the recognizers.

According to an embodiment of the present invention, there may be provided a gesture recognition apparatus including: a binary image conversion unit that converts an input image containing an object performing a target gesture into a plurality of binary images based on color information about the object, the input image being a plurality of consecutive images; a tracking unit that extracts the center of the object from each of the converted binary images to obtain gesture data; a sequence acquisition unit that clusters the obtained gesture data and uses the clustering result to generate a state sequence for the target gesture in which each cluster is a state; a calculation unit that calculates transition probability values between the clusters in the state sequence; and a recognizer setting unit that sets a recognizer for the target gesture using the state sequence and the calculated transition probability values.

According to another embodiment of the present invention, there may be provided an apparatus for recognizing a gesture in a state in which a plurality of recognizers for a plurality of gestures have been set, each recognizer including a state sequence and transition probability values for its gesture, the apparatus including: a binary image conversion unit that converts each of a plurality of consecutive input images containing an object performing a specific gesture into a binary image based on color information about the object; a tracking unit that extracts the center of the object from each of the converted binary images to obtain gesture data; a classification unit that identifies a cluster class for each item of gesture data; and a recognition unit that, taking time order into account, recognizes the specific gesture as the gesture corresponding to the state sequence that matches the cluster classes.

The gesture recognizer setting method, gesture recognition method, and apparatus according to an embodiment of the present invention have the advantage that excellent gesture recognition results can be obtained even with a relatively small amount of training data.

The present invention also has the advantage that a user can have a gesture of his or her own choosing recognized by entering it a relatively small number of times.

The present invention further has the advantage that a gesture can be recognized accurately without a separate training process even when the speed or duration of the gesture differs from user to user.

FIG. 1 is a flowchart illustrating a method of setting a recognizer for each gesture for gesture recognition according to an embodiment of the present invention.
FIG. 2 is an exemplary view of hand center extraction according to an embodiment of the present invention.
FIG. 3 is an exemplary view of gesture data tracking according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining normalization of gesture data according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining orthogonal projection of gesture data onto a plane according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining clustering of gesture data according to an embodiment of the present invention.
FIG. 7 is a diagram for explaining a state sequence and transition probability values according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a method of recognizing a gesture using a plurality of recognizers set as in FIG. 1.
FIG. 9 is a block diagram illustrating the internal configuration of a gesture recognition apparatus according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail below. This is not intended to limit the invention to particular embodiments, and the invention should be understood to cover all modifications, equivalents, and substitutes that fall within its spirit and technical scope. In describing the invention, detailed descriptions of related known art are omitted where they could obscure the gist of the invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method of setting a recognizer for each gesture for gesture recognition according to an embodiment of the present invention; FIG. 2 is an exemplary view of hand center extraction; FIG. 3 is an exemplary view of gesture data tracking; FIG. 4 is a diagram for explaining normalization of gesture data; FIG. 5 is a diagram for explaining orthogonal projection of gesture data onto a plane; FIG. 6 is a diagram for explaining clustering of gesture data; and FIG. 7 is a diagram for explaining a state sequence and transition probability values, each according to an embodiment of the present invention. Because setting a recognizer is something the gesture recognition apparatus can perform in advance of gesture recognition, the description of FIG. 1 refers to the entity that performs each step collectively as the gesture recognition apparatus. Depending on the implementation, however, the gesture recognition apparatus 100 may instead receive recognizers that have already been set for each gesture. In that case the gesture recognition apparatus 100 need not include the components for recognizer setting, which may exist as a separate component.

In step 110, the gesture recognition apparatus 100 photographs an object performing a gesture (for example, a hand) to acquire an input image, and converts the acquired input image into a binary image based on color information. The following description of an embodiment assumes that the object is a hand. FIG. 1 also assumes that the gesture recognition apparatus 100 acquires the input image by photographing the hand itself, but the input image may equally be supplied from an external source.

The gesture recognition apparatus 100 according to an embodiment of the present invention recognizes gestures made with a user's hand, and can convert the input image into a binary image based on color information about the user's hand.

That is, the gesture recognition apparatus 100 uses the color information that hands generally have and examines the color information of each pixel of the input image (for example, in RGB or HSV). If a pixel falls within the color range corresponding to a hand, its pixel value is converted to a first value; the pixel values of all remaining pixels are converted to a second value. The image is binarized in this way.
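The binarization described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the HSV thresholds, the names `is_hand_color` and `binarize`, and the choice of 1 and 0 for the first and second values are all assumptions, since the text gives no numeric color range.

```python
import colorsys

# Hypothetical skin-tone range in HSV (all components in [0, 1]); the
# source gives no numeric thresholds, so these values are assumptions.
HUE_MAX = 0.14
SAT_MIN, SAT_MAX = 0.15, 0.75
VAL_MIN = 0.35

FIRST_VALUE = 1   # pixel judged to belong to the hand
SECOND_VALUE = 0  # every other pixel


def is_hand_color(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the assumed skin range."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h <= HUE_MAX and SAT_MIN <= s <= SAT_MAX and v >= VAL_MIN


def binarize(image):
    """Convert a row-major grid of (r, g, b) tuples into a binary image."""
    return [
        [FIRST_VALUE if is_hand_color(*px) else SECOND_VALUE for px in row]
        for row in image
    ]
```

A practical system would likely calibrate the range per user or lighting condition; the fixed constants here only make the per-pixel decision concrete.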

This embodiment describes binarization of the input image based on the color information of the hand. However, the invention is not limited to this approach and can be applied whenever the hand can be extracted by some other method.

In step 115, the gesture recognition apparatus 100 filters the binary image to remove errors.

A human hand generally has a continuous, smooth boundary. Errors in the color information, however, produce point-like errors that appear intermittently among the pixels belonging to the hand in the binary image. The gesture recognition apparatus 100 applies a mask to remove these errors.

To allow real-time processing, the filter may be applied in a simple form along the horizontal, vertical, and diagonal directions. The filter may also be applied repeatedly to remove errors from the binary image.

More specifically, when a particular pixel in the hand region of the binary image differs from all of its surrounding pixels, the gesture recognition apparatus 100 may set that pixel's value to match the surrounding pixels.

For example, suppose that pixels in the hand region of the binary image have the first value and all other pixels have the second value. Suppose further that one pixel within the hand region has the second value while all of its surrounding pixels have the first value. This indicates that a color error in the hand region caused an error during binary conversion, so the gesture recognition apparatus 100 applies the filter and changes that pixel's value to the first value, removing the error.
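The point-error filter in the example above might look like the following sketch. It assumes an 8-neighbour window as the reading of "all surrounding pixels" and leaves border pixels untouched; the function name `remove_point_errors` and the `passes` parameter are hypothetical.

```python
def remove_point_errors(binary, passes=1):
    """Flip any interior pixel whose 8 neighbours all hold the opposite value.

    Border pixels are left unchanged, and the input grid is not modified.
    """
    height, width = len(binary), len(binary[0])
    current = [row[:] for row in binary]
    for _ in range(passes):
        result = [row[:] for row in current]
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                neighbours = [
                    current[y + dy][x + dx]
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)
                ]
                # In a binary image, neighbours that all differ from the
                # centre necessarily share one value; adopt that value.
                if all(n != current[y][x] for n in neighbours):
                    result[y][x] = neighbours[0]
        current = result
    return current
```

Running several passes corresponds to the repeated application of the filter mentioned in the text, although a single isolated pixel is already corrected after one pass.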

In step 120, the gesture recognition apparatus 100 obtains the hand center from the error-free binary image, and extracts the hand center in each image to obtain gesture data.

The gesture recognition apparatus 100 may calculate the mean position of all pixels belonging to the hand in the error-free binary image and set this as the hand center.

In this way, the hand center can be extracted from each of the successively acquired images, and the data on the extracted hand centers (the gesture data) can be collected.
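A minimal sketch of the hand-center extraction described above, assuming the binary image is a row-major grid in which hand pixels hold the value 1. The names `hand_center` and `gesture_data` are illustrative only.

```python
def hand_center(binary, hand_value=1):
    """Return the mean (x, y) of all hand pixels, or None if there are none."""
    xs, ys = [], []
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if value == hand_value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def gesture_data(frames):
    """Collect one hand center per frame, skipping frames with no hand pixels."""
    centers = (hand_center(frame) for frame in frames)
    return [c for c in centers if c is not None]
```

Skipping frames without hand pixels is a design choice of this sketch; the source does not say how such frames are handled.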

FIG. 2 shows an example in which the hand center has been extracted from an image.

FIG. 3 shows the data on the hand center (hereinafter, gesture data) in each of a series of consecutive images. It is an example in which gesture data is obtained by extracting the hand center frame by frame from the consecutive images.

In step 125, the gesture recognition apparatus 100 normalizes the obtained gesture data to correct its position in space, and stores the result.

FIG. 4(a) shows gesture data obtained from a plurality of consecutive images, and FIG. 4(b) shows an example of the same data after normalization for spatial position correction.
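The source does not specify how the normalization is computed. One common choice, shown here purely as an assumption, is to translate the trajectory so that its mean sits at the origin and then scale it so that the largest absolute coordinate is 1. The function name `normalize` is hypothetical.

```python
def normalize(points):
    """Center a trajectory on its mean and scale its largest coordinate to 1.

    This particular scheme is an assumption; the source only states that
    the data is normalized to correct its position in space.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    extent = max(max(abs(x), abs(y)) for x, y in centered)
    if extent == 0:
        return centered  # all points coincide, nothing to scale
    return [(x / extent, y / extent) for x, y in centered]
```

With this scheme, the same gesture drawn in a different part of the frame yields the same normalized trajectory.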

In step 130, the gesture recognition apparatus 100 orthogonally projects the normalized gesture data onto a plane, regardless of time.

The obtained gesture data is generally stored in two dimensions, but it becomes three-dimensional once the time at which the gesture is input is taken into account.

FIG. 5(a) shows gesture data with time taken into account. To cluster three-dimensional gesture data such as that of FIG. 5(a) in space, the gesture data is orthogonally projected onto a plane regardless of time.

When the gesture data of FIG. 5(a) is orthogonally projected onto a two-dimensional plane regardless of time, the result is as shown in FIG. 5(b).
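Projecting onto the plane amounts to discarding the time coordinate. A short sketch, assuming each sample is stored as a `(t, x, y)` tuple (a representation chosen here for illustration):

```python
def project_onto_plane(samples):
    """Drop the time coordinate from (t, x, y) samples, keeping (x, y)."""
    return [(x, y) for _t, x, y in samples]
```

If the samples are already in time order, the order of the returned list still records the sequence in which points were produced, which the later time-ordering step relies on.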

In step 135, the gesture recognition apparatus 100 clusters the gesture data that has been orthogonally projected onto the two-dimensional plane regardless of time.

The gesture recognition apparatus 100 may cluster the data using a Gaussian mixture model (GMM). The Gaussian mixture model itself is well known to those skilled in the art, so a separate description is omitted.

In using the Gaussian mixture model, the gesture recognition apparatus 100 may set the number of mixtures to 1 and the covariance type to full when clustering the orthogonally projected gesture data.

FIG. 6 shows the clustering result for the orthogonally projected gesture data of FIG. 5(b).

As shown in FIG. 6, once clustering is complete, every item of the orthogonally projected gesture data is assigned the class of the cluster nearest to it. Here, the class of a cluster may be a cluster index that distinguishes the clusters from one another. For this assignment, the gesture recognition apparatus 100 may use distance values between the orthogonally projected gesture data, and the Mahalanobis distance may be used as the distance value.
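Nearest-cluster assignment with the Mahalanobis distance can be sketched as below for the two-dimensional case. The sketch assumes that each cluster is summarized by a mean and a full 2x2 covariance matrix, which is one reading of the full-covariance setting mentioned earlier; the names `mahalanobis` and `classify` are illustrative.

```python
import math


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2-D point from a Gaussian with full covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must have a positive determinant")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form v^T * inverse(cov) * v, written out for the 2x2 case.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


def classify(point, clusters):
    """Return the class index of the cluster nearest to the point.

    `clusters` maps a class index to a (mean, covariance) pair.
    """
    return min(clusters, key=lambda k: mahalanobis(point, *clusters[k]))
```

Unlike the Euclidean distance, this measure accounts for the spread of each cluster, so a point can be assigned to a wide cluster whose center is farther away than that of a narrow one.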

In step 140, the gesture recognition apparatus 100 arranges the clustering results in time order.

Arranging the clustering result of FIG. 6 in time order may give, for example, the following.

333……33332222….2222111…1111555…555333…333444…444

When the classes in this time-ordered clustering result are condensed in time order by keeping only the first class of each run, the state sequence for the "a" gesture can be written as 3→2→1→5→3→4 (step 145).
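Condensing the time-ordered class labels into a state sequence is a run-length collapse. A minimal sketch using the standard library; the function name `state_sequence` is an assumption:

```python
from itertools import groupby


def state_sequence(labels):
    """Collapse consecutive repeats, keeping the first label of each run."""
    return [label for label, _run in groupby(labels)]
```

For the example in the text, a label stream made of runs of 3, 2, 1, 5, 3, and 4 collapses to the six-state sequence regardless of how long each run is.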

In step 150, the gesture recognition apparatus 100 calculates the transition probability values between the states (clusters) of the state sequence for the gesture. The gesture recognition apparatus 100 may calculate the transition probability value between states (for example, between clusters) by a method similar to the Baum-Welch algorithm.

This can be expressed as Equation 1 below.

[Equation 1]

Here, S(t) denotes the state (cluster) closest to the gesture data at time t. The remaining terms of Equation 1 are a probability value defined on a pair of state conditions, the number of items of gesture data that satisfy two state conditions over all times, and the number of items of gesture data that satisfy one state condition over all times.

Equation 1 can therefore be used to calculate the transition probability values between the states of the state sequence.
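One plausible reading of Equation 1, given the definitions around it, is a ratio of counts. The notation below ($a_{ij}$ for the transition probability, $N(\cdot)$ for a count over all times, and the state indices $i$ and $j$) is assumed for illustration and is not taken from the published equation:

```latex
a_{ij} \;=\; \frac{N\bigl(S(t) = i,\; S(t+1) = j\bigr)}{N\bigl(S(t) = i\bigr)}
```

Under this reading, $a_{ij}$ estimates the probability that gesture data in state $i$ at time $t$ is in state $j$ at time $t+1$.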

For example, when the state sequence for the "a" gesture is 3→2→1→5→3→4, the transition probability values between the states (clusters) can be summarized as follows.

Applying the transition probability values calculated in this way for each pair of states (classes) to the state sequence for the gesture gives the representation shown in FIG. 7.
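A count-based estimate of the transition probabilities can be sketched as follows. This is an illustration of the general idea rather than the exact published formula: it counts consecutive label pairs in the time-ordered class stream and divides by the number of times the first label occurs with a successor. The function name `transition_probabilities` is hypothetical.

```python
from collections import Counter


def transition_probabilities(labels):
    """Estimate P(next state = j | current state = i) from a label sequence."""
    pair_counts = Counter(zip(labels, labels[1:]))
    # Count only labels that have a successor so that each row sums to 1.
    from_counts = Counter(labels[:-1])
    return {
        (i, j): count / from_counts[i]
        for (i, j), count in pair_counts.items()
    }
```

Self-transitions such as (3, 3) are included, so a slowly performed gesture produces a high probability of staying in the same state.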

In step 155, the gesture recognition apparatus 100 sets a recognizer for the gesture using the state sequence and the transition probability values between its states.

The description of FIG. 1 has focused on generating a recognizer for the "a" gesture. In the same manner, the gesture recognition apparatus 100 can derive a state sequence for each of a plurality of gestures, calculate the transition probability values for each state sequence, and use them to design a recognizer for each gesture.

Recognizers for other gestures can be implemented in the same manner as described with reference to FIG. 1, so a repeated description is omitted.

FIG. 8 is a flowchart illustrating a method of recognizing a gesture using the plurality of recognizers set as in FIG. 1.

The description of FIG. 8 assumes that, for the recognition of each gesture, a state sequence and its transition probability values are stored in association with each recognizer. The process that follows is described for the case in which a user performs a specific gesture to be recognized.

In step 810, the gesture recognition apparatus 100 acquires an input image containing a hand (object) performing a specific gesture, and converts the acquired input image into a binary image based on the color information of the hand. This lets the gesture recognition apparatus 100 extract an object region such as a hand.

Next, in step 815, the gesture recognition apparatus 100 extracts the hand center from the binary image and obtains gesture data.

Steps 810 and 815 may be repeated for each of a plurality of consecutive input images, or for each frame of a video, to obtain a series of gesture data.

In step 820, the gesture recognition apparatus 100 clusters each item of obtained gesture data and classifies it into a cluster class. Here, the cluster class denotes the index of the class of each item of gesture data. As described with reference to FIG. 1, gesture data may be clustered into a plurality of clusters, and each cluster may be indexed as a class. In this specification, therefore, a cluster class should be understood as the index of a cluster.

The gesture recognition apparatus 100 may classify each item of gesture data into the cluster closest to it.

Next, in step 825, the gesture recognition apparatus 100 recognizes the specific gesture as the gesture corresponding to the state sequence that matches the classified cluster classes taken in time order.

That is, among the state sequences set in the plurality of recognizers, the one whose states occur in the same order as the time-ordered cluster classes is found, and the specific gesture is identified as the gesture corresponding to that state sequence.
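The matching step can be sketched as follows. The sketch makes two assumptions that the source does not spell out: all recognizers share one set of cluster class indices, and "matching" means that the collapsed class stream equals the stored state sequence exactly. The function name `recognize` and the dictionary layout are illustrative.

```python
from itertools import groupby


def recognize(labels, recognizers):
    """Return the names of gestures whose state sequence matches the input.

    `labels` is the time-ordered list of cluster classes for the input
    gesture; `recognizers` maps a gesture name to its stored state sequence.
    """
    observed = [label for label, _run in groupby(labels)]
    return [
        name for name, states in recognizers.items()
        if list(states) == observed
    ]
```

Because runs of repeated classes are collapsed before comparison, the same gesture performed quickly or slowly produces the same observed sequence. More than one name may be returned, which is the situation the tie-breaking procedure addresses.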

Now suppose that more than one state sequence matches the cluster classes.

That is, suppose that checking whether the cluster classes arrive in the order of the trained state sequences shows that they match the order of several state sequences. In that case, the gesture recognition apparatus 100 may use each cluster class to derive a distance value for each state of the state sequences of those recognizers, compute the average distance for each recognizer, and recognize the specific gesture as the gesture corresponding to the recognizer with the smallest average.

In more detail, the gesture recognition apparatus 100 first initializes the distance value for each state of the state sequences set in the plurality of recognizers.

Next, whenever the cluster class changes over time, the gesture recognition apparatus 100 derives a distance value for the state of the state sequence that corresponds to the cluster class before the transition. The distance value of the state corresponding to that cluster class may be updated with the distance between the gesture data classified into the cluster class and the center of the cluster class.

For example, suppose the gesture data is classified into the third cluster class and then transitions to the second cluster class. The gesture recognition apparatus 100 may then calculate and update the distance value of the third state of the plurality of recognizers, which corresponds to the third cluster class.

Once this process has been carried through to the transition to the last cluster class, the gesture recognition apparatus 100 may sum the distance values of the states of each recognizer and take their mean to obtain the average distance.

The gesture recognition apparatus 100 may recognize the specific gesture as the gesture corresponding to the recognizer with the smallest average distance.
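
The average-distance tie-break can be sketched as follows. This is a simplified, offline reading of the procedure above and rests on several assumptions: each candidate recognizer keeps its own cluster centers keyed by cluster class, the distance value of a state is the mean Euclidean distance from the gesture data in that class to the center, and the final state is scored the same way as the others. The names `average_distance`, `pick_gesture`, and `candidates` are hypothetical.

```python
import math
from itertools import groupby

def average_distance(track, classes, centers):
    """Mean over states of the mean distance from each run's points to its center.

    Assumption: `centers` maps a cluster class to a 2-D center point.
    """
    state_distances = []
    start = 0
    for k, run in groupby(classes):
        n = len(list(run))
        points = track[start:start + n]
        # Distance value for the state corresponding to cluster class k.
        state_distances.append(sum(math.dist(p, centers[k]) for p in points) / n)
        start += n
    return sum(state_distances) / len(state_distances)

def pick_gesture(track, classes, candidates):
    """Return the candidate gesture with the smallest average distance."""
    return min(candidates, key=lambda name: average_distance(track, classes, candidates[name]))

# Hypothetical data: two recognizers whose state sequences both match [3, 2].
track = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
classes = [3, 3, 2]
candidates = {
    "a": {3: (0.0, 0.0), 2: (5.0, 5.0)},
    "c": {3: (1.0, 1.0), 2: (6.0, 6.0)},
}
best = pick_gesture(track, classes, candidates)
print(best)
```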

FIG. 9 is a block diagram showing the internal configuration of a gesture recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 9, the gesture recognition apparatus 100 according to an embodiment of the present invention includes an image acquisition unit 910, a binary image conversion unit 915, a tracking unit 920, a classification unit 925, a recognizer setting unit 930, a recognition unit 935, a memory 940, and a control unit 945.

The image acquisition unit 910 captures the object performing a gesture to acquire an input image. For example, the image acquisition unit 910 may be a camera.

The binary image conversion unit 915 converts a plurality of consecutive input images that include the object (a hand) performing a gesture into binary images based on the color information of the object. The binary image conversion unit 915 may receive the input images continuously, in time order, and convert each into a binary image.
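
Color-based binarization can be sketched as follows. The passage does not say which color space or thresholds are used, so the per-channel range test and the example bounds below are assumptions chosen only for illustration; `to_binary`, `lower`, and `upper` are hypothetical names.

```python
def to_binary(image, lower, upper):
    """Mark a pixel 1 if every channel lies within [lower, upper], else 0.

    Assumption: `image` is a list of rows of (r, g, b) tuples.
    """
    return [
        [int(all(lo <= c <= hi for c, lo, hi in zip(px, lower, upper))) for px in row]
        for row in image
    ]

# A tiny 2x2 image; the bounds are illustrative values for the object's color.
image = [
    [(200, 150, 120), (10, 10, 10)],
    [(30, 200, 40), (210, 160, 130)],
]
binary = to_binary(image, lower=(180, 120, 90), upper=(255, 190, 160))
print(binary)
```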

The tracking unit 920 extracts the center of the object recognized in the binary image and tracks it to acquire gesture data.

For example, if the object is a hand, the tracking unit extracts the center of the hand from the binary image and tracks that center through the subsequent input images or video frames, extracting gesture data from each.

For example, when a user performs a gesture with a hand, input images containing the hand are supplied continuously as the gesture is performed, and the tracking unit 920 may track the center of the hand through those images to extract the gesture data for each.
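
Extracting a center from each binary frame can be sketched as follows. The sketch assumes that the center is the centroid of the foreground pixels, which this passage does not confirm; `hand_center` and `frames` are hypothetical names.

```python
def hand_center(binary):
    """Return the (x, y) centroid of foreground pixels, or None if there are none.

    Assumption: the object's center is the mean position of pixels equal to 1.
    """
    coords = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    if not coords:
        return None
    n = len(coords)
    return (sum(x for x, _ in coords) / n, sum(y for _, y in coords) / n)

# Two consecutive binary frames in which the foreground blob moves.
frames = [
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
]

# One item of gesture data per frame, in time order; empty frames are skipped.
track = [c for c in map(hand_center, frames) if c is not None]
print(track)
```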

The classification unit 925 clusters the gesture data into clusters. The gesture data arrives in time order, so when time order is taken into account the gesture data is three-dimensional. The classification unit 925 therefore disregards time order, projects the gesture data orthogonally onto a two-dimensional plane, and then clusters the projected gesture data.

This has already been described in detail with reference to FIG. 1, so a redundant description is omitted.
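
The projection step, together with a distance measure that a clustering routine could use, can be sketched as follows. The clustering algorithm itself is not described in this passage and is not shown. The claims name the Mahalanobis distance between the projected data; computing it from the covariance of the projected points is an assumption of this sketch, as are the names `project`, `covariance`, and `mahalanobis`.

```python
import math

def project(samples):
    """Drop the time coordinate: (x, y, t) becomes (x, y)."""
    return [(x, y) for x, y, _t in samples]

def covariance(points):
    """Return (sxx, sxy, syy), the population covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy

def mahalanobis(p, q, cov):
    """Mahalanobis distance between p and q; the covariance must be invertible."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - q[0], p[1] - q[1]
    # Quadratic form with the inverse of [[sxx, sxy], [sxy, syy]].
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)

# Hypothetical gesture data as (x, y, t) samples.
samples = [(0.0, 0.0, 0), (2.0, 0.0, 1), (0.0, 2.0, 2), (2.0, 2.0, 3)]
points = project(samples)
cov = covariance(points)
d = mahalanobis(points[0], points[3], cov)
```

With an identity covariance, as in this example, the Mahalanobis distance reduces to the Euclidean distance.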

The recognizer setting unit 930 sets a recognizer for each gesture.

For example, for a gesture in which the user writes "a" with a hand, the recognizer setting unit 930 may use the gesture data to set a recognizer for the "a" gesture.

To this end, the recognizer setting unit 930 may further include a sequence acquisition unit and a calculation unit.

The sequence acquisition unit uses the clustering result to generate a state sequence for the target gesture in which each cluster is a state.

The calculation unit calculates the transition probability values between the states (clusters) in the state sequence.

Finally, the recognizer setting unit 930 may set the recognizer for the gesture using the state sequence and the calculated transition probability values.

This is the same as described in detail with reference to FIG. 1, so a redundant description is omitted.
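
Setting a recognizer from a time-ordered list of cluster classes can be sketched as follows. The sketch assumes that the state sequence keeps only the first of each run of identical classes and that a transition probability is estimated as the relative frequency of one class following another, self-transitions included. The text states only that transition probabilities between states are calculated, so the estimator and the names `state_sequence`, `transition_probabilities`, and `recognizer` are hypothetical.

```python
from collections import Counter
from itertools import groupby

def state_sequence(classes):
    """Keep the first of each run of identical cluster classes."""
    return [k for k, _ in groupby(classes)]

def transition_probabilities(classes):
    """Estimate P(next = j | current = i) from consecutive pairs of classes.

    Assumption: relative-frequency estimate over the training sequence.
    """
    pairs = Counter(zip(classes, classes[1:]))
    totals = Counter(classes[:-1])
    return {(i, j): n / totals[i] for (i, j), n in pairs.items()}

# Hypothetical cluster classes of one training gesture, in time order.
classes = [3, 3, 3, 2, 2, 1]
recognizer = {
    "states": state_sequence(classes),
    "transitions": transition_probabilities(classes),
}
print(recognizer["states"])
```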

Also, as described above with reference to FIG. 1, the gesture recognition apparatus 100 may omit the recognizer setting unit 930. In that case, the gesture recognition apparatus 100 may of course receive the recognizer for each gesture from a separate apparatus that sets the recognizers.

The recognition unit 935 recognizes a specific gesture using the recognizers set for the respective gestures.

For example, the recognition unit 935 may continuously receive gesture data for a specific gesture through the tracking unit 920, identify the cluster class of each item of input gesture data, and recognize the specific gesture by checking, with time order taken into account, whether the cluster classes arrive in the same order of states (clusters) as the state sequence of each configured recognizer.

That is, the recognition unit 935 may recognize the specific gesture as the gesture corresponding to the state sequence, among those set in the plurality of recognizers, whose states occur in the same order as the cluster classes arrive over time.

It goes without saying that the recognition unit 935 and the recognizer setting unit 930 may share the binary image conversion unit 915, the tracking unit 920, and other components.

The memory 940 stores the various algorithms needed to operate the gesture recognition apparatus 100 according to an embodiment of the present invention, as well as the various data produced in the process.

The control unit 945 controls the internal components of the gesture recognition apparatus 100 according to an embodiment of the present invention (for example, the binary image conversion unit 915, the tracking unit 920, the classification unit 925, the recognizer setting unit 930, the recognition unit 935, and the memory 940).

Meanwhile, the gesture recognition method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various means of processing information electronically, and recorded on a storage medium. The storage medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the storage medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the software field. Examples of storage media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed, using an interpreter or the like, by a device that processes information electronically, for example a computer.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to preferred embodiments, those of ordinary skill in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: Gesture recognition apparatus
910: Image acquisition unit
915: Binary image conversion unit
920: Tracking unit
925: Classification unit
930: Recognizer setting unit
935: Recognition unit
940: Memory
945: Control unit

Claims

1. A method of setting a gesture recognizer, the method comprising:
(a) converting an input image that includes an object performing a target gesture into a plurality of binary images based on color information of the object, the input image being a plurality of consecutive images;
(b) extracting the center of the object from each of the converted binary images to acquire gesture data;
(c) clustering the acquired gesture data and, using the clustering result, generating a state sequence that includes the class of each cluster as a state;
(d) calculating transition probability values between the clusters in the state sequence; and
(e) setting a recognizer for the target gesture using the state sequence and the calculated transition probability values.

2. The method of claim 1, further comprising, before step (b),
applying a filter to each of the plurality of binary images to remove errors,
wherein step (b) acquires the gesture data using the binary images from which the errors have been removed.

3. The method of claim 1, further comprising, before step (c),
normalizing the gesture data on the basis of the acquired gesture data.

4. The method of claim 1, wherein step (c) comprises:
orthogonally projecting the acquired gesture data onto a plane irrespective of time order;
clustering the projected gesture data;
arranging the classes of the clusters in time order according to the clustering result; and
generating the state sequence so that it includes only the first of each class among the classes arranged in time order.

5. The method of claim 4, wherein the clustering comprises
calculating distance values between the projected gesture data and clustering on the basis of the distance values.

6. The method of claim 5,
wherein the distance value is calculated as the Mahalanobis distance between the projected gesture data.

7. The method of claim 1,
wherein the transition probability value is the probability that a second cluster appears after a first cluster in the state sequence.

8. A method of recognizing a gesture in a state in which a plurality of recognizers for a plurality of gestures are set, each recognizer including a state sequence and transition probability values for its gesture, the method comprising:
converting each of a plurality of consecutive input images that include an object performing a specific gesture into a binary image based on color information of the object;
extracting the center of the object from each of the converted binary images to acquire gesture data;
identifying a cluster class for each item of gesture data; and
recognizing the specific gesture as the gesture whose state sequence matches the cluster classes.

9. The method of claim 8,
wherein identifying the cluster class of each item of gesture data comprises
assigning each item of gesture data the cluster class of the cluster nearest to it.

10. The method of claim 8,
wherein recognizing the gesture comprises, when a plurality of state sequences match the cluster classes:
updating a distance value of each state of the state sequences of the plurality of recognizers according to transitions between the cluster classes; and
recognizing the specific gesture as the gesture corresponding to the recognizer for which the average of the updated distance values is smallest.

11. The method of claim 10,
wherein updating the distance value of each state of the state sequences comprises,
upon a transition of the cluster class, calculating the distance value between the center of the pre-transition cluster class and the gesture data included in that cluster class, and updating the distance value of the state of the state sequence that corresponds to that cluster class.

12. A recording medium product on which a program code for performing the method according to any one of claims 1 to 12 is recorded.

13. An apparatus for setting a gesture recognizer, the apparatus comprising:
a binary image conversion unit that converts an input image including an object performing a target gesture into a plurality of binary images based on color information of the object, the input image being a plurality of consecutive images;
a tracking unit that extracts the center of the object from each of the converted binary images to acquire gesture data;
a sequence acquisition unit that clusters the acquired gesture data and, using the clustering result, generates a state sequence for the target gesture that includes each cluster as a state;
a calculation unit that calculates transition probability values between the clusters in the state sequence; and
a recognizer setting unit that sets a recognizer for the target gesture using the state sequence and the calculated transition probability values.

14. The apparatus of claim 13,
wherein the sequence acquisition unit
orthogonally projects the acquired gesture data onto a plane irrespective of time order, clusters the projected gesture data, arranges the classes of the clusters in time order according to the clustering result, and acquires the state sequence so that it includes only the first of each class among the classes arranged in time order.

15. The apparatus of claim 13,
wherein the sequence acquisition unit calculates distance values between the orthogonally projected gesture data and clusters on the basis of the distance values.

16. An apparatus for recognizing a gesture in a state in which a plurality of recognizers for a plurality of gestures are set, each recognizer including a state sequence and transition probability values for its gesture, the apparatus comprising:
a binary image conversion unit that converts each of a plurality of consecutive input images that include an object performing a specific gesture into a binary image based on color information of the object;
a tracking unit that extracts the center of the object from each of the converted binary images to acquire gesture data;
a classification unit that identifies a cluster class for each item of gesture data; and
a recognition unit that recognizes the specific gesture as the gesture corresponding to the state sequence that matches the cluster classes taken in time order.

17. The apparatus of claim 16,
wherein the recognition unit,
when a plurality of state sequences match the cluster classes taken in time order, updates a distance value of each state of the state sequences of the plurality of recognizers according to transitions between the cluster classes, and recognizes the specific gesture as the gesture corresponding to the recognizer for which the average of the updated distance values is smallest.