KR101299031B1

KR101299031B1 - Hand gesture recognition apparatus and method thereof

Info

Publication number: KR101299031B1
Application number: KR1020110128674A
Authority: KR
Inventors: 박혜영; 최현석
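Original assignee: 재단법인대구경북과학기술원 (Daegu Gyeongbuk Institute of Science and Technology); 경북대학교 산학협력단 (Kyungpook National University Industry-Academic Cooperation Foundation)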
Priority date: 2011-12-02
Filing date: 2011-12-02
Publication date: 2013-09-16
Also published as: KR20130062200A

Abstract

Disclosed is a hand gesture recognition apparatus. The apparatus includes the following units:

- An input unit receives a video capturing a user's movement.
- A search unit searches for and tracks the position of the user's hand in the received video.
- A feature extractor extracts a gesture sequence from the detected and tracked hand positions.
- A storage unit stores a plurality of sequence data.
- A pattern matching unit calculates a statistical distance between the extracted sequence data and each of the stored sequence data.
- A recognition unit takes the gesture corresponding to the stored sequence data with the lowest statistical distance as the gesture of the detected and tracked hand.
- An output unit outputs the recognized gesture.

Description

HAND GESTURE RECOGNITION APPARATUS AND METHOD THEREOF

The present invention relates to a hand gesture recognition apparatus and method. More particularly, it relates to an apparatus and method that adapt readily to changes in the number of gestures or users and that can easily analyze time-series data of different lengths.

With the development of smart IT devices, gesture recognition is attracting attention as a means of efficient and natural interaction and information exchange between users and devices. Vision-based gesture recognition has a particular advantage: it adds no cost. Conventional computer use needs extra equipment such as a keyboard or mouse, whereas vision-based recognition uses the image acquisition device already built into the device. Many studies are therefore under way to apply it to robot control, computer games, smart TVs, smartphones, and the like.

Previous research on gesture recognition has tried many methods, including:

- Self-Organizing Maps (SOM)
- Functional Data Analysis (FDA)
- Dynamic Time Warping (DTW)
- Hidden Markov Models (HMM)
- neural networks

These approaches extract features that represent gestures well and analyze the characteristics of each motion. They then build a case-based reasoning system or a rule-based system.

However, conventional methods are sensitive to deformed data and require an additional training process every time a new motion is added.

Traditional statistical or machine learning methods learn from data in the feature extraction and classification stages. They perform stably only if enough data is available when the model is designed. Moreover, as noted above, adding a new user or motion requires a separate training process.

Accordingly, an object of the present invention is to provide a hand gesture recognition apparatus and method that adapt readily to changes in the number of gestures or users and that can easily analyze time-series data of different lengths.

To achieve the above object, a hand gesture recognition apparatus according to the present invention includes the following units:

- An input unit receives a video capturing a user's movement.
- A search unit searches for and tracks the position of the user's hand using the received video.
- A feature extractor extracts a gesture sequence using the detected and tracked hand positions.
- A storage unit stores a plurality of sequence data.
- A pattern matching unit calculates a statistical distance between the extracted sequence data and each of the stored sequence data.
- A recognition unit takes the gesture corresponding to the sequence data with the lowest calculated statistical distance as the gesture of the detected and tracked hand.
- An output unit outputs the recognized gesture.

In this case, the storage unit preferably stores a plurality of sequence data classified by a K-Nearest Neighbor classifier.

The pattern matching unit preferably calculates the statistical distance using the following equation.

Here:

- X is the extracted sequence data.
- Y is stored sequence data.
- d(X,Y) is the statistical distance between X and Y.
- corr(X,Y) is the correlation coefficient between X and Y.
- cov(X,Y) is the covariance between X and Y.
- std(X,Y) is the standard deviation between X and Y.

The storage unit preferably stores the plurality of sequence data as vectors.

The feature extractor preferably extracts the gesture sequence as a vector.

A hand gesture recognition method according to the present embodiment includes the following steps:

1. Receive a video capturing a user's movement.
2. Search for and track the position of the user's hand using the received video.
3. Extract a gesture sequence using the detected and tracked hand positions.
4. Calculate a statistical distance between the extracted sequence data and each of a plurality of prestored sequence data.
5. Take the gesture corresponding to the sequence data with the lowest calculated statistical distance as the gesture of the detected and tracked hand.
6. Output the recognized gesture.

In this case, the plurality of prestored sequence data are preferably sequence data classified by a K-Nearest Neighbor classifier.

The step of calculating the statistical distance preferably uses the following equation.

The plurality of prestored sequence data are preferably vectors.

The extracting step preferably extracts the gesture sequence as a vector.

As described above, the hand gesture recognition apparatus and method according to the present embodiment use a K-Nearest Neighbor classifier. This lets them adapt readily to changes in the number of gestures or users. They also use Dynamic Time Warping (DTW), which lets them easily analyze time-series data of different lengths.

In addition, the method measures the similarity between two time series by applying statistical correlation information to the DTW-aligned data. Using that similarity for recognition gives more stable and robust results.

FIG. 1 is a block diagram showing the configuration of a hand gesture recognition apparatus according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining the operation of the pattern matching unit of FIG. 1;
FIG. 3 is a diagram showing an example of the data used in an experiment with the hand gesture recognition apparatus according to the present embodiment;
FIG. 4 is a diagram showing an example of an image used in an experiment with the hand gesture recognition apparatus according to the present embodiment;
FIGS. 5 and 6 are diagrams showing experimental results of the hand gesture recognition apparatus according to the present embodiment; and
FIG. 7 is a flowchart for explaining a hand gesture recognition method according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a hand gesture recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the hand gesture recognition apparatus 100 includes:

- an input unit 110
- an output unit 120
- a storage unit 130
- a search unit 140
- a feature extractor 150
- a pattern matching unit 160
- a recognition unit 170
- a controller 180

The apparatus 100 may be a dedicated device that performs only hand gesture recognition. It may also be a device such as a smart TV or game device that uses hand gesture recognition as a form of user input.

The input unit 110 receives a video capturing the user's movement. Specifically, the input unit 110 may receive a video captured by an external imaging device. This embodiment describes only that case, but the input unit 110 may also be implemented as an image sensor that captures the user directly to generate the video.

The output unit 120 outputs the gesture recognized by the recognition unit 170, described later. It may take either of two forms:

- A display device such as a CRT or LCD that shows the recognized gesture to the user.
- A communication interface that sends information on the recognized gesture to an external device, which then displays or uses it.

The storage unit 130 may store the received video. Specifically, the storage unit 130 may temporarily store the video received through the input unit 110.

The storage unit 130 also stores a plurality of sequence data. These sequence data are classified by a K-Nearest Neighbor classifier, and each may be an $m \times L$-dimensional vector. Because the stored data are classified by the nearest neighbor classifier, adding a new user or gesture only requires registering its training data.
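The registration-only workflow described above can be sketched as a small template store for a K-Nearest Neighbor classifier. This is a minimal illustration, not the apparatus's actual implementation. The class and method names (`TemplateStore`, `register`, `classify`), the Euclidean stand-in distance, and the example vectors and labels are all assumptions. The key point is that `register` only appends a labelled sequence, so adding a user or gesture involves no retraining.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Vector = Sequence[float]

class TemplateStore:
    """Hypothetical store of labelled gesture vectors queried by k-NN."""

    def __init__(self, distance: Callable[[Vector, Vector], float], k: int = 1):
        self.distance = distance
        self.k = k
        self.templates: List[Tuple[Vector, Hashable]] = []

    def register(self, vector: Vector, label: Hashable) -> None:
        # Adding a user or gesture only appends a labelled example;
        # no model parameters are re-estimated.
        self.templates.append((vector, label))

    def classify(self, query: Vector) -> Hashable:
        if not self.templates:
            raise ValueError("no templates registered")
        # Sort stored templates by distance to the query and keep the k closest.
        ranked = sorted(self.templates, key=lambda t: self.distance(query, t[0]))
        nearest = [label for _, label in ranked[: self.k]]
        # Majority vote among the k neighbours (k = 1 is plain nearest template).
        return Counter(nearest).most_common(1)[0][0]

def euclidean(a: Vector, b: Vector) -> float:
    # Stand-in distance for this example only; the document uses a DTW + correlation distance.
    return math.dist(a, b)

store = TemplateStore(euclidean, k=1)
store.register([0.0, 1.0, 2.0], "left_to_right")
store.register([2.0, 1.0, 0.0], "top_to_bottom")
print(store.classify([0.1, 1.1, 2.0]))  # prints left_to_right
```

Passing the distance in as a callable keeps the store independent of the matching rule, so the correlation-based distance of FIG. 2 could be substituted without changing the registration logic.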

The storage unit 130 may be implemented as a storage medium inside the hand gesture recognition apparatus 100. It may also be an external storage medium, such as:

- a removable disk, including a USB memory
- a storage medium connected to a host
- a web server accessed over a network

The search unit 140 searches for and tracks the position of the user's hand using the received video. Specifically, the search unit 140 may detect the hand in the input video and track changes in the detected hand position in chronological order.
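As an illustration of tracking the hand position in chronological order, the sketch below assumes that some hand detector has already produced an optional hand centre for each frame. The detector itself, the `(timestamp, centre)` input format, and the function name `build_trajectory` are hypothetical. Frames without a detection are skipped. An empty result corresponds to the case in which no hand position could be extracted.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float, float]  # assumed 3-D hand centre (x, y, z)

def build_trajectory(frames: Iterable[Tuple[float, Optional[Point]]]) -> List[Point]:
    """Collect per-frame hand centres into a time-ordered trajectory.

    `frames` is a hypothetical iterable of (timestamp, hand_centre) pairs,
    where hand_centre is None when no hand was found in that frame.
    """
    detected = [(t, p) for t, p in frames if p is not None]
    detected.sort(key=lambda item: item[0])  # enforce chronological order
    return [p for _, p in detected]

frames = [(0.066, (0.1, 0.0, 1.0)), (0.0, (0.0, 0.0, 1.0)), (0.033, None)]
print(build_trajectory(frames))  # prints [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
```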

The feature extractor 150 extracts a gesture sequence using the detected and tracked hand positions. Specifically, from the hand positions detected and tracked by the search unit 140, it extracts a single gesture sequence to be used for recognition. This gesture sequence may be an $m \times L$-dimensional vector.

The pattern matching unit 160 calculates a statistical distance between the extracted sequence data and each of the stored sequence data. Its detailed operation is described later with reference to FIG. 2. The pattern matching unit 160 uses the gesture sequence extracted by the feature extractor 150 as is, so it performs no additional computation for feature extraction.

The recognition unit 170 takes the gesture corresponding to the sequence data with the lowest calculated statistical distance as the gesture of the detected and tracked hand.

The recognition unit 170 may also compare the calculated statistical distances with a preset value. This guards against an input gesture that is entirely different from every stored gesture. If all calculated distances exceed the preset value, the unit may determine that there is no matching hand gesture. This prevents the input from being misrecognized as one of the stored hand gestures.
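The decision rule described above can be sketched as follows. It picks the smallest statistical distance and rejects the input when every distance exceeds a preset value. The function name and the threshold in the usage lines are illustrative assumptions; the document does not specify a threshold value.

```python
from typing import Dict, Optional

def recognize(distances: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the gesture with the smallest statistical distance, or None.

    `distances` maps each stored gesture label to its distance from the input;
    `threshold` is a hypothetical preset value.
    """
    if not distances:
        return None  # e.g. no hand was found, so nothing could be matched
    best_label = min(distances, key=distances.get)
    # If even the closest template is beyond the preset value, then every
    # distance exceeds it, so report "no matching gesture".
    if distances[best_label] > threshold:
        return None
    return best_label

print(recognize({"push": 0.2, "clockwise": 0.7}, threshold=0.5))  # prints push
print(recognize({"push": 0.8, "clockwise": 0.9}, threshold=0.5))  # prints None
```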

The controller 180 controls each component of the hand gesture recognition apparatus 100. When a video is input through the input unit 110, the controller 180 may:

1. Control the search unit 140 and the feature extractor 150 to extract a gesture sequence from the input video.
2. Control the pattern matching unit 160 to calculate the statistical distances between the extracted gesture sequence and the gesture sequences prestored in the storage unit 130.
3. Determine that the input video contains the gesture whose stored sequence has the smallest statistical distance.
4. Control the output unit 120 to output the recognized gesture.

The controller 180 may instead control the output unit 120 to report that no hand gesture was recognized. It does so in two cases: when the user's hand position cannot be extracted from the input video, or when every calculated statistical distance is equal to or greater than the preset value.

Accordingly, the hand gesture recognition apparatus 100 measures the similarity between two time series by applying statistical correlation information to the DTW-aligned data. Using that similarity for recognition gives more stable and robust results. The nearest neighbor classifier lets the apparatus adapt readily to changes in the number of gestures or users, and DTW lets it easily analyze time-series data of different lengths.

FIG. 2 is a diagram for explaining the operation of the pattern matching unit of FIG. 1.

Referring to FIG. 2, the pattern matching method according to the present embodiment consists of two steps.

In the first step, DTW aligns two samples of different lengths. DTW computes the optimal path for aligning two sequences, and it can also be used to calculate the distance between time series. In gesture recognition, however, a distance computed by DTW alone is unsuitable because users and gestures vary in many ways.

In the second step, the present embodiment therefore uses a statistical measure to calculate the distance between the two aligned sequences. This measure is given by Equation 1 below.

Here:

- X is the extracted sequence data.
- Y is stored sequence data.
- d(X,Y) is the statistical distance between X and Y.
- corr(X,Y) is the correlation coefficient between X and Y.
- cov(X,Y) is the covariance between X and Y.
- std(X,Y) is the standard deviation between X and Y.
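The two-step matching (DTW alignment, then a correlation-based distance) can be sketched as follows. This is a sketch under stated assumptions, not the patented implementation:

- The exact form of Equation 1 is not given here, so the code assumes the common correlation distance $d(X,Y) = 1 - \mathrm{corr}(X,Y)$ with $\mathrm{corr}(X,Y) = \mathrm{cov}(X,Y) / (\mathrm{std}(X)\,\mathrm{std}(Y))$.
- It uses the usual three-neighbour DTW recursion with a Euclidean local cost.
- The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Sequence[float]  # one time step with m values, e.g. (x, y, z)

def dtw_path(a: Sequence[Sample], b: Sequence[Sample]) -> List[Tuple[int, int]]:
    """Step 1: DTW alignment with the common three-neighbour recursion."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between samples
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the last pair of samples to the first along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return path[::-1]

def correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Step 2 (assumed form): d(X, Y) = 1 - cov(X, Y) / (std(X) * std(Y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    sx = math.sqrt(sum((p - mx) ** 2 for p in x) / n)
    sy = math.sqrt(sum((q - my) ** 2 for q in y) / n)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - cov / (sx * sy)  # 0: perfectly correlated, 2: anti-correlated

def statistical_distance(a: Sequence[Sample], b: Sequence[Sample]) -> float:
    """Align two sequences of different lengths, flatten them, compare by correlation."""
    path = dtw_path(a, b)
    x = [v for i, _ in path for v in a[i]]  # aligned copy of a, flattened
    y = [v for _, j in path for v in b[j]]  # aligned copy of b, flattened
    return correlation_distance(x, y)

a = [(0.0,), (1.0,), (2.0,)]          # m = 1 for readability
b = [(0.0,), (0.0,), (1.0,), (2.0,)]  # same motion, one extra sample at the start
print(dtw_path(a, b))                 # prints [(0, 0), (0, 1), (1, 2), (2, 3)]
print(statistical_distance(a, b))     # close to 0.0: the aligned vectors are identical
```

The correlation coefficient does not change when one vector is scaled by a positive factor or shifted by the same constant. This may be one reason a correlation-based measure is preferred over the raw DTW distance when users perform the same gesture with different amplitudes.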

To apply Equation 1 to gesture recognition, each gesture sequence must first be converted into vector form. Let a sequence of length $L$ be $X = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L)$ with $m$ values at each time $t$, i.e., $\mathbf{x}_t \in \mathbb{R}^m$. A single gesture sequence can then be represented as an $m \times L$-dimensional vector $\mathbf{x} = (\mathbf{x}_1^\top, \mathbf{x}_2^\top, \ldots, \mathbf{x}_L^\top)^\top$. Equation (1) can then be used to calculate the statistical distance between such vectorized sequences.
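The conversion above amounts to concatenating the $m$ values of each time step in time order. The sketch below assumes samples are given as equal-length tuples; for the three-dimensional coordinates described for FIG. 3, $m = 3$. The function name `vectorize` is hypothetical.

```python
from typing import List, Sequence

def vectorize(sequence: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a length-L sequence of m-dimensional samples into one m*L vector.

    Samples are concatenated in time order: (x_11, ..., x_1m, x_21, ..., x_Lm).
    """
    if not sequence:
        return []
    m = len(sequence[0])
    if any(len(sample) != m for sample in sequence):
        raise ValueError("every time step must have the same number of values m")
    return [value for sample in sequence for value in sample]

trajectory = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]  # L = 2 steps, m = 3 coordinates
print(vectorize(trajectory))  # prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```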

FIGS. 3 to 6 are diagrams showing experimental results of the hand gesture recognition apparatus according to the present embodiment.

To assess the suitability of the proposed method, it was compared experimentally with the conventional DTW-based distance calculation. Two factors were varied:

- Number of training samples. The classifier used in the proposed method is affected by the amount of training data, so the recognition rate was observed while varying this number to find a value suitable for practical use.
- Number of gestures. This number may vary in practical applications, so the change in recognition rate with the number of gestures was also observed.

In each experiment, training and test data were selected by random sampling and the recognition rate was measured. This process was repeated 50 times, and the average recognition rate is reported.
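The evaluation protocol described above can be sketched as below: random training/test splits, repeated 50 times, with the recognition rate averaged. Several details are assumptions for illustration:

- the per-class split
- treating all remaining samples as test data, which the document does not state
- the fixed random seed
- the function names
- the toy nearest-neighbour classifier in the usage lines

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, gesture label)
Classifier = Callable[[List[Sample], Sequence[float]], str]

def repeated_random_split(data: Sequence[Sample], n_train_per_class: int,
                          classify: Classifier, trials: int = 50, seed: int = 0) -> float:
    """Mean recognition rate over `trials` random per-class train/test splits."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Sample]] = {}
    for sample in data:
        by_label.setdefault(sample[1], []).append(sample)
    rates = []
    for _ in range(trials):
        train: List[Sample] = []
        test: List[Sample] = []
        for samples in by_label.values():
            shuffled = list(samples)
            rng.shuffle(shuffled)
            train += shuffled[:n_train_per_class]  # training samples per gesture
            test += shuffled[n_train_per_class:]   # assumption: the rest are test data
        if not test:
            raise ValueError("n_train_per_class leaves no test samples")
        correct = sum(classify(train, x) == label for x, label in test)
        rates.append(correct / len(test))
    return mean(rates)

def nearest_neighbour(train: List[Sample], x: Sequence[float]) -> str:
    # Toy 1-NN classifier (squared Euclidean distance), for the usage example only.
    return min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]

toy = [([0.0], "A"), ([0.1], "A"), ([0.2], "A"), ([5.0], "B"), ([5.1], "B"), ([5.2], "B")]
print(repeated_random_split(toy, n_train_per_class=1, classify=nearest_neighbour))  # prints 1.0
```

Sweeping `n_train_per_class` reproduces the horizontal axis of FIG. 5, and restricting `data` to a subset of labels corresponds to the gesture-count experiment of FIG. 6.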

FIG. 3 summarizes the data used in the experiment.

Referring to FIG. 3, the data consist of five gestures collected from four users at two different distances from the imaging device. The five gestures are:

- (A) left to right
- (B) top to bottom
- (C) clockwise rotation
- (D) counterclockwise rotation
- (E) push

The video sequences were preprocessed by hand search and tracking, followed by gesture segmentation. This produced 167 samples in total. Each sample is a time series of coordinates in three-dimensional space.

FIG. 4 shows sample images extracted from the video input during the experiment.

Referring to FIG. 4B, the red dot in the figure indicates the center of the detected hand.

FIG. 5 shows how recognition performance changes with the number of training samples. The horizontal axis is the number of training samples per gesture, and the vertical axis is the average recognition rate. 'DTW + CORR' is the recognition method according to the present embodiment, and 'DTW' is the conventional method used for comparison.

Referring to FIG. 5, the proposed method outperforms the conventional method at every number of training samples. Its advantage is much larger when there are few training samples, so it is particularly suitable when little training data is available. The proposed method reaches a stable level of recognition performance with five or more training samples. Two DTW variants, labeled 3-step and 5-step, were used in the experiment, and the choice between them made little difference.

FIG. 6 shows how recognition performance changes with the number of gestures.

Referring to FIG. 6, recognition performance drops as the number of gestures increases, because the problem becomes more complex. The drop is smaller for the present embodiment than for the conventional method. The conventional method's performance declines steadily as gestures are added. The proposed method keeps a recognition rate of about 90% even with all five gestures.

As the experiments also show, the proposed system can recognize a newly added gesture simply by registering new training data. It needs no complex training process like that of SVM or HMM.

FIG. 7 is a flowchart for explaining a hand gesture recognition method according to an embodiment of the present invention.

First, a video capturing the user's movement is received (S710). Specifically, a video of the user captured by an external imaging device may be received. This embodiment describes only that case, but this step may also be implemented by capturing the user's movement directly.

Next, the position of the user's hand is searched for and tracked using the received video (S720). Specifically, the hand may be detected in the input video, and changes in the detected hand position may be tracked in chronological order.

Next, a gesture sequence is extracted using the detected and tracked hand positions (S730). Specifically, a single gesture sequence to be used for recognition may be extracted from those positions. This gesture sequence may be an $m \times L$-dimensional vector.

Next, a statistical distance is calculated between the extracted sequence data and each of the plurality of prestored sequence data (S740). This calculation was described above with reference to FIG. 2, so it is not repeated here.

Next, the gesture corresponding to the sequence data with the lowest calculated statistical distance is taken as the gesture of the detected and tracked hand (S750). The calculated distances may also be compared with a preset value. If all of them exceed it, it may be determined that there is no matching hand gesture. This prevents an input that is entirely different from every prestored gesture from being misrecognized as one of them.

Finally, the recognized gesture is output (S760). Specifically, it may be shown on a display device such as a CRT or LCD. It may also be transmitted to an external device, which then displays or uses it.

Accordingly, the hand gesture recognition method measures the similarity between two time series by applying statistical correlation information to the DTW-aligned data. Using that similarity for recognition gives more stable and robust results. The nearest neighbor classifier lets the method adapt readily to changes in the number of gestures or users, and DTW lets it easily analyze time-series data of different lengths. The method of FIG. 7 may be executed on a hand gesture recognition apparatus configured as in FIG. 1 or on one with a different configuration.

The hand gesture recognition method described above may also be implemented as at least one executable program that performs the method. Such a program may be stored on a computer-readable recording medium.

Accordingly, each block of the present invention may be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium may be any device capable of storing data that can be read by a computer system.

Preferred embodiments of the present invention have been illustrated and described above, but the present invention is not limited to those specific embodiments. Those of ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

100: hand gesture recognition device 110: input unit
120: output unit 130: storage unit
140: search unit 150: feature extraction unit
160: pattern matching unit 170: recognition unit
180: control unit

Claims

A hand gesture recognition apparatus, comprising:
An input unit configured to receive a video capturing a user's movement;
A search unit configured to search for and track the position of the user's hand using the received video;
A feature extractor configured to extract gesture sequence data using the searched and tracked hand position;
A storage unit configured to store a plurality of sequence data stored through a nearest neighbor classifier (K-Nearest Neighbor Classifier);
A pattern matching unit configured to calculate a statistical distance between the extracted gesture sequence data and each of the stored plurality of sequence data;
A recognition unit configured to recognize the gesture corresponding to the sequence data having the lowest statistical distance as the gesture of the searched and tracked hand position; and
An output unit configured to output the recognized gesture,
Wherein the pattern matching unit calculates the statistical distance using the following equation:

Where X is the extracted gesture sequence data, Y is the stored sequence data, d(X, Y) is the statistical distance between X and Y, corr(X, Y) is the correlation coefficient between X and Y, cov(X, Y) is the covariance between X and Y, std(X) is the standard deviation of X, and std(Y) is the standard deviation of Y.
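As background, assume X and Y are aligned series x_1, ..., x_n and y_1, ..., y_n with means x̄ and ȳ. The quantities defined above are then related by the standard Pearson form of the correlation coefficient; the 1/n factors cancel in the ratio. This identity is offered only as an illustration and is not a reproduction of the claimed equation for d(X, Y):

$$
\operatorname{corr}(X,Y)=\frac{\operatorname{cov}(X,Y)}{\operatorname{std}(X)\,\operatorname{std}(Y)},\qquad
\operatorname{cov}(X,Y)=\frac{1}{n}\sum_{t=1}^{n}(x_t-\bar{x})(y_t-\bar{y}),\qquad
\operatorname{std}(X)=\sqrt{\frac{1}{n}\sum_{t=1}^{n}(x_t-\bar{x})^2}
$$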

The apparatus of claim 1,
Wherein the pattern matching unit aligns each of the extracted gesture sequence data and the stored plurality of sequence data, and calculates a statistical distance between the aligned gesture sequence data and each of the aligned plurality of sequence data.

The apparatus of claim 2,
Wherein the pattern matching unit aligns each of the extracted gesture sequence data and the stored plurality of sequence data using dynamic time warping (DTW).

The apparatus of claim 1,
Wherein the storage unit stores the plurality of sequence data as vector values.

The apparatus of claim 1,
Wherein the feature extractor extracts the gesture sequence data as vector values.

A hand gesture recognition method, comprising:
Receiving a video capturing the user's movement;
Searching for and tracking the position of the user's hand using the received video;
Extracting gesture sequence data using the searched and tracked hand position;
Calculating a statistical distance between the extracted gesture sequence data and each of a plurality of sequence data previously stored through a nearest neighbor classifier (K-Nearest Neighbor Classifier);
Recognizing the gesture corresponding to the sequence data having the lowest statistical distance as the gesture of the searched and tracked hand position; and
Outputting the recognized gesture,
Wherein the calculating of the statistical distance calculates the statistical distance using the following equation:

Where X is the extracted gesture sequence data, Y is the stored sequence data, d(X, Y) is the statistical distance between X and Y, corr(X, Y) is the correlation coefficient between X and Y, cov(X, Y) is the covariance between X and Y, std(X) is the standard deviation of X, and std(Y) is the standard deviation of Y.

The method of claim 6,
Wherein the calculating of the statistical distance comprises aligning the extracted gesture sequence data and the previously stored plurality of sequence data, and calculating a statistical distance between the aligned gesture sequence data and each of the aligned plurality of sequence data.

The method of claim 7,
Wherein the calculating of the statistical distance comprises aligning each of the extracted gesture sequence data and the plurality of previously stored sequence data using dynamic time warping (DTW).

The method of claim 6,
Wherein the plurality of previously stored sequence data have vector values.

The method of claim 6,
Wherein the extracting comprises extracting the gesture sequence data as vector values.