KR101440441B1

KR101440441B1 - Gesture recognition apparatus and method thereof

Info

Publication number: KR101440441B1
Application number: KR1020120138081A
Authority: KR
Inventors: 박혜영; 최현석
Original assignee: Kyungpook National University Industry-Academic Cooperation Foundation; Daegu Gyeongbuk Institute of Science and Technology
Priority date: 2012-11-30
Filing date: 2012-11-30
Publication date: 2014-09-17
Also published as: KR20140070059A

Abstract

A gesture recognition method is disclosed. The method includes: receiving a video of a user; obtaining a difference image by performing a subtract operation between adjacent frames among the plurality of frames constituting the received video; extracting feature points of the obtained difference image; calculating a statistical distance between the extracted feature points of the difference image and a plurality of pre-stored reference difference-image feature points; and recognizing, as the user's gesture, the gesture corresponding to the reference difference-image feature point whose statistical distance to the feature points of the difference image is smallest.

Description

Gesture Recognition Apparatus and Method

The present invention relates to an effective gesture recognition apparatus and method that extract feature points from difference images, which are easy to compute, instead of relying on complicated image processing. The approach is robust to background and noise, and it reduces the amount of feature information used for classification and therefore the feature information that must be kept in storage.

With the explosive growth of IT devices equipped with image acquisition devices, image-based gesture recognition has attracted increasing attention as a means of natural interaction and information exchange between users and devices. Its advantage is that it uses the image acquisition device already built into the device, without additional equipment such as the keyboard or mouse of conventional computer use, and therefore incurs no extra cost. Many studies are consequently under way to apply it to robot control, computer games, smart TVs, smartphones, and the like.

Because gesture data arrive as a continuous sequence of high-dimensional images with large deformations, it is essential to represent the data in a form suitable for classification. Conventionally, sophisticated image processing techniques have been used for this purpose.

SPoF (Space of Probability Functions) represents the statistical distribution among pixels using the Canny edge pixels of each frame, and another study performed hand detection using the RGB and depth information acquired by a Kinect. These methods represent gesture information well, but they are limited by the complex computation they require.

Statistical approaches such as PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), and manifold learning have also been used. By relying on statistical feature extraction instead of complex image processing, these methods obtain low-dimensional feature data. However, they are vulnerable to illumination changes and background noise.

An object of the present invention is to provide a gesture recognition method and apparatus that are robust to background and noise while reducing the feature information used for classification, and hence the feature information to be stored.

To achieve this object, a gesture recognition method according to the present invention includes: receiving a video of a user; obtaining a difference image by performing a subtract operation between adjacent frames among the plurality of frames constituting the received video; extracting feature points of the obtained difference image; calculating a statistical distance between the extracted feature points of the difference image and a plurality of pre-stored reference difference-image feature points; and recognizing, as the user's gesture, the gesture corresponding to the reference difference-image feature point with the smallest statistical distance to the feature points of the difference image.

The extracting of feature points of the obtained difference image may use one of PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), SIFT (Scale Invariant Feature Transform), LPP (Locality Preserving Projection), or dense-SIFT.

The calculating of the statistical distance between the extracted difference-image feature points and the plurality of pre-stored difference-image feature points may further include: adjusting, using a dynamic time warping (DTW) algorithm, the data length of the extracted difference-image feature points and the data length of the plurality of pre-stored reference difference-image feature points to be equal; and, with the data lengths equalized, sequentially comparing the extracted difference-image feature points with the plurality of reference difference-image feature points to calculate the statistical distance.

Each of the plurality of reference difference-image feature points may be a difference-image feature point classified by a k-nearest neighbor classifier, and the calculating of the statistical distance may use the following equation.

Here, X denotes the feature points of the extracted difference image, Y denotes the stored difference-image feature points, d_MI(X,Y) is the statistical distance between X and Y, S(X,Y) is the correlation coefficient between X and Y, σ_XY is the covariance between X and Y, and σ_XX and σ_YY are the standard deviations of X and Y, respectively, so that S(X,Y) = σ_XY / (σ_XX σ_YY).

According to another embodiment of the present invention, a gesture recognition apparatus includes: an input unit that receives a video of a user; a storage unit that stores a plurality of reference difference-image feature points; an image processing unit that obtains a difference image by performing a subtract operation between adjacent frames among the plurality of frames constituting the received video; a calculation unit that extracts feature points of the obtained difference image and calculates a statistical distance between the extracted feature points and the plurality of reference difference-image feature points; and a control unit that recognizes, as the user's gesture, the gesture corresponding to the reference difference-image feature point with the smallest statistical distance to the feature points of the difference image.

The calculation unit may extract the feature points of the difference image using one of PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), LPP (Locality Preserving Projection), SIFT (Scale Invariant Feature Transform), or dense-SIFT, and may use a dynamic time warping (DTW) algorithm to adjust the extracted difference-image feature-point data and the pre-stored difference-image feature-point data to the same length.

The plurality of reference difference-image feature points may be difference-image feature points classified by a k-nearest neighbor classifier, and the calculation unit may calculate the statistical distance using the following equation.

Here, X denotes the feature points of the extracted difference image, Y denotes the stored difference-image feature points, d_MI(X,Y) is the statistical distance between X and Y, S(X,Y) is the correlation coefficient between X and Y, σ_XY is the covariance between X and Y, and σ_XX and σ_YY are the standard deviations of X and Y, respectively.

As described above, the gesture recognition apparatus and method according to the present embodiment use difference images to remove unnecessary noise from the image without sophisticated, complex image processing, and, by recognizing gestures from feature points extracted from the difference images, they reduce the amount of computation and increase the gesture recognition rate.

FIG. 1 is a block diagram illustrating the configuration of a gesture recognition apparatus 100 according to an embodiment of the present invention;
FIG. 2 is a diagram for explaining a method of extracting a difference image according to an embodiment of the present invention;
FIG. 3 is a diagram for explaining difference images obtained by the method described with reference to FIG. 2;
FIGS. 4 and 5 are diagrams for explaining experimental results of the gesture recognition apparatus according to the present embodiment;
FIG. 6 is a flowchart illustrating a gesture recognition method according to an embodiment of the present invention; and
FIG. 7 is a flowchart illustrating a gesture recognition method according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a gesture recognition apparatus 100 according to an embodiment of the present invention.

The gesture recognition apparatus 100 according to an embodiment of the present invention obtains difference images between frames of an input video, extracts feature points from the obtained difference images, and compares them with pre-stored difference-image feature points using dynamic time warping and statistical correlation information, thereby recognizing the gesture that corresponds to a pre-stored difference-image feature point.

To perform these functions, the gesture recognition apparatus 100 includes, as shown in FIG. 1, an input unit 110, an image processing unit 120, a storage unit 130, a calculation unit 140, and a control unit 150.

The input unit 110 receives a video of the user. Specifically, the input unit 110 may receive a video captured by an external imaging device. Alternatively, the input unit 110 may itself be implemented as an image sensor that directly generates the user gesture video.

The image processing unit 120 obtains a difference image by performing a subtract operation between adjacent frames among the plurality of frames constituting the received video.

Specifically, the pixel information of the frame immediately preceding the current frame is read, the pixel information of the current frame is read, temporary pixel values are set, and a subtract operation is performed between them to extract the difference image.
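The frame-differencing step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes grayscale frames already stacked in a NumPy array of shape (T, H, W), and it takes the absolute value of each difference, which the text does not specify (it only says "subtract"). The function name `difference_images` is hypothetical.

```python
import numpy as np


def difference_images(frames: np.ndarray) -> np.ndarray:
    """Absolute difference between each frame and the frame immediately before it.

    frames: array of shape (T, H, W) holding grayscale frames (assumed uint8).
    Returns an array of shape (T - 1, H, W).
    """
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected at least two frames of shape (T, H, W)")
    # Promote to a signed type so the subtraction cannot wrap around as uint8 would.
    f = frames.astype(np.int16)
    # Current frame minus the immediately preceding frame, for every t >= 1.
    diff = f[1:] - f[:-1]
    # Static pixels give 0; moving regions give large magnitudes.
    return np.abs(diff).astype(np.uint8)
```

Casting to a signed type before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would produce wherever a pixel becomes darker from one frame to the next.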

The storage unit 130 may store the received video. Specifically, the storage unit 130 may temporarily store the video received through the input unit 110.

The storage unit 130 also stores the plurality of reference difference-image feature points. Here, a feature point means a significant pixel that appears only in the difference image of a specific gesture.

For example, if the difference-image feature point of a video in which the hand is waved left and right is A, and that of a video in which the hand is waved up and down is B, the storage unit 130 may store feature points A and B.

Then, when the difference-image feature points of a user gesture captured through the input unit 110 are obtained and compared with the reference difference-image feature points stored in the storage unit 130, and the obtained feature points are determined to correspond to the stored feature point A, the user's gesture can be recognized as waving the hand left and right.

The storage unit 130 may be implemented as a storage medium inside the gesture recognition apparatus 100 or as an external storage medium, for example, a removable disk such as a USB memory, a storage medium connected to a host, or a web server accessed over a network.

The calculation unit 140 extracts the feature points of the obtained difference image and calculates the statistical distance between the extracted feature points and the plurality of reference difference-image feature points pre-stored in the storage unit.

Methods for extracting feature points from difference images can be categorized by various criteria. A representative criterion is the extent of the region from which features are extracted, which divides them into global and local feature extraction methods.

Global feature extraction methods use statistics such as the covariance computed over the entire difference image and typically apply eigenvalues and eigenvectors to satisfy a particular objective function. The features obtained from the training data are therefore expressed so as to describe the training data as a whole. Representative methods include PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis).
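As an illustration of a global method, the sketch below fits PCA to flattened difference images through a singular value decomposition of the centred data. It is a generic PCA, assuming one flattened difference image per row; the function names `fit_pca` and `transform_pca` are hypothetical, and the choice of `n_components` is left to the user.

```python
import numpy as np


def fit_pca(X: np.ndarray, n_components: int):
    """Fit PCA to X of shape (n_samples, n_features), e.g. flattened difference images.

    Returns (mean, components) where components has shape (n_components, n_features).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # The rows of Vt are the eigenvectors of the sample covariance (principal axes),
    # ordered by decreasing singular value, i.e. by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:n_components]


def transform_pca(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project samples onto the principal axes (low-dimensional feature vectors)."""
    return (X - mean) @ components.T
```

Using the SVD of the centred data avoids forming the full pixel-by-pixel covariance matrix explicitly, which matters when each sample is a whole image.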

Local feature extraction methods extract local features that characterize the difference image well, without considering the features of regions other than the specific region in the difference image. Representative local feature extraction methods include SIFT (Scale Invariant Feature Transform), LPP (Locality Preserving Projection), and dense-SIFT.
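LPP, which the experiments below report working best on difference images, can be sketched in NumPy as follows. The sketch follows the standard formulation: a k-nearest-neighbour graph with heat-kernel weights W, degree matrix D, Laplacian L = D - W, and the generalised eigenproblem X^T L X a = λ X^T D X a, keeping the eigenvectors with the smallest eigenvalues. The neighbourhood size `k`, the heat-kernel width heuristic, the small ridge term `reg`, and the function name `fit_lpp` are all assumptions, not parameters given in the patent.

```python
import numpy as np


def fit_lpp(X: np.ndarray, n_components: int, k: int = 5, reg: float = 1e-6) -> np.ndarray:
    """Locality Preserving Projection (sketch).

    X: (n_samples, n_features). Returns P of shape (n_features, n_components);
    the projected features are X @ P.
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    t = dist2.mean()                      # heat-kernel width (assumed heuristic)
    nn = dist2.copy()
    np.fill_diagonal(nn, np.inf)          # a sample is not its own neighbour
    # Symmetric k-nearest-neighbour graph with heat-kernel weights.
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(nn[i])[:k]:
            W[i, j] = W[j, i] = np.exp(-dist2[i, j] / t)
    D = np.diag(W.sum(axis=1))
    L = D - W                             # graph Laplacian
    # Generalised eigenproblem A a = lambda B a with A = X^T L X, B = X^T D X.
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(d)     # small ridge keeps B positive definite
    # Reduce to a standard symmetric problem using the Cholesky factor B = C C^T.
    Cinv = np.linalg.inv(np.linalg.cholesky(B))
    M = Cinv @ A @ Cinv.T
    _, V = np.linalg.eigh((M + M.T) / 2)  # eigenvalues in ascending order
    # Smallest eigenvalues preserve neighbourhoods best; map back with a = C^{-T} y.
    return Cinv.T @ V[:, :n_components]
```

Unlike PCA, which keeps the directions of largest global variance, LPP keeps the directions along which samples that are neighbours in the original space stay close, which is why it is listed among the locality-oriented methods.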

The calculation unit 140 uses dynamic time warping (DTW) to adjust two data sequences of different lengths, and calculates the statistical distance between the feature points of the extracted difference image and the feature points of the pre-stored gesture difference image.

According to an embodiment of the present invention, a gesture is recognized by comparing the difference image of a user gesture video captured through the input unit with a pre-stored gesture difference image. If the captured user gesture video and the pre-stored gesture video differ in length, however, the two cannot be compared directly.

Therefore, before comparing the feature points of the difference image extracted from the captured gesture video with those of the pre-stored gesture difference image, the calculation unit 140 uses dynamic time warping to adjust the two sequences to the same length.
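A minimal DTW sketch that equalises the lengths of two feature sequences is shown below. It assumes each sequence is a NumPy array with one feature vector per frame, uses the Euclidean distance as the local cost, and backtracks the optimal warping path so that both sequences can be re-indexed to a common length. The function names `dtw_path` and `align` are hypothetical. In line with the text, only the alignment is kept here, not the DTW cumulative cost.

```python
import numpy as np


def dtw_path(X: np.ndarray, Y: np.ndarray):
    """Optimal DTW warping path between X (n, d) and Y (m, d) as a list of (i, j)."""
    n, m = len(X), len(Y)
    # Local cost: Euclidean distance between every pair of frames.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Accumulated cost with an infinite border so the path must start at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1), always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def align(X: np.ndarray, Y: np.ndarray):
    """Re-index both sequences along the warping path so they have equal length."""
    path = dtw_path(X, Y)
    Xa = np.array([X[i] for i, _ in path])
    Ya = np.array([Y[j] for _, j in path])
    return Xa, Ya
```

For example, aligning a 3-frame sequence with a 5-frame sequence of the same motion performed more slowly yields two 5-frame sequences whose frames correspond one to one, which is the precondition for the statistical comparison.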

Dynamic time warping computes an optimal path for aligning two sequences and can be used to calculate the distance between time-series data. In gesture recognition, however, the distance computed by DTW itself is unsuitable because of the many sources of variation between users and between gestures. As an alternative, the present embodiment uses a statistical distance between difference-image feature points, given by Equation 1 below.

Here, X denotes the feature points of the extracted difference image, Y denotes the stored difference-image feature points, d_MI(X,Y) is the statistical distance between X and Y, S(X,Y) is the correlation coefficient between X and Y, σ_XY is the covariance between X and Y, and σ_XX and σ_YY are the standard deviations of X and Y, respectively.
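Equation 1 itself is not reproduced in this text, so the sketch below only computes the correlation coefficient S(X,Y) = σ_XY / (σ_XX σ_YY) from the quantities defined above, on two length-equalised (for example DTW-aligned) feature sequences. As a clearly hypothetical stand-in for d_MI, it returns 1 - S(X,Y), which is 0 for perfectly correlated sequences and grows as the correlation drops; the patent's actual d_MI may differ. The function names are hypothetical.

```python
import numpy as np


def correlation_coefficient(Xa, Ya) -> float:
    """S(X, Y) = sigma_XY / (sigma_XX * sigma_YY) over two equal-length sequences."""
    x = np.asarray(Xa, dtype=float).ravel()
    y = np.asarray(Ya, dtype=float).ravel()
    if x.size != y.size:
        raise ValueError("sequences must be aligned to the same length first")
    xc, yc = x - x.mean(), y - y.mean()
    cov = np.mean(xc * yc)             # sigma_XY
    sx = np.sqrt(np.mean(xc ** 2))     # sigma_XX (standard deviation of X)
    sy = np.sqrt(np.mean(yc ** 2))     # sigma_YY (standard deviation of Y)
    if sx == 0.0 or sy == 0.0:
        return 0.0                     # a constant sequence carries no correlation
    return float(cov / (sx * sy))


def statistical_distance(Xa, Ya) -> float:
    """Hypothetical stand-in for d_MI: 1 - S(X, Y), ranging from 0 to 2."""
    return 1.0 - correlation_coefficient(Xa, Ya)
```

Because the correlation coefficient is invariant to offset and positive scaling, a distance built on it is insensitive to global brightness or contrast shifts between the input and reference feature sequences, which is consistent with the robustness the text attributes to the statistical distance.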

The control unit 150 recognizes, as the user's gesture, the gesture corresponding to the difference-image feature point with the shortest calculated statistical distance.

Specifically, the control unit 150 may recognize the user gesture by comparing the calculated statistical distance with a preset value.

For example, let X be the difference-image feature point of a video in which the hand is waved left and right, Y that of a video in which the hand is waved up and down, and Z that of a video in which a circle is drawn. If the statistical distance between the difference-image feature points of the captured user gesture video and the pre-stored reference feature point X is the shortest, the user's gesture can be recognized as the left-right hand wave corresponding to feature point X.

Meanwhile, to prevent an input gesture that is entirely different from every gesture corresponding to the reference difference-image feature points from being recognized as one of the pre-stored gestures, the calculated statistical distances are compared with a preset value; if all of them are greater than the preset value, it can be determined that there is no matching gesture.
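The decision rule described in the preceding paragraphs (pick the reference with the smallest statistical distance, but report no match when every distance exceeds a preset value) can be sketched as follows. The dictionary input, the label strings, and the function name `recognize` are illustrative assumptions; the distances would come from whichever statistical distance is used.

```python
from typing import Mapping, Optional


def recognize(distances: Mapping[str, float], threshold: float) -> Optional[str]:
    """Return the gesture label with the smallest distance, or None if nothing matches.

    distances: maps each stored gesture label to the statistical distance between
    the input's feature sequence and that gesture's reference feature sequence.
    threshold: the preset value. If even the best distance exceeds it, then every
    distance does, and the input is treated as an unknown gesture.
    """
    if not distances:
        return None
    label, best = min(distances.items(), key=lambda item: item[1])
    return label if best <= threshold else None
```

Accepting when the best distance is less than or equal to the preset value mirrors the "less than or equal to" condition given for the control unit 150.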

The control unit 150 also controls each component of the gesture recognition apparatus 100.

Specifically, when a video is input through the input unit 110, the control unit 150 may temporarily store it in the storage unit 130.

The control unit 150 controls the image processing unit 120 to perform a subtract operation between adjacent frames among the plurality of frames constituting the received video, thereby obtaining a difference image.

Although the embodiment of the present invention has been described as obtaining the difference image by subtracting the pixel information of two adjacent frames, the difference image may also be extracted by performing the subtract operation over more than two frames.

Once the difference image of the input video has been extracted, the control unit 150 controls the calculation unit 140 to extract the feature points of the difference image.

Feature points are extracted from the difference image to minimize computational complexity during gesture recognition and classification and to use storage space efficiently.

The difference-image feature point extraction according to the embodiment of the present invention may use one of PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), LPP (Locality Preserving Projection), SIFT (Scale Invariant Feature Transform), or dense-SIFT.

Once the difference-image feature points of the input video have been extracted, the control unit 150 controls the calculation unit 140 to calculate their statistical distances to the plurality of reference difference-image feature points pre-stored in the storage unit 130.

Because the length of the difference-image sequence of the input video may differ from that of the reference difference images pre-stored in the storage unit 130, dynamic time warping (DTW) can be used to adjust the two sequences of different lengths.

When the statistical distances between the difference-image feature points of the input gesture video and the pre-stored gesture difference-image feature points have been calculated, the control unit 150 recognizes the gesture corresponding to the gesture difference-image feature point with the smallest statistical distance as the user's gesture.

Specifically, the control unit 150 compares the calculated statistical distance with a preset value and, if the statistical distance to a reference difference-image feature point stored in the storage unit 130 is less than or equal to the preset value, may recognize the gesture corresponding to that feature point as the gesture in the input video.

Using difference images of the input video as described above makes a separate gesture detection step unnecessary, and excluding parts that are irrelevant to recognition, such as the background of the gesture data and illumination changes, increases the recognition rate.

Furthermore, because gestures are recognized not by comparing the difference images themselves but after extracting their feature points, i.e., after dimensionality reduction, computational complexity is reduced and storage space is used efficiently.

FIG. 2 is a diagram for explaining a method of extracting a difference image according to an embodiment of the present invention.

Frame differencing is an image processing technique used to detect the motion of objects. As frames arrive in succession, the immediately preceding frame is subtracted from the current frame, so that an object's contour appears only where the object moves.

Specifically, the pixel information of the frame immediately preceding the current frame is read (210). Then the pixel information of the current frame is read (220), temporary pixel values are set, and a subtract operation is performed to extract the difference image.

Recognizing motion with difference images in this way has the effect that only the objects moving within the frame are obtained accurately.

Although the present invention has been described as performing the subtract operation on two adjacent frames, the difference image may also be extracted by performing the subtract operation over more than two frames when implementing the invention.
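The text does not say how more than two frames would be combined. One common technique, swapped in here purely as an illustration, is three-frame differencing: compute the absolute differences to the previous and to the next frame and keep their pixel-wise minimum, so that only pixels that change in both intervals survive. The function name and the use of the minimum are assumptions, and frames are again assumed to be a grayscale NumPy array of shape (T, H, W).

```python
import numpy as np


def three_frame_difference(frames: np.ndarray) -> np.ndarray:
    """Pixel-wise minimum of |f_t - f_{t-1}| and |f_{t+1} - f_t| for each inner frame t.

    frames: (T, H, W) grayscale frames with T >= 3 (assumed uint8).
    Returns an array of shape (T - 2, H, W).
    """
    if frames.ndim != 3 or frames.shape[0] < 3:
        raise ValueError("expected at least three frames of shape (T, H, W)")
    f = frames.astype(np.int16)                # signed, to avoid uint8 wrap-around
    d_prev = np.abs(f[1:-1] - f[:-2])          # change from the previous frame
    d_next = np.abs(f[2:] - f[1:-1])           # change to the next frame
    # Keeping only changes present in both intervals suppresses the "ghost" left
    # at an object's old position by a single two-frame difference.
    return np.minimum(d_prev, d_next).astype(np.uint8)
```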

FIG. 3 is a diagram for explaining difference images obtained by the method described with reference to FIG. 2.

The gesture type 310 indicates the user's gesture and the direction of illumination, and the input image 320 shows two adjacent frames of the user's gesture video received through the input unit 110.

The difference image 330 is the image obtained by performing the subtract operation on the two adjacent frames.

In the difference image 330 extracted by the method of FIG. 2, the static background region has small values, whereas the values are concentrated in the moving hand region.

Detecting gestures in this way allows them to be located accurately regardless of the background or the direction of illumination. In the gesture type 310, the images differ not only in the user's gesture but also in the direction from which the light is incident.

In the first image the light comes from the left, and in the second it comes from the right, yet the resulting difference images 330 show values only in the moving hand region, regardless of the illumination direction.

Therefore, the recognition rate can be higher than when a separate gesture detection step is performed as in the prior art.

FIGS. 4 and 5 are diagrams for explaining experimental results of the gesture recognition apparatus according to the present embodiment.

The recognition rates with and without difference images (410) show that when PCA (Principal Component Analysis) is used as the dimensionality reduction method, i.e., the method of extracting feature points from the difference image, the recognition rates of the original images and the difference images differ little, whereas with LPP (Locality Preserving Projection) the difference images yield a markedly higher recognition rate than the original images.

Meanwhile, the recognition rates 420 for different distance measures show that the method of the present invention, which performs dynamic time warping (DTW) and then measures a statistical distance, achieves a markedly higher recognition rate than the Manhattan distance or the Euclidean distance.
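The DTW-then-statistical-distance step can be sketched as follows. This is a minimal pure-Python sketch, assuming each gesture is a sequence of feature vectors: classic DTW finds the optimal warping path, the path stretches both sequences to the same length, and the aligned values are compared sequentially. The patent's own d_MI formula is not reproduced in this text, so `1 - S(X, Y)` (with S the Pearson correlation coefficient, σ_XY / (σ_XX·σ_YY)) is used here purely as a placeholder; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_path(x: Sequence[Vector], y: Sequence[Vector]) -> List[Tuple[int, int]]:
    """Classic O(len(x) * len(y)) DTW; returns the optimal warping path."""
    if not x or not y:
        raise ValueError("sequences must be non-empty")
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1) along the cheapest predecessors.
    path, i, j = [], n, m
    while True:
        path.append((i - 1, j - 1))
        if i == 1 and j == 1:
            break
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def align(x: Sequence[Vector], y: Sequence[Vector]):
    """Use the DTW path to stretch both sequences to the same length."""
    path = dtw_path(x, y)
    return [x[i] for i, _ in path], [y[j] for _, j in path]


def correlation(fx: List[float], fy: List[float]) -> float:
    """Pearson correlation S(X, Y) = cov(X, Y) / (std(X) * std(Y))."""
    n = len(fx)
    mx, my = sum(fx) / n, sum(fy) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(fx, fy)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in fx) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in fy) / n)
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def statistical_distance(x: Sequence[Vector], y: Sequence[Vector]) -> float:
    """Placeholder 1 - S(X, Y) on DTW-aligned sequences (NOT the patent's d_MI)."""
    ax, ay = align(x, y)
    fx = [v for vec in ax for v in vec]
    fy = [v for vec in ay for v in vec]
    return 1.0 - correlation(fx, fy)
```

For example, a slowly performed version of a gesture (`[0, 0, 1, 1, 2, 2, 1, 0]`) aligns perfectly with its template (`[0, 1, 2, 1, 0]`) and gets a distance near 0, whereas a mirrored pattern scores close to 2. A plain Euclidean or Manhattan distance could not even be computed between these two sequences without first equalizing their lengths.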

Accordingly, the gesture recognition apparatus 100 according to the preferred embodiment uses Locality Preserving Projection (LPP) to extract the feature points of the difference image, performs dynamic time warping (DTW), and then measures the statistical distance.

FIG. 5 compares, on the same data, the performance of the method proposed in the present invention with that of existing methods, and additionally summarizes the recognition results obtained when PCA is combined.

Referring to FIG. 5, combining PCA with LPP to extract the difference image feature points improves the recognition rate from 89.17% to 90.97%.

Therefore, according to another embodiment of the present invention, Locality Preserving Projection (LPP) and Principal Component Analysis (PCA) may be combined as the method of extracting the difference image feature points.
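The text does not say in which order PCA and LPP are combined. A common arrangement, assumed here, is to reduce the flattened difference images with PCA first and then run LPP on the reduced data. The sketch below shows only the PCA stage, computed from the SVD of the centred data; the function names are hypothetical.

```python
import numpy as np


def pca_fit(X: np.ndarray, n_components: int):
    """Fit PCA on row-wise samples; returns (mean, components as columns)."""
    mean = X.mean(axis=0)
    # SVD of the centred data: the right singular vectors are the principal
    # axes, ordered by decreasing singular value (i.e. decreasing variance).
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components
```

In the assumed PCA-then-LPP arrangement, the reduced matrix `pca_transform(X, mean, components)` would be the input to the LPP stage. Because it has far fewer columns than there are pixels, LPP's matrix `XᵀDX` is much better conditioned.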

FIG. 6 is a flowchart illustrating a gesture recognition method according to an embodiment of the present invention.

A moving image capturing the user's gesture is received through the input unit 110 of the gesture recognition apparatus (S610). The image processing unit 120 extracts the frame pixel information of the received moving image and performs a subtraction operation on the pixel information of adjacent frames to obtain a difference image (S620).
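Step S620 can be sketched for a whole clip at once. This is a minimal numpy sketch, assuming the moving image is already decoded into a `(T, H, W)` grayscale array. Whether the absolute or the signed difference is kept is not stated in the text, so both are offered; the function name and the `keep_sign` flag are hypothetical.

```python
import numpy as np


def difference_sequence(video: np.ndarray, keep_sign: bool = False) -> np.ndarray:
    """video: (T, H, W) grayscale frames -> (T-1, H*W) flattened difference images."""
    if video.ndim != 3 or video.shape[0] < 2:
        raise ValueError("expected a (T, H, W) array with at least two frames")
    # Promote to a signed type so that uint8 subtraction does not wrap around.
    diffs = np.diff(video.astype(np.int16), axis=0)  # frame[k+1] - frame[k]
    if not keep_sign:
        diffs = np.abs(diffs)
    # One row per adjacent-frame pair: the sequence that feature extraction consumes.
    return diffs.reshape(diffs.shape[0], -1)
```

Each row of the result is one difference image, so a clip of T frames yields a sequence of T - 1 vectors. This is the variable-length sequence that later has to be length-equalized by DTW before comparison.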

Thereafter, the control unit 150 controls the calculation unit 140 to extract the feature points of the difference image (S630) and to calculate the statistical distance between the extracted difference image feature points and a plurality of previously stored difference image feature points (S640). The gesture corresponding to the stored difference image feature point with the shortest statistical distance is then recognized as the user's gesture (S650).
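Steps S640 and S650 amount to a nearest-template search. The minimal sketch below takes the distance as a parameter so that any measure (for example a DTW-based statistical distance) can be plugged in. The gesture labels and the toy `l1` distance are hypothetical stand-ins, not the patent's statistical distance.

```python
from typing import Callable, Dict, Sequence, Tuple

Distance = Callable[[Sequence[float], Sequence[float]], float]


def recognize(query: Sequence[float],
              templates: Dict[str, Sequence[float]],
              distance: Distance) -> Tuple[str, float]:
    """Return the label of the stored template closest to `query` and its distance."""
    if not templates:
        raise ValueError("no stored reference feature points")
    # S640: distance from the query to every stored reference.
    scores = {label: distance(query, ref) for label, ref in templates.items()}
    # S650: the gesture with the minimum distance is the recognized gesture.
    best = min(scores, key=lambda label: scores[label])
    return best, scores[best]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy stand-in distance for equal-length sequences (NOT the patent's measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

For instance, with templates `{"swipe_left": [1, 2, 3], "swipe_right": [3, 2, 1]}`, the query `[1, 2, 4]` is recognized as `"swipe_left"` at distance 1.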

FIG. 7 is a flowchart illustrating a gesture recognition method according to another embodiment of the present invention.

When the user's gesture image is input (S710), a difference image is obtained from the input image and its feature points are extracted (S720).

Then, the statistical distance between the extracted difference image feature points and the previously stored difference image feature points is calculated (S730), and it is determined whether that distance is equal to or less than a preset value (S740). If the statistical distance is equal to or less than the preset value, the gesture corresponding to the feature point with the shortest statistical distance is recognized as the user's gesture; if it exceeds the preset value, it is determined that there is no matching gesture.
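The FIG. 7 variant adds a rejection test (S740) to the nearest-template search. The minimal sketch below returns `None` when even the closest stored gesture is farther than the preset value. The threshold value, the gesture labels, and the toy `l1` distance are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

Distance = Callable[[Sequence[float], Sequence[float]], float]


def recognize_or_reject(query: Sequence[float],
                        templates: Dict[str, Sequence[float]],
                        distance: Distance,
                        threshold: float) -> Optional[str]:
    """Closest gesture label, or None if no stored gesture is close enough."""
    if not templates:
        return None
    # S730: statistical distance to every stored reference.
    scores = {label: distance(query, ref) for label, ref in templates.items()}
    best = min(scores, key=lambda label: scores[label])
    # S740: accept only if the minimum distance is within the preset value.
    return best if scores[best] <= threshold else None


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy stand-in distance for equal-length sequences (NOT the patent's measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Rejection lets the recognizer ignore movements that resemble none of the stored gestures instead of forcing them onto the nearest label. The threshold trades missed gestures against false accepts.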

As described above, using difference images removes unnecessary noise from the image without sophisticated, complex image processing techniques. In addition, applying Locality Preserving Projection (LPP) as the dimensionality reduction method (that is, the method of extracting the feature points of the difference image) makes it possible to extract effective low-dimensional features from the gesture data.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by anyone of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

100: gesture recognition apparatus   110: input unit
120: image processing unit   130: storage unit
140: calculation unit   150: control unit

Claims

A gesture recognition method, comprising:
Receiving a moving image of a user;
Obtaining a difference image by performing a subtraction operation between a plurality of adjacent frames among a plurality of frames constituting the input moving image;
Extracting feature points of the obtained difference image;
Calculating a statistical distance between a feature point of the extracted difference image and a plurality of previously stored reference difference image feature points;
Recognizing, as a gesture of the user, a gesture corresponding to the reference difference image feature point, among the plurality of reference difference image feature points, that has the minimum statistical distance from the feature point of the difference image,
Wherein calculating the statistical distance comprises:
Equalizing the data length of the extracted difference image feature points and the data length of the previously stored reference difference image feature points using a dynamic time warping (DTW) algorithm; and
Sequentially comparing the extracted difference image feature points with the plurality of reference difference image feature points once their data lengths have been equalized, and calculating the statistical distance.

The method according to claim 1,
Wherein extracting the feature points of the obtained difference image uses one of Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Scale Invariant Feature Transform (SIFT), Locality Preserving Projection (LPP), or dense-SIFT.

delete

3. The method of claim 2,
Wherein each of the plurality of reference difference image feature points comprises a difference image feature point classified by a K-Nearest Neighbor classifier.

The method according to claim 1,
Wherein, in calculating the statistical distance,
The statistical distance is calculated using the following equation:

Here, X is the feature point of the extracted difference image, Y is the plurality of stored reference difference image feature points, d_MI(X, Y) is the statistical distance between X and Y, S(X, Y) is the correlation coefficient between X and Y, σ_XY is the covariance between X and Y, and σ_XX and σ_YY are the standard deviations of X and Y, respectively.

A gesture recognition apparatus, comprising:
An input unit for receiving a moving image of a user;
A storage unit storing a plurality of reference difference image feature points;
An image processing unit for obtaining a difference image by performing a subtraction operation between a plurality of adjacent frames among a plurality of frames constituting the input moving image;
A calculation unit for extracting feature points of the obtained difference image and calculating a statistical distance between the feature points of the extracted difference image and the plurality of reference difference image feature points; and
A control unit for recognizing, as a gesture of the user, a gesture corresponding to the reference difference image feature point, among the plurality of reference difference image feature points, that has the minimum statistical distance from the feature point of the difference image,
Wherein the calculation unit
Equalizes the data length of the extracted difference image feature points and the data length of the previously stored reference difference image feature points using a dynamic time warping (DTW) algorithm.

The gesture recognition apparatus according to claim 6,
Wherein the calculation unit
Extracts the feature points of the difference image using one of Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projection (LPP), Scale Invariant Feature Transform (SIFT), or dense-SIFT.

delete

8. The method of claim 7,
Wherein the plurality of reference difference image feature points comprise difference image feature points classified by a K-Nearest Neighbor classifier.

The gesture recognition apparatus according to claim 6,
Wherein the calculation unit
Calculates the statistical distance using the following equation:

Here, X is the feature point of the extracted difference image, Y is the plurality of stored reference difference image feature points, d_MI(X, Y) is the statistical distance between X and Y, S(X, Y) is the correlation coefficient between X and Y, σ_XY is the covariance between X and Y, and σ_XX and σ_YY are the standard deviations of X and Y, respectively.