KR101785650B1

KR101785650B1 - Click detecting apparatus and method for detecting click in first person viewpoint

Info

Publication number: KR101785650B1
Application number: KR1020160004289A
Authority: KR
Inventors: 우운택; 장영균; 노승탁; 장형진; 김태균
Original assignee: 한국과학기술원
Priority date: 2016-01-13
Filing date: 2016-01-13
Publication date: 2017-10-17
Also published as: KR20170084892A

Abstract

Disclosed is a click detection method performed by a click detection apparatus according to an embodiment of the present invention, the method comprising: acquiring a first sequence image of a hand captured by a single depth camera worn by a user; obtaining a first spatio-temporal feature vector from a plurality of frames in the acquired first sequence image; constructing a random forest based on a second spatio-temporal feature vector extracted from frames of a second sequence image of a hand for which the occurrence of a click action and the click position are known; and inputting the first spatio-temporal feature vector into the random forest to determine whether a click action of the hand occurs in the first sequence image and the click position.

Description

CLICK DETECTING APPARATUS AND METHOD FOR DETECTING CLICK IN FIRST PERSON VIEWPOINT

The present invention relates to a click detection apparatus for a first-person (egocentric) viewpoint and a click detection method using the same. More specifically, the present invention relates to a method for determining the click action and the click position of a user's hand in augmented reality or virtual reality settings.

Many studies on tracking hands and recognizing hand gestures in augmented reality (AR) and virtual reality (VR) environments have proposed a variety of interaction scenarios. Examples include multi-click interaction in mid-air (reference [5]), gesture-based input in wearable AR environments (references [3], [4]), navigation of VR scenes based on hand tracking (reference [10]), and direct clicking of objects in a VR environment while wearing a head mounted display (HMD) and a hand pose estimation sensor (reference [1]).

However, most of these studies are not suited to detecting the movement of a fingertip that is occluded by the user's own palm or other fingers in a first-person viewpoint (or egocentric viewpoint). Detecting the click action and the click position of the hand from a first-person viewpoint involves the following difficulties.

First, when a click action occurs while the user is free to move, the clicking finger may be occluded by other fingers or by the palm.

Second, users perform click actions in many different ways. For example, one user may click using only a single finger joint, while another may click using all the finger joints of the finger.

Third, the postures that users adopt vary widely. Each finger has 4 degrees of freedom (DOF), so taking into account the degrees of freedom of all five fingers and those of the palm, the hand has 26 degrees of freedom in total.

As described above, detecting the click action and the click position of the hand poses many difficulties. Nevertheless, a way to detect them accurately from a first-person viewpoint in VR and AR environments, while guaranteeing the user's freedom of movement, is needed for the advancement of the VR and AR fields.

The references cited in this specification are listed below.

[1] Leap motion. http://www.leapmotion.com/. Accessed Sep. 10, 2014.

[2] M. K. Bhuyan, D. R. Neog, and M. K. Kar. Fingertip detection for hand pose recognition. International Journal on Computer Science and Engineering, 4(3):501-511, March 2012.

[3] A. Colaco, A. Kirmani, H. S. Yang, N.-W. Gong, C. Schmandt, and V. K. Goyal. Mime: Compact, low power 3d gesture sensing for interaction with head mounted displays. In UIST, pages 227-236, USA, 2013.

[4] T. Ha, S. Feiner, and W. Woo. Wearhand: Head-worn, RGB-D camera-based, bare-hand user interface with visually enhanced depth perception. In ISMAR, pages 219-228. IEEE, September 2014.

[5] G. Hackenberg, R. McCall, and W. Broll. Lightweight palm and finger tracking for real-time 3d gesture control. In Virtual Reality Conference (VR), 2011 IEEE, pages 19-26, March 2011.

[6] P. Krejov and R. Bowden. Multi-touchless: Real-time fingertip detection and tracking using geodesic maxima. 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 0:1-7, 2013.

[7] V. Lepetit and P. Fua. Keypoint Recognition Using Random Forests and Random Ferns, pages 111-124. Springer, 2013.

[8] Y. Liao, Y. Zhou, H. Zhou, and Z. Liang. Fingertips detection algorithm based on skin colour filtering and distance transformation. In QSIC, pages 276-281. IEEE, 2012.

[9] S. Melax, L. Keselman, and S. Orsten. Dynamics based 3d skeletal hand tracking. In Proceedings of the 2013 Graphics Interface Conference, GI '13, pages 63-70, Toronto, Ont., Canada, 2013.

[10] Z. Pan, Y. Li, M. Zhang, C. Sun, K. Guo, X. Tang, and S. Z. Zhou. A real-time multi-cue hand tracking algorithm based on computer vision. In Proceedings of the 2010 IEEE Virtual Reality Conference, VR '10, pages 219-222, Washington, DC, USA, 2010. IEEE Computer Society.

An object of the click detection apparatus and the click detection method for a first-person viewpoint according to an embodiment of the present invention is to accurately detect the click action and the click position of a user's hand.

Another object of the click detection apparatus and the click detection method according to an embodiment of the present invention is to guarantee the user's freedom of movement.

A further object of the click detection apparatus and the click detection method according to an embodiment of the present invention is to require no additional device to be worn by the user other than a single depth camera worn by the user.

A click detection method according to an embodiment of the present invention may include: acquiring a first sequence image of a hand captured by a single depth camera worn by a user; obtaining a first spatio-temporal feature vector from a plurality of frames in the acquired first sequence image; constructing a random forest based on a second spatio-temporal feature vector extracted from frames of a second sequence image of a hand for which the occurrence of a click action and the click position are known; and inputting the first spatio-temporal feature vector into the random forest to determine whether a click action of the hand occurs in the first sequence image and the click position.

Obtaining the first spatio-temporal feature vector may include: detecting the positions of the fingertips and the positions of the finger joints of the hand in each of a plurality of frames in the first sequence image; and obtaining the first spatio-temporal feature vector by arranging the detected fingertip positions and finger joint positions in frame order.

Constructing the random forest may include: randomly selecting the type of split function of a split node in a tree of the random forest; and optimizing the parameters of the split function selected at the split node.

The split function may include a function that returns a position vector derived from one frame, a function that returns a velocity vector derived from two frames, and a function that returns an acceleration vector derived from three frames.

The split function may correspond to the following equation:

[Equation]

$$
f_{m}(V) =
\begin{cases}
P^{j}(x_{i,n-p}), & m = 1 \\
U^{j}(x_{i,n-p},\, x_{i,n-q}), & m = 2 \\
A^{j}(x_{i,n-p},\, x_{i,n-q},\, x_{i,n-r}), & m = 3
\end{cases}
$$

In the above equation, $f_{m}$ is the split function; m is a randomly selected index indicating the split function; $x_{i,n}$ is the spatial feature vector of finger i in the n-th frame of the first sequence image (V); i is an index indicating a finger; j is an index indicating one of the finger joints and the fingertip of a given finger; $P^{j}$ is a function that returns the position vector of the point corresponding to j in a single frame; $U^{j}$ is a function that returns the velocity vector of the point corresponding to j from two frames; $A^{j}$ is a function that returns the acceleration vector of the point corresponding to j from three frames; n is the number of frames in the first sequence image; and p, q, and r may be preceding offsets.

Detecting the fingertip positions may include: extracting contour points of the hand from a frame in the first sequence image; calculating the distance between the center point of the palm and each extracted contour point, and calculating, for each contour point, the change in distance relative to the previous contour point; constructing, for each contour point, a shape feature vector whose elements are the distance changes at each of a predetermined number of preceding contour points, the distance change at the current contour point, and the distance changes at each of a predetermined number of subsequent contour points; constructing a random forest based on shape feature vectors of hand images in which the fingertip positions are known; and inputting the shape feature vector of each contour point into the random forest constructed from the shape feature vectors to detect the fingertip positions in the frame of the first sequence image.
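The shape feature vector described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `shape_features`, the window size parameter `k`, and the treatment of the contour as a closed (cyclic) sequence are assumptions made for the example.

```python
import math


def shape_features(contour, palm_center, k):
    """Build one shape feature vector per contour point.

    contour:     ordered list of (x, y) points along the hand outline
    palm_center: (x, y) center point of the palm
    k:           number of preceding and following contour points to include
                 (hypothetical parameter name)
    """
    # Distance from the palm center to every contour point.
    dist = [math.dist(pt, palm_center) for pt in contour]
    n = len(dist)
    # Change in distance relative to the previous contour point.
    # Assumption: the contour is closed, so index -1 wraps around.
    delta = [dist[i] - dist[i - 1] for i in range(n)]
    # Window of 2k + 1 distance changes centered on each contour point.
    return [[delta[(i + o) % n] for o in range(-k, k + 1)] for i in range(n)]
```

Each returned vector would then be labeled (fingertip or not) for training, or passed to the trained forest at detection time.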

The determining may include: determining whether a click action of the hand has occurred by comparing the average of the click-action probabilities determined by each of the plurality of trees of the random forest with a preset threshold; and determining, as the final click position, the click position that receives the most votes among the click positions determined by each of the plurality of trees of the random forest.
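The two-part decision above (averaged probability against a threshold, then a vote over positions) can be sketched as follows. The function name `decide_click`, the default threshold of 0.5, the use of `>=` for the comparison, and the assumption that click positions are discretized so that identical positions can be counted are all illustrative choices, not taken from the patent.

```python
from collections import Counter


def decide_click(tree_outputs, threshold=0.5):
    """Combine per-tree outputs into a final decision.

    tree_outputs: list of (click_probability, click_position) pairs, one per
                  tree; positions are assumed to be hashable (e.g. tuples of
                  discretized coordinates) so that votes can be counted.
    """
    # Average the click probabilities over all trees.
    mean_prob = sum(p for p, _ in tree_outputs) / len(tree_outputs)
    clicked = mean_prob >= threshold
    # The position proposed by the largest number of trees wins the vote.
    votes = Counter(pos for _, pos in tree_outputs)
    position = votes.most_common(1)[0][0]
    return clicked, position
```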

The single depth camera worn by the user may include a head mounted display (HMD).

A click detection apparatus according to another embodiment of the present invention may include: an image acquisition unit that acquires a first sequence image of a hand captured by a single depth camera worn by a user; an image sensing unit that obtains a first spatio-temporal feature vector from a plurality of frames in the acquired first sequence image; and a control unit that constructs a random forest based on a second spatio-temporal feature vector extracted from frames of a second sequence image of a hand for which the occurrence of a click action and the click position are known, and that inputs the first spatio-temporal feature vector into the random forest to determine whether a click action of the hand occurs in the first sequence image and the click position.

Some of the effects achievable by the click detection apparatus and the click detection method for a first-person viewpoint according to an embodiment of the present invention are as follows.

i) The click action and the click position of the user's hand can be detected accurately.

ii) The user's freedom of movement can be guaranteed.

iii) The user is not required to wear any additional device other than a single depth camera worn by the user.

However, the effects achievable by the click detection apparatus and the click detection method according to an embodiment of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is a schematic diagram for explaining a click detection method according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram for explaining spatio-temporal features in an embodiment of the present invention.
FIG. 3 is an exemplary diagram showing a tree of a random forest constructed based on spatio-temporal features.
FIGS. 4(a) and 4(b) are diagrams for explaining a method of detecting the fingertips of a hand in a frame.
FIG. 5 is a diagram showing ROC curves of the click detection method according to an embodiment of the present invention.
FIGS. 6a to 6d are diagrams showing experimental results in various scenarios that compare the conventional technology with the click detection method according to an embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of a click detection apparatus according to another embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the present invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

In describing the present invention, detailed descriptions of related known technology are omitted where they are judged to unnecessarily obscure the gist of the invention. In addition, the numbers used in this specification (for example, first, second, and so on) are merely identifiers for distinguishing one component from another.

In addition, in this specification, when one component is referred to as being "connected" or "coupled" to another component, the component may be directly connected or directly coupled to the other component, but unless specifically stated otherwise, it should be understood that it may also be connected or coupled via another intervening component.

In addition, a component expressed in this specification as a '~unit', a 'module', or the like means software or a hardware component such as an FPGA or an ASIC, and performs certain roles. However, a component is not limited to software or hardware. A component may be configured to reside in an addressable storage medium. Two or more components may be combined into one component, or one component may be divided into two or more components by more finely divided function. Furthermore, each of the components described below may additionally perform some or all of the functions handled by other components in addition to its own main function, and some of the main functions of each component may of course be performed exclusively by another component.

Hereinafter, exemplary embodiments according to the technical idea of the present invention are described with reference to the drawings.

FIG. 1 is a schematic diagram for explaining a click detection method (or touch detection method) according to an embodiment of the present invention.

A click detection apparatus according to an embodiment of the present invention acquires a first sequence image (V) of a hand captured by a single depth camera worn by a user. The first sequence image (V) may include a plurality of frame images.

The click detection apparatus detects the three-dimensional positions of the finger joints of the hand in each frame of the first sequence image (V), that is, the x, y, and z coordinates of each finger joint with the single depth camera as the origin. The hand has 20 finger joints in total, and the positions of the 20 finger joints in one frame are included as elements of a set of three-dimensional vectors (denoted here as $r$). The click detection apparatus may detect the finger joint positions using a 3D hand posture estimator (R) (reference [9]) and, depending on the implementation, may instead detect them by other known methods. Because the 3D hand posture estimator (R) of reference [9] provides a basis vector for each finger joint, all three-dimensional vectors in $r$ are transformed, according to Equation 1 below, into local coordinates whose origin is the base joint, in order to obtain spatio-temporal features that are invariant to scale and rotation. The same transformation to local coordinates is also applied to the three-dimensional fingertip vectors described later.

[Equation 1]

$$
v_{l} = M^{-1} v_{g}, \qquad M = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}
$$

In Equation 1, $v_{l}$ is the local coordinate of a finger joint, $v_{g}$ is the global coordinate of the finger joint, and $M^{-1}$ is the inverse of the transformation matrix of the base joint. R is a 3×3 rotation matrix and T is a 3×1 translation vector, so M is a 4×4 transformation matrix.
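The change of coordinates in Equation 1 can be sketched in plain Python as follows. The function name `to_local` is hypothetical, and the sketch assumes R is a proper rotation matrix, in which case the inverse of M = [[R, T], [0, 1]] reduces to applying R transposed to the translated point.

```python
def to_local(v_g, R, T):
    """Map a global 3D point v_g into the base joint's local frame.

    R: 3x3 rotation matrix of the base joint, given as a list of rows
    T: length-3 translation vector of the base joint
    Assumption: R is orthonormal, so M^-1 = [[R^T, -R^T T], [0, 1]] and
    v_l = R^T (v_g - T).
    """
    # Remove the translation of the base joint.
    d = [v_g[k] - T[k] for k in range(3)]
    # Multiply by the transpose of R (the inverse of a rotation).
    return [sum(R[row][col] * d[row] for row in range(3)) for col in range(3)]
```

For example, with R a 90 degree rotation about the z axis and T = [1, 2, 3], the global point [1, 3, 3] maps back to the local point [1, 0, 0].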

Using a modified version of the methods of references [2] and [8], the click detection apparatus detects the two-dimensional position of each fingertip of the hand in each frame of the first sequence image (V), that is, the x and y coordinates of the fingertip with the single depth camera as the origin, and adds the corresponding z coordinate according to the depth value from the single depth camera, thereby detecting the three-dimensional position of each fingertip. The positions of the fingertips of the five fingers in one frame are included as elements of a set of three-dimensional vectors $s$. In embodiments of the present invention, a random forest ($F_{S}$) may be used to detect the fingertips of the hand in each frame, as described later.

Once the finger joints and the fingertips have been detected in each frame, the click detection apparatus selectively combines, for every frame, the elements of the finger joint feature vector (denoted here as $r$) and the elements of the fingertip feature vector $s$ to construct the spatial feature vector x as in Equation 2 below.

[Equation 2]

$$
x_{it} = \left[\, r_{i,1,t},\ r_{i,2,t},\ r_{i,3,t},\ r_{i,4,t},\ s_{i,t} \,\right]
$$

In Equation 2, i is the index of a finger and t is the index of a frame (or time), where t ranges from 1 to n and n is the total number of frames. For example, when i is 3, $x_{3t}$ contains, from among the elements of $r$ and $s$, those that belong to the finger with index 3.

The click detection apparatus concatenates x over a certain period of time in frame order to construct the spatio-temporal feature vector $X = \{x_{it}\}$ shown in FIG. 2. FIG. 2 shows a spatio-temporal feature vector in which the spatial feature vectors of the finger with i = 3 are arranged over time.
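The construction of the spatio-temporal feature can be sketched as follows. The per-frame data layout (a dict with `"joints"` and `"tips"` keys) and both function names are assumptions made for illustration; the sketch only shows the idea of gathering one finger's points per frame and stacking them in frame order.

```python
def spatial_feature(frame, finger):
    """Spatial feature of one finger in one frame.

    frame: dict with (hypothetical) keys
           "joints": finger index -> list of 4 joint positions (3-tuples)
           "tips":   finger index -> fingertip position (3-tuple)
    All positions are assumed to be in base-joint local coordinates already.
    """
    return list(frame["joints"][finger]) + [frame["tips"][finger]]


def spatio_temporal_feature(frames, finger):
    """Stack the spatial features of one finger in frame order."""
    return [spatial_feature(frame, finger) for frame in frames]
```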

To determine, from the spatio-temporal feature vector obtained from the first sequence image, whether the user has performed a click action and where the click position is, the click detection apparatus extracts spatio-temporal feature vectors from a second sequence image for which the occurrence of a click action (a) and the click position (a') are known in advance, and constructs a random forest ($F_{A}$) by learning from the extracted spatio-temporal feature vectors. The method of extracting spatio-temporal feature vectors from the second sequence image is the same as the method of extracting them from the first sequence image.

In the present invention, the purpose of constructing the random forest ($F_{A}$) from spatio-temporal feature vectors is to detect the click action and the click position within the classifier even when the fingertip is occluded. A random forest is an ensemble of binary decision trees, and each tree contains two kinds of nodes: split nodes and leaf nodes. Each split node of the random forest ($F_{A}$) applies to the input data a task-specific split function determined by a randomly selected parameter value h (0: action detection, 1: position estimation), and decides whether to route the data to the left child node or to the right child node. Here, h denotes the type of task. A leaf node is a terminal node representing the status of the click action, and stores the probability of the click action and the position of the fingertip in three-dimensional space.

Each tree in the random forest ($F_{A}$) grows by recursively splitting the current training (or learning) data and passing the parts to two child nodes. At each node of the tree, split candidates $\phi = (f, \tau)$ are generated at random, where $f$ is a split function and $\tau$ is a threshold that divides the input data D into two subsets, $D^{l}$ and $D^{r}$, with $D^{l} = \{V \in D \mid f(V) < \tau\}$ and $D^{r} = D \setminus D^{l}$. V denotes an input sequence image.

The split function $f$ is defined as in Equation 3 below.

[Equation 3]

$$
f_{m}(V) =
\begin{cases}
P^{j}(x_{i,n-p}), & m = 1 \\
U^{j}(x_{i,n-p},\, x_{i,n-q}), & m = 2 \\
A^{j}(x_{i,n-p},\, x_{i,n-q},\, x_{i,n-r}), & m = 3
\end{cases}
$$

In Equation 3, m is an index indicating the split function; $x_{i,n}$ is the spatial feature vector of finger i in the n-th frame of the sequence image (V); i is an index indicating a finger; j is an index indicating the feature vector of one point among the finger joints and the fingertip of a given finger; $P^{j}$ is a function that returns the position vector of the point corresponding to j; $U^{j}$ is a function that returns the change in position of the point corresponding to j over the time elapsed between two frames, that is, its velocity vector; and $A^{j}$ is a function that returns the change in velocity of the point corresponding to j across three frames, that is, its acceleration vector. In addition, n is the number of frames in the sequence image, and p, q, and r are preceding offsets.
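The three kinds of split function can be sketched as below. This is an illustration under stated assumptions rather than the patented formulas: the function names are hypothetical, velocity and acceleration are taken as plain finite differences with a unit frame interval, and the offsets are assumed to satisfy p < q < r so that frames are compared from newest to oldest.

```python
def position(x, j):
    """Position vector of point j in a single frame's spatial feature x."""
    return tuple(x[j])


def velocity(x_new, x_old, j):
    """Change in position of point j between two frames (unit time step assumed)."""
    return tuple(a - b for a, b in zip(x_new[j], x_old[j]))


def acceleration(x_new, x_mid, x_old, j):
    """Change in velocity of point j across three frames."""
    v_recent = velocity(x_new, x_mid, j)
    v_earlier = velocity(x_mid, x_old, j)
    return tuple(a - b for a, b in zip(v_recent, v_earlier))


def split_function(X, m, j, p, q=None, r=None):
    """Evaluate split function type m on a spatio-temporal feature X.

    X is a list of per-frame spatial features of one finger in frame order;
    the last element is treated as the most recent frame, and p, q, r are
    offsets counted backwards from it.
    """
    last = len(X) - 1
    if m == 1:
        return position(X[last - p], j)
    if m == 2:
        return velocity(X[last - p], X[last - q], j)
    if m == 3:
        return acceleration(X[last - p], X[last - q], X[last - r], j)
    raise ValueError("m must be 1, 2, or 3")
```

How the returned vector is compared with the threshold at a split node (for example, by selecting one component) is not specified here and is left out of the sketch.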

m is determined by the randomly selected task type h. For example, when h is 0, which denotes the action detection task, m is chosen at random as 2 or 3. When h is 1, which denotes the position estimation task, m is set to 1.
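The dependence of m on h can be sketched as follows. The function name is hypothetical, and drawing h and m uniformly is an assumption; the text only states that the choices are random.

```python
import random


def sample_task_and_split_type(rng=random):
    """Draw a task type h, then a split function index m that matches it."""
    # h = 0: action detection, h = 1: position estimation.
    h = rng.choice([0, 1])
    # Action detection uses velocity (2) or acceleration (3);
    # position estimation uses position (1).
    m = rng.choice([2, 3]) if h == 0 else 1
    return h, m
```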

The random forest ($F_{A}$) according to an embodiment of the present invention differs from an ordinary random forest in that each split node of a tree stores a different kind of split candidate depending on the value of h. For example, when h is 0, the split candidate that gives the largest information gain is stored as the best split candidate, and when h is 1, the split candidate that gives the smallest regression uncertainty is stored as the best split candidate.

The information gain is defined by Equation 4 below.

[Equation 4]

In Equation 4, H(·) is the Shannon entropy, defined by Equation 5 below.

[Equation 5]

The regression uncertainty is defined by Equation 6 below, based on the variance of the fingertip position vectors in local coordinates.

[Equation 6]

In Equation 6, Σ_x is the sample covariance matrix of the set of fingertip position vectors, and tr(·) is the trace function. Here, each fingertip position vector is the offset from the position of the base joint, converted to local coordinates according to Equation 1, to the fingertip position.
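The two split-quality measures named above can be sketched in Python. This uses the textbook forms of Shannon entropy and information gain and takes the regression uncertainty of one sample set as the trace of its sample covariance matrix. The exact notation of Equations 4 to 6 is not reproduced in this text, so the function names, the base-2 logarithm, and the handling of very small sets are assumptions.

```python
import numpy as np

def shannon_entropy(labels):
    """Shannon entropy (in bits) of a set of class labels."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

def information_gain(labels, go_left):
    """Parent entropy minus the size-weighted entropy of the two children."""
    labels = np.asarray(labels)
    go_left = np.asarray(go_left, dtype=bool)
    gain = shannon_entropy(labels)
    for part in (labels[go_left], labels[~go_left]):
        gain -= part.size / labels.size * shannon_entropy(part)
    return gain

def regression_uncertainty(offsets):
    """Trace of the sample covariance matrix of fingertip offset vectors."""
    offsets = np.asarray(offsets, dtype=float)
    if offsets.shape[0] < 2:
        return 0.0  # assumption: a set with fewer than two samples has no spread
    cov = np.atleast_2d(np.cov(offsets, rowvar=False))
    return float(np.trace(cov))

gain = information_gain([0, 0, 1, 1], [True, True, False, False])
spread = regression_uncertainty([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
```

Under this sketch, a node with h = 0 would keep the candidate with the largest `information_gain`, and a node with h = 1 would keep the candidate with the smallest `regression_uncertainty`; how the uncertainties of the two child sets are combined is not stated in this text.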

Tree growth is repeated recursively on the split data D^l and D^r until a stopping criterion is met. Growth may stop when the number of samples in the data set falls below a preset minimum or when the depth of the tree exceeds a preset value.

As shown in FIG. 3, during the learning stage the trees of the random forest F_A randomly select values for each parameter (for example, h, m, i, j, p, q, and r) and search for the parameter combination that yields the best split function at each split node. In addition, each leaf node of a tree stores, during the learning stage, the probability of a click action and the offset vector of the click position in the local coordinates of the base joint of the hand.

When the spatio-temporal features of the first sequence image are input to the random forest F_A, the random forest outputs the probability of each click action state (for example, the probability that a click action is true and the probability that it is false) and the offset of the fingertip position in local coordinates, as given by Equation 7 below.

[Equation 7]

Equation 7 yields two quantities: the probability of each state of the click action, and the estimated local offset position (x, y, z) of the fingertip at the time the click action occurs. The probability of the click action is computed by averaging the values stored in the leaf nodes of the trees of the random forest F_A. The click detection apparatus determines that a click has actually occurred when this average exceeds a preset threshold. In addition, among the local offset positions stored in the leaf nodes of the trees of the random forest F_A, the one that receives the most votes may be taken as the final click position of the fingertip, i.e., the final local offset position. The click detection apparatus then converts the final local offset position of the fingertip into global coordinates.
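The aggregation step described above (average the per-tree click probabilities, compare with a threshold, and take the most-voted offset) can be sketched as follows. The function name, the threshold value of 0.5, and voting over exactly equal offset tuples are assumptions; the text does not say how nearby offsets are grouped into votes.

```python
from collections import Counter

def aggregate_forest(leaf_outputs, threshold=0.5):
    """Combine per-tree leaf outputs of the form (click_probability, offset)."""
    mean_prob = sum(prob for prob, _ in leaf_outputs) / len(leaf_outputs)
    clicked = mean_prob > threshold                  # click if the average exceeds the threshold
    votes = Counter(offset for _, offset in leaf_outputs)
    best_offset, _ = votes.most_common(1)[0]         # offset with the most votes
    return clicked, mean_prob, best_offset

# Hypothetical outputs of three trees: (probability of click, local x/y/z offset).
outputs = [(0.9, (1, 2, 3)), (0.8, (1, 2, 3)), (0.4, (0, 0, 0))]
clicked, mean_prob, best_offset = aggregate_forest(outputs)
```

The returned offset is still in the local coordinates of the base joint; the conversion to global coordinates mentioned in the text is not shown here.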

As noted above, the click detection apparatus can detect the two-dimensional position of the fingertip in each frame of the first sequence image V using a modified version of the methods of references [2] and [8]. This method is described with reference to FIG. 4.

References [2] and [8] propose extracting a plurality of contour points and the center point of the palm from an image of the hand, and detecting the fingertips based on the distance between the center point and each contour point. In an embodiment of the present invention, as shown in FIG. 4(a), the difference in distance from the previous contour point is computed at each contour point according to Equation 8 below, in order to construct a scale-invariant shape vector of the fingertip. One term of this computation, whose symbol is not reproduced in this text, is set to 0.

[Equation 8]

In Equation 8, w is the number of contour points in the image, and c_l is the two-dimensional position of the l-th contour point in the image. d_l is the distance between the contour point c_l and the palm center point p, computed as in Equation 9 below.

[Equation 9]

The shape feature vector at each contour point has as its elements the distance change at each of a predetermined number of preceding contour points, the distance change at the current contour point, and the distance change at each of a predetermined number of following contour points. In other words, a sliding window is determined around the contour point l: as shown in FIG. 4(b), its start is set to s_u = l - offset and its end to s_v = l + offset, and the shape feature vector Y at contour point l consists of the distance changes inside this window. Empirically, offset may be set to 7, in which case the shape feature vector Y contains 15 elements in total.
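The sliding-window construction of the shape feature vector can be sketched in Python. Several details are assumptions because Equations 8 and 9 are not reproduced in this text: the distance is taken as Euclidean, the distance change as a plain difference between neighboring contour points, the first change as 0, and the window as wrapping around at the ends of the contour.

```python
import numpy as np

def distance_changes(contour, palm_center):
    """Change in palm-center distance from each contour point to the next."""
    contour = np.asarray(contour, dtype=float)
    center = np.asarray(palm_center, dtype=float)
    d = np.linalg.norm(contour - center, axis=1)   # d_l: distance of c_l to p
    delta = np.zeros_like(d)                       # assumption: first change is 0
    delta[1:] = d[1:] - d[:-1]
    return delta

def shape_feature(delta, l, offset=7):
    """Window of 2 * offset + 1 distance changes centered on contour point l."""
    idx = np.arange(l - offset, l + offset + 1) % len(delta)  # wrap-around is an assumption
    return delta[idx]

# Hypothetical contour: 20 points on a line moving away from the palm center.
contour = [(k, 0) for k in range(20)]
delta = distance_changes(contour, (0, 0))
Y = shape_feature(delta, 10)
```

With offset = 7, the window covers indices l - 7 through l + 7, which gives the 15 elements stated in the text.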

The click detection apparatus constructs a random forest F_S based on the shape feature vectors of hand images in which the fingertip positions are already known. The split function at each split node of the trees of the random forest F_S compares two randomly selected elements of the shape feature vector Y, similar to the two-pixel test of reference [7]. The feature data set D_s at the current split node is thereby divided into two subsets according to the result of comparing these two elements of Y.
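A split that compares two elements of the shape feature vector can be sketched as below. The direct less-than comparison is an assumption: the patent's exact split condition is not reproduced in this text, and the function name and element indices `a` and `b` are hypothetical.

```python
import numpy as np

def split_by_element_pair(features, a, b):
    """Split rows of a feature matrix by comparing elements a and b of each row."""
    features = np.asarray(features)
    go_left = features[:, a] < features[:, b]   # assumed comparison rule
    return features[go_left], features[~go_left]

# Hypothetical two-element shape vectors for two contour points.
features = np.array([[1.0, 2.0], [3.0, 1.0]])
left, right = split_by_element_pair(features, 0, 1)
```

During training, a and b would be drawn at random for each candidate split, and the best candidate kept according to the node's split-quality measure.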

Once the random forest F_S has been constructed, the frames of the first sequence image are input to the random forest F_S, and the final fingertip is detected by considering the probability that a point is a fingertip and the probability that it is not.

FIG. 5 shows ROC curves of the click detection method according to an embodiment of the present invention, and FIGS. 6A to 6D show experimental results in various scenarios comparing the conventional techniques with the click detection method according to an embodiment of the present invention.

FIG. 5 shows the ROC curve of the click detection method for the click action of each finger. The processing speed was measured at 30.65 FPS (frames per second), and the click detection accuracy for each finger was 89.80% for the thumb, 96.90% for the index finger, 95.68% for the middle finger, 92.82% for the ring finger, and 94.37% for the little finger. This is a sharp improvement over the 77.09% failure rate of reference [6].

FIG. 6A shows experimental results in a virtual environment in which static objects are sparse, FIG. 6B in a virtual environment in which dynamic objects are sparse, FIG. 6C in a virtual environment in which static objects are dense, and FIG. 6D in a virtual environment in which dynamic objects are dense. In each graph, for an experiment in which the subject clicks a target object located at the center, a denotes the click region detected according to references [2] and [8], b denotes the click region detected according to reference [9], and c denotes the click region detected according to an embodiment of the present invention.

As shown in FIGS. 6A to 6D, with references [2] and [8] the detected click positions tend to be biased toward the user relative to the actual position of the target object, and with reference [9] the detected click positions tend to be biased above the actual position of the target object. In contrast, according to the embodiment of the present invention, the click positions are detected evenly around the target object without being biased to either side.

FIG. 7 is a block diagram showing the configuration of a click detection apparatus 700 according to another embodiment of the present invention.

Referring to FIG. 7, the click detection apparatus 700 according to another embodiment of the present invention may include an image acquisition unit 710, an image detection unit 730, and a control unit 750. The image acquisition unit 710, the image detection unit 730, and the control unit 750 may be implemented by at least one microprocessor and may operate according to a program stored in a memory (not shown).

The click detection apparatus 700 according to another embodiment of the present invention is a device wearable by the user and may further include a single depth camera. Alternatively, depending on the implementation, the click detection apparatus 700 may be implemented as a computer separate from the single depth camera worn by the user.

The image acquisition unit 710 acquires a first sequence image of a hand captured by a single depth camera worn by the user.

The image detection unit 730 obtains a first spatio-temporal feature vector from a plurality of frames in the first sequence image. Specifically, the image detection unit 730 may obtain, as a spatial feature vector, the position of the fingertip and the positions of the finger joints of the hand in each of the plurality of frames in the first sequence image, and may obtain the first spatio-temporal feature vector by arranging the obtained spatial feature vectors in time order.

The control unit 750 constructs the random forest F_A based on a second spatio-temporal feature vector extracted from frames of a second sequence image of a hand for which the occurrence of a click action and the click position are known in advance. It then inputs the first spatio-temporal feature vector into the random forest F_A to determine whether a click action of the hand occurs in the first sequence image and the click position.

To detect the position of the fingertip in each of the plurality of frames in the first sequence image, the image detection unit 730 may input the shape feature vectors of the hand in those frames into the random forest F_S, which is constructed from the shape feature vectors of hand images in which the fingertip positions are already known.

The embodiments of the present invention described above can be written as a program executable on a computer, and the program can be stored on a medium.

The medium may include, but is not limited to, storage media such as magnetic storage media (for example, ROM, floppy disks, and hard disks) and optically readable media (for example, CD-ROMs and DVDs).

Although embodiments of the present invention have been described with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

700: click detection apparatus
710: image acquisition unit
730: image detection unit
750: control unit

Claims

A click detection method comprising:
acquiring a first sequence image of a hand captured by a single depth camera worn by a user;
obtaining a first spatio-temporal feature vector from a plurality of frames in the acquired first sequence image;
constructing a random forest based on a second spatio-temporal feature vector extracted from frames of a second sequence image of a hand for which the occurrence of a click action and the click position are known; and
inputting the first spatio-temporal feature vector into the random forest to determine whether a click action of the hand occurs in the first sequence image and the click position,
wherein constructing the random forest comprises:
randomly selecting parameters of a split function of a split node in a tree of the random forest; and
optimizing the parameters of the split function selected at the split node,
wherein the split function comprises a function that returns a position vector derived from one frame, a function that returns a velocity vector derived from two frames, and a function that returns an acceleration vector derived from three frames.

The method according to claim 1,
wherein obtaining the first spatio-temporal feature vector comprises:
detecting the position of a fingertip of the hand and the positions of finger joints in each of the plurality of frames in the first sequence image; and
obtaining the first spatio-temporal feature vector by arranging the detected fingertip position and finger joint positions in frame order.

delete

The method according to claim 1,
wherein the split function corresponds to the following equation,

[Equation]

wherein, in the above equation, j is an index indicating the feature vector of one point among the finger joints and fingertip of a specific finger; the spatial feature vector is that of the i-th finger in the n-th frame of the first sequence image V; one of the functions returns the acceleration vector of the point corresponding to j across three frames; n is the number of frames in the first sequence image; and p, q, and r are preceding offsets.

The method according to claim 2,
wherein detecting the position of the fingertip comprises:
extracting contour points of the hand in a frame in the first sequence image;
calculating the distance between the center point of the palm and each extracted contour point, and calculating, for each contour point, the change in distance relative to the previous contour point;
constructing a shape feature vector for each contour point from the distance change at each of a predetermined number of preceding contour points, the distance change at the current contour point, and the distance change at each of a predetermined number of following contour points;
constructing a random forest based on shape feature vectors of hand images in which the fingertip position is known; and
detecting the position of the fingertip in the frame in the first sequence image by inputting the shape feature vector of each contour point into the random forest constructed from the shape feature vectors.

The method according to claim 1,
wherein the determining comprises:
comparing the probability of occurrence of a click action of the hand, determined in each of a plurality of trees of the random forest, with a preset threshold to determine whether the click action of the hand occurs; and
determining, as the final click position, the click position that receives the most votes among the click positions of the hand determined in each of the plurality of trees of the random forest.

The method according to claim 1,
wherein the single depth camera worn by the user is provided on a head mounted display (HMD).

8. A computer program stored on a medium in combination with hardware for executing the click detection method of any one of claims 1, 2, 5 to 8.

A click detection apparatus comprising:
an image acquisition unit that acquires a first sequence image of a hand captured by a single depth camera worn by a user;
an image detection unit that obtains a first spatio-temporal feature vector from a plurality of frames in the acquired first sequence image; and
a control unit that constructs a random forest based on a second spatio-temporal feature vector extracted from frames of a second sequence image of a hand for which the occurrence of a click action and the click position are known, and that inputs the first spatio-temporal feature vector into the random forest to determine whether a click action of the hand occurs in the first sequence image and the click position,
wherein the control unit randomly selects parameters of a split function of a split node in a tree of the random forest and optimizes the parameters of the split function selected at the split node, and the split function comprises a function that returns a position vector derived from one frame, a function that returns a velocity vector derived from two frames, and a function that returns an acceleration vector derived from three frames.