KR20220078420A

KR20220078420A - Hierarchical estimation method for hand poses using random decision forests, recording medium and device for performing the method

Info

Publication number: KR20220078420A
Application number: KR1020200177416A
Authority: KR
Inventors: 김계영; 김설호
Original assignee: 숭실대학교산학협력단
Priority date: 2020-12-03
Filing date: 2020-12-17
Publication date: 2022-06-10
Also published as: KR102540560B9; KR102540560B1

Abstract

A hierarchical method for estimating hand posture using random forests includes: preprocessing a depth image expressed in a two-dimensional image coordinate system by converting it into a three-dimensional camera coordinate system; generating a trained random forest for the palm posture using a hand model, the depth image, and a ground-truth (GT) posture as input data; generating a trained random forest for the finger posture that depends on the learned palm posture; estimating a palm posture that includes global rotation information using the trained random forest for the palm posture; estimating a finger posture using the trained random forest for the finger posture, which depends on the learned palm posture; and estimating a final hand posture by combining the estimated palm posture and finger posture. The hand posture can thereby be estimated accurately and quickly.

Description

Hierarchical estimation method of hand posture using random forest, recording medium and apparatus for performing the same

The present invention relates to a hierarchical method for estimating hand posture using random forests, and to a recording medium and an apparatus for performing the method. More particularly, it relates to a technique that trains random forests and estimates hand posture through a hierarchical procedure in which the palm, which carries the global rotation information, is estimated first and each finger is then estimated individually based on that result.

A mouse and keyboard have generally been used for human-computer interaction (HCI). As interest in HCI user interfaces grows, however, research on hand postures and gestures is becoming more important as a way to provide a natural input method that requires no additional equipment.

The prior art describes the human hand as the most natural, effective, and important interaction tool for HCI applications. The hand is a body part capable of many complex tasks, and it is regarded as having potential in applications such as keyboard-free mid-air input, object manipulation in virtual and augmented reality, games, sign language recognition, robot manipulation, and remote surgery.

In computer vision, hand posture and gesture estimation is a field in which diverse approaches have been proposed continuously. It is nevertheless classified as a difficult field because of several problems specific to the hand.

Diverse approaches to hand posture and hand gesture research have been proposed over the past 20 years. Most of that work was carried out on images acquired with RGB cameras. Because RGB images contain no three-dimensional information, segmenting the hand region was itself a significant problem, and estimating the hand posture was treated as an even harder problem owing to the complexity of kinematics, environmental dynamics, and background noise.

Existing sensors for gesture detection fall broadly into three types. Mount-based sensors use accelerometers or gyro sensors to capture hand and finger movements. Multi-touch screen sensors are used mainly in mobile devices and have the drawback of limiting the distance between the user and the computer. Vision-based sensors use cameras.

Vision-based sensors therefore have a much longer operating distance than multi-touch screen sensors and, because they require no physical contact with the user, feel more natural than mount-based sensors; their drawback is high computational complexity. In addition, these sensors are limited to detecting gestures and cannot detect the detailed shape of the fingers.

Glove-based magnetic motion capture devices make hand movements unnatural, must be calibrated for each user, and incur the additional cost of purchasing equipment. Hand posture estimation with a depth camera increases installation cost, and hardware constraints limit its deployment. Marker-based equipment likewise makes hand movements inherently unnatural.

Body posture estimation methods that had previously shown excellent performance have thus been applied to hand posture estimation, but the results are less accurate than those obtained for body posture estimation.

This is because the hand has many degrees of freedom and its image changes more with rotation about an axis than that of the face or body. Rotation in the yaw direction in particular produces large changes, so a technique is needed that addresses the shape change and self-occlusion caused by translation and rotation arising from the hand's high-dimensional degrees of freedom.

KR 10-1994311 B1
KR 10-2020-0107311 A
KR 10-1853276 B1

Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake, "Real-Time Human Pose Recognition in Parts from Single Depth Images", Computer Vision and Pattern Recognition, 2011.
Danhang Tang, Jonathan Taylor, Pushmeet Kohli, Cem Keskin, Tae-Kyun Kim, Jamie Shotton, "Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose", IEEE International Conference on Computer Vision, pp. 3325-3333, Dec 07-13, 2015.
Markus Oberweger, Paul Wohlhart, Vincent Lepetit, "Hands Deep in Deep Learning for Hand Pose Estimation", arXiv, 1502.06807v2, 2015.

The technical problem addressed by the present invention was conceived in view of the above, and an object of the present invention is to provide a hierarchical method for estimating hand posture using random forests.

Another object of the present invention is to provide a recording medium on which a computer program for performing the hierarchical method for estimating hand posture using random forests is recorded.

Yet another object of the present invention is to provide an apparatus for performing the hierarchical method for estimating hand posture using random forests.

A hierarchical method for estimating hand posture using random forests according to an embodiment for achieving the above object of the present invention includes: preprocessing a depth image expressed in a two-dimensional image coordinate system by converting it into a three-dimensional camera coordinate system; generating a trained random forest for the palm posture using a hand model, the depth image, and a GT posture as input data; generating a trained random forest for the finger posture that depends on the learned palm posture; estimating a palm posture that includes global rotation information using the trained random forest for the palm posture; estimating a finger posture using the trained random forest for the finger posture, which depends on the learned palm posture; and estimating a final hand posture by combining the estimated palm posture and finger posture.
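The preprocessing step above can be illustrated with a minimal sketch. The description only states that the depth image is converted from the two-dimensional image coordinate system to the three-dimensional camera coordinate system; the pinhole camera model and the intrinsic parameters `fx`, `fy`, `cx`, `cy` used here are assumptions made for illustration, as is the function name `depth_to_camera_points`.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def depth_to_camera_points(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project a depth image into 3D camera coordinates.

    Assumes a pinhole camera (an assumption, not stated in the source):
    pixel (u, v) with depth z maps to ((u - cx) * z / fx, (v - cy) * z / fy, z).
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):          # v: row index (image y)
        for u, z in enumerate(row):          # u: column index (image x)
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, float(z)))
    return points
```

The resulting list of points plays the role of the point cloud with which the hand model is later aligned.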

In an embodiment of the present invention, the step of generating the trained random forest for the palm posture, the step of generating the random forest for the finger posture, the step of estimating the palm posture, and the step of estimating the finger posture may each be repeated N times (where N is a natural number) and performed hierarchically.

In an embodiment of the present invention, the step of generating the trained random forest for the palm posture and the step of generating the random forest for the finger posture may each include: a model alignment step of aligning the hand model and the GT posture with the depth image for training, using a transformation matrix; a random forest training step of extracting spherical three-dimensional offset features and training and outputting a random forest by selecting the features that minimize the variance of the residual between the aligned hand model and the GT posture; a posture estimation step of extracting the spherical three-dimensional offset features using the trained N-th random forest and estimating the residual of the hand model; and a model update step of deforming the hand model by applying the estimated residual to it.
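The feature selection described in the random forest training step can be sketched as follows. This is an illustrative sketch only: the description says features are chosen to minimize the variance of the residuals, while the size-weighted sum of child variances, the candidate list of (feature index, threshold) pairs, and the names `total_variance` and `best_split` are assumptions introduced here.

```python
from typing import List, Optional, Sequence, Tuple

Split = Tuple[int, float]  # (feature index, threshold), hypothetical encoding


def total_variance(residuals: Sequence[Sequence[float]]) -> float:
    """Sum of the per-dimension variances of a set of residual vectors."""
    n = len(residuals)
    if n == 0:
        return 0.0
    total = 0.0
    for d in range(len(residuals[0])):
        mean = sum(r[d] for r in residuals) / n
        total += sum((r[d] - mean) ** 2 for r in residuals) / n
    return total


def best_split(
    features: Sequence[Sequence[float]],
    residuals: Sequence[Sequence[float]],
    candidates: Sequence[Split],
) -> Tuple[Optional[Split], float]:
    """Pick the candidate whose two children have the lowest weighted variance."""
    best: Optional[Split] = None
    best_cost = float("inf")
    n = len(residuals)
    for f_idx, thr in candidates:
        left: List[Sequence[float]] = []
        right: List[Sequence[float]] = []
        for x, r in zip(features, residuals):
            (left if x[f_idx] < thr else right).append(r)
        if not left or not right:
            continue  # a split that sends every sample one way is useless
        cost = (len(left) * total_variance(left)
                + len(right) * total_variance(right)) / n
        if cost < best_cost:
            best, best_cost = (f_idx, thr), cost
    return best, best_cost
```

In this sketch each row of `features` stands in for the feature values computed for one training sample, and each row of `residuals` for the difference between that sample's aligned hand model and its GT posture.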

In an embodiment of the present invention, the model update step of deforming the hand model by applying the estimated residual to it may further include updating the transformation matrix for the (N+1)-th round of training.

In an embodiment of the present invention, the step of estimating the palm posture and the step of estimating the finger posture may each include: a model alignment step of aligning the hand model with the depth image for learning, using a transformation matrix; a posture estimation step of extracting the spherical three-dimensional offset features using the trained N-th random forest and estimating the residual of the hand model; and a model update step of deforming the hand model by applying the estimated residual to it.
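The repeated estimate-and-update cycle can be sketched as a short loop. This is a simplified illustration under stated assumptions: each stage is represented by a hypothetical callable that returns a residual for the current pose, the pose is a flat list of numbers, and the model alignment step and the depth-image features are omitted.

```python
from typing import Callable, List, Sequence

Pose = List[float]
# Hypothetical stand-in for the N-th trained random forest:
# it maps the current pose to an estimated residual.
StageRegressor = Callable[[Pose], Pose]


def refine_pose(initial: Pose, stages: Sequence[StageRegressor]) -> Pose:
    """Apply one residual update per stage, in order."""
    pose = list(initial)
    for estimate_residual in stages:
        residual = estimate_residual(pose)                 # posture estimation
        pose = [p + r for p, r in zip(pose, residual)]     # model update
    return pose
```

With N stages, the pose passed to stage N+1 is the one produced by stage N, which mirrors the way the updated hand model feeds the next iteration in the description.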

In an embodiment of the present invention, the posture estimation step may use the trained random forest to estimate the residual between the currently aligned hand posture and the hand posture to which it should be aligned next.

In an embodiment of the present invention, the posture estimation step may include: traversing the nodes of the trees constituting the random forest through branching to reach a leaf node; computing, for branching, a feature value from the input depth image using the three-dimensional offset feature assigned to a node; repeatedly branching to a child node by comparing the feature value with the threshold assigned to the node; and estimating a residual value once a leaf node is reached through branching.

In an embodiment of the present invention, the posture estimation step may further include: performing the above on every tree constituting the random forest; and averaging the residual values estimated by all reached leaf nodes and using the average as the final estimated residual.
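The traversal and averaging just described can be sketched in a few lines. The `Node` layout, the convention that values below the threshold go to the left child, and the use of a plain callable in place of the three-dimensional offset feature are assumptions for illustration, not details given in the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    # Split-node fields (unused on leaves).
    feature: Optional[Callable[[Sequence[float]], float]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf field: the residual stored at the leaf (None on split nodes).
    residual: Optional[List[float]] = None


def predict_tree(root: Node, sample: Sequence[float]) -> List[float]:
    """Branch from the root until a leaf is reached, then return its residual."""
    node = root
    while node.residual is None:
        value = node.feature(sample)  # feature value computed for this node
        # Assumed convention: smaller values go left, others go right.
        node = node.left if value < node.threshold else node.right
    return node.residual


def predict_forest(trees: Sequence[Node], sample: Sequence[float]) -> List[float]:
    """Average the residuals returned by every tree in the forest."""
    outputs = [predict_tree(tree, sample) for tree in trees]
    dims = len(outputs[0])
    return [sum(o[d] for o in outputs) / len(outputs) for d in range(dims)]
```

Each split costs one feature evaluation and one comparison, so the work per tree is proportional to its depth.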

In an embodiment of the present invention, the transformation matrix may align the hand model with the point cloud, align the spherical three-dimensional offset features with the camera coordinate system, and align the residual estimated in the posture estimation step with the camera coordinate system so that the hand model is deformed in the model update step.
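As a hedged illustration of how a transformation matrix and its inverse move points between a model frame and the camera coordinate system, the sketch below uses a 4x4 homogeneous rigid transform. The description does not specify the form of its transformation matrix, so the rigid (rotation plus translation) form and the helper names `make_rigid`, `apply`, and `invert_rigid` are assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def make_rigid(rotation: Sequence[Sequence[float]],
               translation: Sequence[float]) -> Matrix:
    """Build a 4x4 homogeneous matrix from a 3x3 rotation and a translation."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def apply(matrix: Matrix, point: Sequence[float]) -> List[float]:
    """Transform a 3D point by a 4x4 homogeneous matrix."""
    h = (point[0], point[1], point[2], 1.0)
    return [sum(matrix[i][j] * h[j] for j in range(4)) for i in range(3)]


def invert_rigid(matrix: Matrix) -> Matrix:
    """Invert a rigid transform: rotation R^T, translation -R^T t."""
    rot_t = [[matrix[j][i] for j in range(3)] for i in range(3)]
    t = [matrix[i][3] for i in range(3)]
    new_t = [-sum(rot_t[i][j] * t[j] for j in range(3)) for i in range(3)]
    return make_rigid(rot_t, new_t)
```

Applying a transform and then its inverse returns the original point, which is the property relied on when a quantity estimated in an aligned frame is mapped back to camera coordinates.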

A computer-readable storage medium according to an embodiment for achieving another object of the present invention has recorded on it a computer program for performing the hierarchical method for estimating hand posture using random forests.

A hierarchical apparatus for estimating hand posture using random forests according to an embodiment for achieving yet another object of the present invention includes: a preprocessing unit that converts a depth image expressed in a two-dimensional image coordinate system into a three-dimensional camera coordinate system; a palm posture learning unit that outputs a trained random forest for the palm posture using a hand model, the depth image, and a GT posture as input data; a finger posture learning unit that outputs a trained random forest for the finger posture that depends on the learned palm posture; a palm posture estimation unit that estimates a palm posture including global rotation information using the trained random forest for the palm posture; a finger posture estimation unit that estimates a finger posture using the trained random forest for the finger posture, which depends on the learned palm posture; and a hand posture expression unit that estimates a final hand posture by combining the estimated palm posture and finger posture.

In an embodiment of the present invention, the training and estimation performed by the palm posture learning unit, the finger posture learning unit, the palm posture estimation unit, and the finger posture estimation unit may each be repeated N times (where N is a natural number) and performed hierarchically.

In an embodiment of the present invention, the palm posture learning unit and the finger posture learning unit may each include: a model alignment unit that aligns the hand model and the GT posture with the depth image for training, using a transformation matrix; a random forest training unit that extracts spherical three-dimensional offset features and trains and outputs a random forest by selecting the features that minimize the variance of the residual between the aligned hand model and the GT posture; a posture estimation unit that extracts the spherical three-dimensional offset features using the trained N-th random forest and estimates the residual of the hand model; and a model update unit that deforms the hand model by applying the estimated residual to it and updates the transformation matrix for the (N+1)-th round of training.

In an embodiment of the present invention, the palm posture estimation unit and the finger posture estimation unit may each include: a model alignment unit that aligns the hand model with the depth image for learning, using a transformation matrix; a posture estimation unit that extracts the spherical three-dimensional offset features using the trained N-th random forest and estimates the residual of the hand model; and a model update unit that deforms the hand model by applying the estimated residual to it.

In an embodiment of the present invention, the posture estimation unit may traverse the nodes of each tree constituting the random forest through branching to reach a leaf node: it computes a feature value from the input depth image using the three-dimensional offset feature assigned to a node, repeatedly branches to a child node by comparing that value with the threshold assigned to the node, and estimates a residual value once a leaf node is reached. It may perform this process on every tree constituting the random forest and average the residual values estimated by all reached leaf nodes to obtain the final estimated residual.

According to this hierarchical method for estimating hand posture using random forests, the problems of high-dimensional degrees of freedom, shape change, and occlusion in hand posture can be addressed through a hierarchical estimation scheme that handles the palm and the fingers separately using an inverse transformation matrix. In addition, random forests that use simple features address the real-time constraint, so the hand posture can be estimated accurately and quickly.

FIG. 1 is a block diagram of a hierarchical apparatus for estimating hand posture using random forests according to an embodiment of the present invention.
FIG. 2 is a diagram showing examples of the hierarchical hand posture estimation of FIG. 1.
FIG. 3 is a diagram showing image coordinate system data and camera coordinate system data.
FIG. 4 is a diagram showing various forms of bounding volumes.
FIG. 5 is a diagram visualizing three vectors obtained from an oriented bounding box applied to a point cloud in the camera coordinate system.
FIG. 6 is a diagram showing a palm model before alignment.
FIG. 7 is a diagram showing a transformed palm model and an aligned palm model.
FIG. 8 is a diagram showing a finger model transformed using an inverse transformation matrix.
FIG. 9 is a diagram showing a finger model aligned using an inverse transformation matrix.
FIG. 10 is a diagram showing the two-dimensional offset feature used to find each part of the body.
FIG. 11 is a diagram showing the spherical three-dimensional offset feature.
FIG. 12 is a diagram for explaining a random forest.
FIG. 13 is a diagram showing an estimated palm posture.
FIG. 14 is a diagram showing an estimated finger posture.
FIG. 15 is a flowchart of a hierarchical method for learning hand posture according to an embodiment of the present invention.
FIG. 16 is a flowchart of a method for estimating hand posture according to an embodiment of the present invention.
FIG. 17 is a graph showing the error as a function of the number of iterations of the hierarchical estimation of the present invention.
FIG. 18 is a graph showing the results of a quantitative evaluation of the mean joint error of the present invention and the prior art.

The detailed description of the present invention that follows refers to the accompanying drawings, which show by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a block diagram of a hierarchical apparatus for estimating hand posture using random forests according to an embodiment of the present invention.

The hierarchical apparatus for estimating hand posture using random forests according to the present invention (10, hereinafter "the apparatus") provides hierarchical training and estimation in order to address the shape change and self-occlusion caused by translation and rotation arising from the hand's high-dimensional degrees of freedom.

Referring to FIG. 1, the apparatus 10 according to the present invention includes a preprocessing unit 100, a palm posture learning unit 200, a finger posture learning unit 300, a palm posture estimation unit 400, a finger posture estimation unit 500, and a hand posture expression unit 700.

Software (an application) for performing hierarchical estimation of hand posture using random forests may be installed and executed on the apparatus 10 of the present invention, and the preprocessing unit 100, the palm posture learning unit 200, the finger posture learning unit 300, the palm posture estimation unit 400, the finger posture estimation unit 500, and the hand posture expression unit 700 may be controlled by that software running on the apparatus 10.

The apparatus 10 of the present invention may be a separate terminal or a module forming part of a terminal. The preprocessing unit 100, the palm posture learning unit 200, the finger posture learning unit 300, the palm posture estimation unit 400, the finger posture estimation unit 500, and the hand posture expression unit 700 may be formed as an integrated module or as one or more modules. Conversely, each unit may be implemented as a separate module.

The apparatus 10 of the present invention may be mobile or stationary. It may take the form of a server or an engine, and it may be referred to by other terms such as device, apparatus, terminal, user equipment (UE), mobile station (MS), wireless device, or handheld device.

The apparatus 10 of the present invention may run or develop various software on the basis of an operating system (OS). The operating system is a system program that allows software to use the hardware of the apparatus, and it may include mobile operating systems such as Android OS, iOS, Windows Mobile OS, Bada OS, Symbian OS, and BlackBerry OS, as well as computer operating systems such as the Windows family, the Linux family, the Unix family, MAC, AIX, and HP-UX.

The hierarchical training and estimation method of the present invention is proposed to address the shape change and self-occlusion caused by translation and rotation arising from the hand's high-dimensional degrees of freedom. These problems arise from the global position and rotation of the hand and from the individual movements of the fingers.

In general, the global position and rotation of the hand are expressed as the global position and rotation of the wrist. The posture of the fingers depends on the global position and rotation of the wrist, and each of the five fingers has its own position and movement.

In the present invention, the 21 joints of the whole hand are divided into 6 palm joints and 15 finger joints. The wrist joint, which carries the global rotation information of the hand, is included among the palm joints. Because the palm is simpler than the fingers and changes shape less, it can be easier to estimate.
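The joint partition can be written down as a small lookup structure. The counts (21 joints, 6 palm, 15 finger) come from the description; the index ordering, the finger names, and the even split of 3 joints per finger are assumptions made only for this illustration.

```python
from typing import Dict, List

# Finger names and their order are hypothetical labels.
FINGERS = ("thumb", "index", "middle", "ring", "little")
NUM_PALM_JOINTS = 6      # includes the wrist joint, per the description
NUM_FINGER_JOINTS = 15


def build_joint_groups() -> Dict[str, List[int]]:
    """Map each group name to the joint indices assigned to it."""
    per_finger = NUM_FINGER_JOINTS // len(FINGERS)  # 3, an assumed even split
    groups: Dict[str, List[int]] = {"palm": list(range(NUM_PALM_JOINTS))}
    start = NUM_PALM_JOINTS
    for name in FINGERS:
        groups[name] = list(range(start, start + per_finger))
        start += per_finger
    return groups
```

A grouping of this kind lets the palm joints be estimated first and each finger's joints be estimated separately afterwards.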

The present invention therefore proposes a method that trains random forests and estimates hand posture through a hierarchical procedure in which the palm, which carries the global rotation information, is estimated first and each finger is then estimated individually based on that result. The training and estimation procedures for the palm posture and the finger posture may be repeated N times, where N is a number of iterations specified by the user, and performed hierarchically.

A hand model is used to represent the estimated hand posture. The hand model is aligned with the depth image using a transformation matrix and is deformed through the random forest estimation process to represent the final hand posture.

Random forests are used to meet the real-time constraint, and a spherical three-dimensional offset feature is proposed for training and evaluating them. A random forest is an ensemble of multiple trees. Each tree consists of a number of split nodes and leaf nodes.

Information input to a tree reaches a leaf node by branching at the split nodes, and the leaf node estimates information about the hand posture. The operation performed to branch at a split node is a simple comparison between a feature value and a threshold, and this process is repeated until a leaf node is reached.

The hand posture can therefore be estimated very quickly. During training, however, the time consumed increases exponentially as the amount of data grows. Because training is performed offline, this does not affect the estimation speed. The spherical three-dimensional offset feature is used as the feature for training and evaluating the random forest; the invention proposes a way of extending features that are ordinarily extracted from two-dimensional planar images into three-dimensional space.

The learning units 200 and 300 train the random forests hierarchically. The trained random forests are used by the estimation units 400 and 500 to estimate the hand posture. A hand model, a depth image, and a GT posture are used as input data of the learning units 200 and 300, and N trained random forests are output.

Except for the preprocessing step, the model alignment, random forest training, posture estimation, and model update steps are performed separately for the palm and for the fingers. The hierarchical training process for the fingers depends on the training result for the palm.

For training, the input data of the learning units 200 and 300 includes a ground truth (GT) posture, which gives the known joint positions of the hand posture present in the depth image.

The estimation units 400 and 500 do not use the GT posture as input data. The depth images and GT postures used by the learning units 200 and 300 are produced by a synthetic data generator. First, the preprocessor 100 converts the depth image into a point cloud expressed in the three-dimensional camera coordinate system. Most of the hierarchical training procedure proposed in the present invention is performed in the camera coordinate system.

Second, model alignment aligns the hand model and the GT posture with the depth image for training. A transformation matrix is used for the alignment and is generated from the oriented bounding box of the hand region.

Third, forest training extracts spherical three-dimensional offset features and trains the random forest by selecting the features that minimize the variance of the residuals between the aligned hand model and the GT posture. Training can be performed for a user-specified number of iterations N, and N trained random forests are output.

Fourth, posture estimation uses the N-th trained random forest to extract spherical three-dimensional offset features and estimate the residual of the hand model. As above, it is repeated the specified N times.

Fifth, model update deforms the hand model by applying the residual estimated in the posture estimation step to the hand model. The deformed hand model represents the estimated hand posture. The transformation matrix is then updated for the (N+1)-th training iteration.
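The feature selection in the third step can be illustrated with a minimal sketch. It assumes that candidate feature values have already been computed for every training sample and that each split is scored by the summed variance of the residuals on its two sides; the names `residual_variance` and `best_split`, and the exhaustive threshold search, are hypothetical choices made for illustration and are not taken from the original description.

```python
import numpy as np

def residual_variance(residuals):
    """Sample count times the summed per-dimension variance (0 for an empty set)."""
    if len(residuals) == 0:
        return 0.0
    return float(len(residuals) * residuals.var(axis=0).sum())

def best_split(feature_values, residuals):
    """Pick the (candidate index, threshold) with the lowest residual variance.

    feature_values: (n_samples, n_candidates) candidate feature responses.
    residuals:      (n_samples, dims) GT posture minus aligned hand model.
    """
    best = (None, None, np.inf)
    for j in range(feature_values.shape[1]):
        column = feature_values[:, j]
        for threshold in np.unique(column):
            right = column >= threshold          # assumed convention: >= goes right
            cost = (residual_variance(residuals[~right])
                    + residual_variance(residuals[right]))
            if cost < best[2]:
                best = (j, float(threshold), cost)
    return best
```

In practice a forest would evaluate only a random subset of candidates and thresholds at each node; the exhaustive loop here simply keeps the sketch short.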

In the present invention, the transformation matrix is used throughout the learning units and the estimation units and plays three main roles. The first is to align the hand model with the point cloud. The second is to align the spherical three-dimensional offset features with the camera coordinate system. The third is to align the residuals estimated in the posture estimation step with the camera coordinate system so that the hand model can be deformed in the model update step.

The estimation units 400 and 500 hierarchically estimate the hand posture using the trained random forests. A hand model and a depth image are used as input data of the estimation units, and the estimated hand posture is output.

As in the learning units 200 and 300, every step except the preprocessing step is performed separately for the palm and for the fingers. The finger posture estimate depends on the palm posture estimate, and the two estimated postures are combined to output the final estimated hand posture. All steps of the estimation units 400 and 500 are processed in the same way as in the learning units 200 and 300, except that the random forest training step is omitted.
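The hierarchical order of the estimation can be sketched as follows. This is only an outline under stated assumptions: each trained stage is represented by a hypothetical callable that maps extracted features to a residual, `features_for` and `finger_init_for` are placeholder functions, and "combining" the two postures is interpreted here as stacking the 6 palm joints and 15 finger joints into one 21-joint array.

```python
import numpy as np

def refine(model, stages, features_for):
    """Apply each trained stage in turn, adding its predicted residual to the model."""
    for stage in stages:
        model = model + stage(features_for(model))
    return model

def estimate_hand(palm_init, finger_init_for, palm_stages, finger_stages, features_for):
    """Estimate the palm first, then the fingers conditioned on the estimated palm."""
    palm = refine(palm_init, palm_stages, features_for)
    fingers = refine(finger_init_for(palm), finger_stages, features_for)
    return np.vstack([palm, fingers])  # (21, 3): palm joints followed by finger joints
```

The point of the sketch is the dependency: the finger stages never run until the palm estimate is final, because the initial finger model is derived from it.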

FIG. 2 shows an example of hierarchical hand posture estimation, drawn arbitrarily for visualization. Palm posture estimation and finger posture estimation were each repeated three times. The palm model is estimated and updated first, and the finger model is then estimated and updated depending on the estimated palm posture.

The learning units 200 and 300 use a synthetic depth image and a synthetic GT posture together with the hand model as input data for training the random forests. The estimation units 400 and 500 use the hand model and a depth image acquired from a depth camera as input data for estimating the hand posture with the trained random forests.

The preprocessor 100 preprocesses the depth image by converting it from the two-dimensional image coordinate system into the three-dimensional camera coordinate system.

A depth image projected onto the image plane includes the camera's distortion information.

When handling information that has shape, it is therefore known to be advantageous to process it in the camera coordinate system, which does not include the distortion information, rather than in the image coordinate system, which does. In addition, because a depth image holds distance data as its pixel values, it is better represented in the three-dimensional camera coordinate system than in the two-dimensional image coordinate system.

Through preprocessing, the depth image is thus converted from the image coordinate system to the camera coordinate system. The converted set of pixels is referred to as a point cloud. Most of the subsequent steps in the learning units 200 and 300 and the estimation units 400 and 500 are performed with respect to the camera coordinate system.

If the camera's intrinsic parameters are known, image coordinate data can be converted to the camera coordinate system, or camera coordinate data to the image coordinate system, through a matrix operation that expresses the relationship between them. In most cases, however, the intrinsic parameters are not known, so they are generally estimated through camera calibration.

For example, the intrinsic parameters may be obtained from predefined values in a public database, or through a function, implemented in a library provided by the depth camera manufacturer, that returns the camera's intrinsic parameters.

Equation 1 below converts a pixel in the image coordinate system of a two-dimensional RGB image into the corresponding point in the camera coordinate system.

[Equation 1]

In Equation 1, two parameters denote the focal length and two denote the principal point. Because a depth image holds distance data as its pixel values, rearranging the relationship into a matrix expression that includes this information gives Equation 2 below, which converts the image coordinate data of the depth image into the camera coordinate system.

[Equation 2]

In Equation 2, the input is a pixel of the depth image expressed in the image coordinate system, and the output is the corresponding point in the camera coordinate system. The depth term is the pixel value of the depth image, that is, the depth value representing distance.
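As an illustration only, the conversion between the two coordinate systems can be sketched with the standard pinhole camera model. The sketch assumes an undistorted camera and metric depth values; the function names and the parameter names `fx`, `fy`, `cx`, and `cy` (focal lengths and principal point) are hypothetical and are not taken from the original equations.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image to an (M, 3) point cloud.

    Pixels with depth <= 0 are treated as missing and dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    z = depth.astype(np.float64)
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)

def project_to_image(points, fx, fy, cx, cy):
    """Project (M, 3) camera-coordinate points back to (M, 2) pixel coordinates."""
    z = points[:, 2]
    u = fx * points[:, 0] / z + cx
    v = fy * points[:, 1] / z + cy
    return np.stack([u, v], axis=1)
```

Projecting the resulting point cloud back with `project_to_image` recovers the original pixel coordinates, which is a convenient check that the intrinsic parameters are being applied consistently.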

FIG. 3 shows the result of converting a hand depth image projected onto the image coordinate system into the camera coordinate system, visualized using Matlab. The red hand at the lower right is the data in the image coordinate system, and the black hand at the upper left is the data converted to the camera coordinate system.

The model alignment step, which aligns the hand model with the point cloud, is described first. After the hand model is aligned with the point cloud, it is deformed through the estimation process, and the deformed hand model represents the posture the hand is taking.

The hand posture estimation process is a hierarchical procedure that estimates the palm posture first and then estimates the finger posture depending on that result. To fit this hierarchical procedure, the model alignment step aligns the palm model first and then aligns the finger model depending on the estimated palm posture.

The inverse transformation matrix is used to align the hand model. In general, left-handed and right-handed coordinate systems are used to describe a three-dimensional coordinate system. In both, the direction of the index finger represents the Y axis, the thumb the X axis, and the middle finger the Z axis. If a coordinate system can be estimated from the point cloud of the hand, its axes can express the approximate rotation of the hand.

The transformation matrix and the inverse transformation matrix are generated from the estimated coordinate system, and the hand model is aligned with the point cloud through the inverse transformation matrix. The inverse transformation matrix that aligns the palm model is generated using an oriented bounding box, and the inverse transformation matrix that aligns the finger model is generated from the estimated palm posture.

The inverse transformation matrix for aligning the palm model is generated using a bounding volume approach, which finds the principal direction of the point cloud and thereby roughly estimates a coordinate system. Bounding volume methods include the bounding sphere, the axis-aligned bounding box, the oriented bounding box, and the convex hull; an example of each is shown in FIG. 4. Each method can be extended to three dimensions, and from left to right the time complexity and the spatial tightness generally increase.

For a point cloud, it is difficult to define a principal direction from a bounding sphere or a convex hull. An axis-aligned bounding box is aligned with the coordinate axes, so the principal direction can only be expressed along a fixed axis. The oriented bounding box method is therefore used, because it can express the principal direction of the point cloud as its longest side. The coordinate system containing the principal direction of the point cloud is calculated from the vectors representing the sides of the box, and the same vectors are used to generate the transformation matrix and inverse transformation matrix for aligning the palm model.

The point cloud converted to the camera coordinate system may contain outliers. Outliers can arise from the performance of the depth camera and from the surrounding environment, and because they affect the size of the bounding volume they degrade the oriented bounding box result. Methods such as radius outlier removal and statistical outlier removal may therefore be applied to remove them. The oriented bounding box is then applied to the filtered point cloud to roughly estimate its principal direction.
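A minimal sketch of statistical outlier removal is given below for illustration. It assumes the common formulation in which a point is discarded when its mean distance to its k nearest neighbours exceeds the global mean by more than a multiple of the standard deviation; the function name and the parameters `k` and `std_ratio` are hypothetical, and the brute-force distance matrix is only practical for small point clouds.

```python
import numpy as np

def statistical_outlier_removal(points, k=8, std_ratio=2.0):
    """Drop points whose mean k-nearest-neighbour distance is unusually large.

    points: (M, 3) array with M > k. Uses an (M, M) distance matrix, so memory
    grows quadratically with the number of points.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)      # pairwise distances
    dist.sort(axis=1)                        # column 0 is each point's distance to itself
    mean_knn = dist[:, 1:k + 1].mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]
```

For point clouds of realistic size, a spatial index such as a k-d tree would replace the full distance matrix.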

From the oriented bounding box, three mutually orthogonal vectors that start at its center of gravity can be extracted. Each vector represents an axis of the oriented bounding box, and because the vectors are mutually orthogonal they can themselves be regarded as a coordinate system. The transformation matrix is generated using this property. The procedure is as follows. First, the vectors extracted from the oriented bounding box are normalized to obtain the basis vectors, as in Equation 3 below.

[Equation 3]

The basis vectors are then arranged in matrix form to generate the transformation matrix for the coordinate system, as in Equation 4 below.

[Equation 4]

The inverse transformation matrix for aligning the hand model is calculated as the inverse of the transformation matrix. Because the basis vectors calculated above are mutually orthogonal, their pairwise dot products are always 0, and because they were normalized in Equation 3 each has unit length. The inverse of the transformation matrix is therefore equal to its transpose, as expressed in Equation 5 below.

[Equation 5]

A transformation matrix and an inverse transformation matrix are generated for the palm and for the fingers respectively, and the palm pair and the finger pair are denoted by separate symbols.
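The construction of the transformation matrix and its inverse can be sketched as follows. The sketch assumes that the three box axes are supplied as the rows of an array and that the basis vectors are stacked as rows of the matrix; the row layout and the function names are assumptions made for illustration, since the original equations are not reproduced here.

```python
import numpy as np

def transform_from_axes(axes):
    """Normalize three mutually orthogonal axis vectors and stack them as rows."""
    axes = np.asarray(axes, dtype=np.float64)
    return axes / np.linalg.norm(axes, axis=1, keepdims=True)

def inverse_transform(T):
    """For an orthonormal matrix the inverse is simply the transpose."""
    return T.T
```

Using the transpose avoids a general matrix inversion, which is both cheaper and numerically safer, but it is valid only when the axes are orthogonal and have been normalized to unit length.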

FIG. 5 visualizes the three vectors obtained from the oriented bounding box applied to the point cloud in the camera coordinate system. The center of gravity of the oriented bounding box is marked with a red dot. The longest vector obtained from the oriented bounding box is set as the principal direction, the Y axis, and is drawn in red. The blue vector pointing to the right is the X axis, and the vector that is orthogonal to the X and Y axes and points toward the palm is the Z axis, drawn in green. In FIG. 5, each vector is drawn enlarged for visibility.

The palm joints of the hand model are referred to as the palm model. When the palm model is simply visualized in the coordinate system, it is drawn as red stars, as shown in FIG. 6.

To align the hand model with the point cloud, the palm model is multiplied by the inverse transformation matrix, which converts the coordinates of the palm model into the coordinate system formed by the basis vectors, as in Equation 6 below.

[Equation 6]

Because the coordinate-converted palm model was converted with respect to a reference coordinate, it is drawn as the red stars located in the lower part of FIG. 7.

To align it with the point cloud, the alignment center of the converted palm model is therefore set to the MCP joint of the middle finger, and the distance from this joint to the center point of the oriented bounding box is calculated. Adding this distance, as a displacement, to every joint then yields the aligned palm model, as in Equation 7 below.

[Equation 7]

The finally aligned hand model is shown in the figure above as green dots overlaid on the point cloud. To make the relationship between the palm joints easy to see, straight lines were arbitrarily added between the joints of the two palm models shown; the point from which the lines extend toward all other joints corresponds to the wrist joint.
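The two-stage palm alignment (coordinate conversion followed by translation) can be sketched as follows. The sketch assumes joints stored as rows of an array, the inverse transformation applied to each joint as a matrix-vector product, and a hypothetical parameter `mcp_index` that selects the middle-finger MCP joint; none of these names come from the original equations.

```python
import numpy as np

def align_palm(palm_joints, T_inv, obb_center, mcp_index):
    """Convert palm joints with T_inv, then move the chosen MCP joint onto obb_center.

    palm_joints: (6, 3) palm model joints.
    T_inv:       (3, 3) inverse transformation matrix.
    obb_center:  (3,) center point of the oriented bounding box.
    """
    converted = palm_joints @ T_inv.T            # T_inv applied to every joint
    shift = obb_center - converted[mcp_index]    # displacement from MCP joint to box center
    return converted + shift                     # same shift added to all joints
```

Because the same displacement is added to every joint, the shape of the palm model is preserved; only its orientation and position change.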

The transformation matrix and inverse transformation matrix for finger model alignment are generated from the palm transformation matrix used in the palm posture estimation process, instead of from an oriented bounding box. That is, the finger transformation matrix is identical to the palm transformation matrix, as in Equation 8 below.

[Equation 8]

The finger model is aligned in a similar way to the palm model. The finger model is converted through a matrix operation with the inverse transformation matrix for aligning the finger model, as in Equation 9 below.

[Equation 9]

As shown in FIG. 8, the converted finger model gives an incorrect alignment in which the five finger models are aligned together about the finger origin. This is because only a single inverse transformation matrix exists for all finger joints.

Therefore, each finger model making up the converted finger model is moved so that the corresponding MCP joint of the estimated palm model serves as its alignment center, which aligns the fingers individually as in Equation 10 below. The result is shown in FIG. 9.

[Equation 10]
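A possible reading of the per-finger alignment is sketched below. It assumes that each finger's three joints are stored relative to a shared finger origin and that alignment consists of converting them with the inverse transformation matrix and then translating each finger to its own MCP joint from the estimated palm model. The array layout and the function name are hypothetical.

```python
import numpy as np

def align_fingers(finger_offsets, T_inv, mcp_joints):
    """Convert all fingers with one matrix, then move each finger to its own MCP joint.

    finger_offsets: (5, 3, 3) joints of five fingers, relative to a shared origin.
    T_inv:          (3, 3) inverse transformation matrix for the fingers.
    mcp_joints:     (5, 3) MCP joints taken from the estimated palm model.
    """
    converted = finger_offsets @ T_inv.T          # broadcasts over the five fingers
    return converted + mcp_joints[:, None, :]     # one translation per finger
```

Without the final translation all five fingers would sit on the same origin, which is the incorrect result described for FIG. 8.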

The following describes the random forest training step performed by the learning units 200 and 300, the posture estimation step that estimates the posture using the trained random forests, and the method of extracting the features used for training and estimation.

A random forest is used to meet the real-time requirement. To branch at a node, a random forest only compares a feature value with a threshold, so the hand posture can be estimated very quickly. Feature values are extracted for both training and estimation.

Two-dimensional offset features are generally used to train random forests. Because a depth image carries three-dimensional distance values, however, the present invention presents a spherical three-dimensional offset feature that extends the two-dimensional feature into three dimensions.

To extract the features used in random forest training and posture estimation, an offset feature similar to a commonly used feature (for example, Non-Patent Document 1 of the prior art documents) is used. Individually, such features provide only a weak signal, too weak for accurate classification. Combined through a random forest, however, they make it possible to accurately estimate different hand postures.

FIG. 10, taken from Non-Patent Document 1 of the prior art documents, shows the two-dimensional offset feature used to find each part of the body. The position of a given reference pixel is the origin from which the feature is evaluated and is marked with a yellow cross. The feature parameter represents two-dimensional offsets in the image and is defined as a pair of offsets, each of which is marked with a red circle. The feature response is calculated as in Equation 11 below.

[Equation 11]

The depth function returns the depth value at the position of a pixel in a given image. The feature value calculated by Equation 11 is thus the difference between the depth values of the pixels at the two offset positions in the depth image. An object captured by a depth camera has different pixel values in the depth image depending on its distance from the camera.

The normalization term in Equation 11 ensures that the feature response is depth invariant. If an offset falls on the background or outside the image coordinates, a large positive constant value is assigned, as in Non-Patent Document 1. While a tree is being trained, the offsets are selected at random within a region of fixed size.

FIG. 10 shows two kinds of features. Applying the above equation with one of the feature parameters gives a large value in (a) but a small value in (b), so it can be used as a feature for separating body parts from the background. For a similar reason, the other feature parameter can help find thin structures such as arms.

This simple depth-difference feature is very efficient to compute. No preprocessing is required; each feature reads at most 3 image pixels and performs at most 5 arithmetic operations to obtain its value. The computation can also be processed in parallel within the random forest.
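The two-dimensional depth-difference feature can be sketched as follows. The sketch assumes the widely used form in which both offsets are divided by the depth at the reference pixel before the two depth probes are subtracted; the exact form of Equation 11 is not reproduced here, so the function names, the (row, column) convention, and the value of the `BACKGROUND` constant are assumptions.

```python
import numpy as np

BACKGROUND = 1e6  # assumed "large positive constant" for background or out-of-image probes

def depth_at(depth, pos):
    """Depth at a (row, col) position, or BACKGROUND if outside the image or empty."""
    r, c = int(round(pos[0])), int(round(pos[1]))
    h, w = depth.shape
    if r < 0 or r >= h or c < 0 or c >= w or depth[r, c] <= 0:
        return BACKGROUND
    return float(depth[r, c])

def offset_feature(depth, x, u, v):
    """Depth difference between two probes around reference pixel x.

    The offsets u and v are divided by the depth at x, so the probes cover
    the same physical extent regardless of the distance to the camera.
    """
    d = depth_at(depth, x)
    p1 = (x[0] + u[0] / d, x[1] + u[1] / d)
    p2 = (x[0] + v[0] / d, x[1] + v[1] / d)
    return depth_at(depth, p1) - depth_at(depth, p2)
```

One feature evaluation reads three pixels (the reference and the two probes), which matches the cost described above.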

The offset feature described above operates on a two-dimensional image. A depth image is represented as an image in two-dimensional space, but because its pixel values carry distance information it can be regarded as three-dimensional data. A method is therefore proposed that extends the space for offset feature extraction from two dimensions to three.

First, to bound the space from which offset positions are drawn in three-dimensional space, the feature extraction region is restricted using the equation of a sphere. The equation of a sphere with center $(a, b, c)$ and radius $r$ is $(x-a)^2 + (y-b)^2 + (z-c)^2 = r^2$.

By default, the present invention uses a three-dimensional offset extraction space whose center is the origin $(0, 0, 0)$ and whose range along each axis is normalized to between -1 and 1. The equation of the sphere therefore becomes $x^2 + y^2 + z^2 = 1$. A value between -1 and 1 is then assigned at random to each of $x$, $y$, and $z$ using a random function.

The three-dimensional offset feature is set within the space of the sphere defined by the sphere equation, as shown in FIG. 11. If the assigned values $(x, y, z)$ satisfy $x^2 + y^2 + z^2 \le r^2$, where $r$ is the radius of the sphere, the generated random point lies inside the sphere and can be used as the feature parameter of the three-dimensional offset feature. To change the size of the feature extraction space, the value of $r$ is changed.
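The sampling procedure described above is a form of rejection sampling and can be sketched as follows. The function name, the `radius` parameter, and the use of NumPy's random generator are illustrative assumptions.

```python
import numpy as np

def sample_sphere_offsets(n, radius=1.0, rng=None):
    """Draw n offsets uniformly inside a sphere of the given radius.

    Each coordinate is drawn from [-radius, radius]; candidates that fall
    outside the sphere are rejected and redrawn.
    """
    rng = np.random.default_rng() if rng is None else rng
    accepted = []
    while len(accepted) < n:
        candidate = rng.uniform(-radius, radius, size=3)
        if np.dot(candidate, candidate) <= radius ** 2:
            accepted.append(candidate)
    return np.array(accepted)
```

Since the sphere occupies π/6 (about 52%) of its bounding cube, roughly half of the candidates are accepted, so the loop terminates quickly in practice.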

An RGB image captured by a camera loses some information about three-dimensional space, because all points lying on a projection ray are represented as a single point. The same problem remains in depth images captured with a depth camera.

The point cloud of the hand projected into the camera coordinate system therefore contains part of the three-dimensional information but does not represent the complete three-dimensional shape of the hand. Consequently, even with a three-dimensional offset of the form above, data may not exist at the offset position, so a complete offset feature value cannot always be extracted.

The position of the three-dimensional offset is therefore back-projected into the image coordinate system, and the feature value is calculated from the offset cloud point. For the back-projection, the selected three-dimensional offset is first inverse-transformed into the camera coordinate system, as in Equation 12 below.

[Equation 12]

Equation 12 involves the three-dimensional offset selected in the sphere space, the three-dimensional coordinates of each joint position of the hand model, and the inverse transformation matrix. The three-dimensional offset converted to the camera coordinate system is then converted to the image coordinate system through the back-projection process, as in Equation 13 below.

[Equation 13]

Equation 13 relates the offset in the camera coordinate system to the offset position in the image coordinate system, using the focal length and the two principal-point coordinates. If the calculated offset position in the image coordinate system lies outside the background, it is replaced with the position of the outermost pixel for that offset position. The depth difference between the two offset pixels is then calculated as in Equation 14 below and used as the feature value.

[Equation 14]

In Equation 14, the depth value at each offset position in the image coordinate system is read from the depth image, and the feature value is the difference between the depth values at the two offsets.
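The offset feature described by Equations 12 to 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes a pinhole camera with a single focal length `f` and principal point `(cx, cy)`, assumes the inverse transform has the form `R_inv @ offset + joint`, and clamps out-of-range positions to the image border. All function and parameter names are hypothetical.

```python
import numpy as np

def to_camera(offset, joint, R_inv):
    """Assumed form of Equation 12: rotate the offset back and attach it to a joint."""
    return R_inv @ offset + joint

def project(p_cam, f, cx, cy):
    """Pinhole mapping of a camera-space point to pixel coordinates (u, v)."""
    x, y, z = p_cam
    return f * x / z + cx, f * y / z + cy

def depth_at(depth, u, v):
    """Read depth at (u, v), clamping positions outside the image to the border pixel."""
    h, w = depth.shape
    col = int(np.clip(round(u), 0, w - 1))
    row = int(np.clip(round(v), 0, h - 1))
    return float(depth[row, col])

def offset_feature(depth, joint, off_a, off_b, R_inv, f, cx, cy):
    """Depth difference between the two projected offsets (the feature value)."""
    da = depth_at(depth, *project(to_camera(off_a, joint, R_inv), f, cx, cy))
    db = depth_at(depth, *project(to_camera(off_b, joint, R_inv), f, cx, cy))
    return da - db
```

Because the feature is a single subtraction of two depth lookups, it stays cheap to evaluate, which is consistent with the real-time goal stated in the description.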

A decision tree is a classification method that has long been widely used. However, it is known to suffer from overfitting: a tree estimates well on the data it was trained on, but performs poorly on unseen data.

As shown in FIG. 12, a random forest is expressed as an ensemble of trees, each composed of split nodes (blue) and terminal leaf nodes (green). One symbol is used to designate a specific node among the many nodes that make up a tree, and another is used to designate a specific leaf node.

Each split node contains a weak learner defined by its parameters. The first parameter specifies the offset feature used for feature extraction, and the second parameter is a scalar threshold. To obtain an estimate for a pixel referenced in the depth image, the weak learner function is evaluated repeatedly. This process starts at the root and follows a path through the tree until a leaf node is reached.

Equation 15 below determines which child node to branch to at a given node. The right-hand side of the equation evaluates to 0 or 1, depending on how the output of the feature function compares with the threshold. If the weak learner evaluates to 0, the path branches to the left child of the node; if it evaluates to 1, the path branches to the right child.

[수학식 15][Equation 15]

This operation is repeated until a leaf node is reached, and the specific leaf node reached by the referenced pixel is recorded. Applying the same procedure to each pixel of the depth image for each tree yields a set of leaf nodes, which is expressed as Equation 16 below.

[수학식 16][Equation 16]
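The branching rule of Equation 15 and the descent to a leaf can be illustrated as below. This is a sketch under assumptions: the `Node` class, its field names, and the convention that a feature value at or above the threshold evaluates to 1 (right child) are hypothetical, since the text only states that the comparison yields 0 or 1.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    feature: Optional[Callable[[object], float]] = None  # offset feature (split nodes only)
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: object = None  # stored model (leaf nodes only)

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def weak_learner(node: Node, x) -> int:
    """Return 0 (go left) or 1 (go right); the >= convention is an assumption."""
    return int(node.feature(x) >= node.threshold)

def descend(root: Node, x) -> Node:
    """Evaluate the weak learner from the root until a leaf is reached."""
    node = root
    while not node.is_leaf:
        node = node.right if weak_learner(node, x) else node.left
    return node

def reached_leaves(forest, x):
    """One reached leaf per tree, i.e. the set of leaves for input x."""
    return [descend(root, x) for root in forest]
```

Here `forest` is simply a list of root nodes, so the set of reached leaves has one entry per tree.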

A learned prediction model is stored in each leaf node of each tree. Prediction models are generally based on classification trees or regression trees. A classification tree is typically used to assign an input data set to a particular class; in that case, the prediction model is a probability mass function over the classes.

For a regression tree, weighted relative vote counts for each joint can be used. In the present invention, the estimated value is the mean, over all training images, of the residual between the ground-truth position of a joint and the corresponding joint of the aligned hand model.

Given a depth image, each tree of the random forest is descended for an input pixel until a leaf node is reached, and the distribution stored in that leaf is retrieved. The distributions are averaged over all trees in the forest, as in Equation 17 below, to obtain the final classification result.

[수학식 17][Equation 17]
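The averaging over trees described above is commonly written in the following form. The notation is an assumption introduced here for illustration and may differ from the patent's own symbols: $T$ is the number of trees, $x$ the input pixel, $c$ a class, and $p_t(c \mid x)$ the distribution stored in the leaf that $x$ reaches in tree $t$.

```latex
p(c \mid x) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid x)
```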

학습부(200, 300)의 순서도에 포함된 랜덤 포레스트의 학습과정은 다음과 같다. 먼저, 손 모델과 전체 GT자세 그리고 전체 깊이영상을 각각 역변환행렬을 사용하여 정렬한다. 그리고, 전체 GT자세와 전체 정렬된 손 모델에 대한 잔차를 계산한다. 이후, 전체 학습영상에 대한 잔차의 분산을 구한다. 마지막으로, 3차원 오프셋 특징을 사용하여 잔차의 최대 분산감소를 가져오는 특징을 선택하는 것이 노드의 학습 과정이다. The learning process of the random forest included in the flowchart of the learning units 200 and 300 is as follows. First, the hand model, the entire GT posture, and the full depth image are aligned using an inverse transformation matrix, respectively. Then, the residuals are calculated for the overall GT posture and the overall aligned hand model. Then, the variance of the residuals for the entire training image is calculated. Finally, the learning process of the node is to select the feature that brings the maximum variance reduction of the residual by using the three-dimensional offset feature.

이 과정이 트리 생성을 위한 특정한 임계값에 도달할 때까지 수행되면 한 개의 학습된 트리가 완성되고, 모든 트리에 대해 이 과정이 완료되면 학습된 랜덤 포레스트가 생성된다.When this process is performed until a certain threshold for tree generation is reached, one learned tree is completed, and when this process is completed for all trees, a learned random forest is created.

예를 들어, 학습영상 1,000개에 대한 잔차는 관절 개수와 위치에 해당하는 21행과 좌표 값 x, y, z를 나타내는 3열로 구성된 1,000개의 행렬로 표현된다. 구형 3차원 오프셋 특징을 랜덤하게 생성하여 각 특징마다 전체 영상에 대해서 깊이 차이 값을 계산한다. 1,000개의 영상에 대해 이 과정을 수행한다면 깊이 차이 값은 512 x 1,000 행렬로 표현된다. For example, the residual for 1,000 training images is expressed as 1,000 matrices consisting of 21 rows corresponding to the number and position of joints and 3 columns indicating the coordinate values x, y, and z. By randomly generating a spherical 3D offset feature, a depth difference value is calculated for the entire image for each feature. If this process is performed for 1,000 images, the depth difference value is expressed as a 512 x 1,000 matrix.

앞서 계산된 전체 행렬로부터 각 관절에 대한 잔차를 모아서 21개의 1,000 x 3 행렬을 만든다. 21개의 각 행렬에 대해 분산을 계산하고 앞서 계산된 깊이 차이 값을 임계값으로 하여 최대 분산감소를 가져오는 특징 한 개를 선택한다. 이 특징이 바로 노드의 매개변수인 오프셋의 위치와 임계값에 해당한다. By collecting the residuals for each joint from the entire matrix calculated earlier, we make 21 1,000 x 3 matrices. The variance is calculated for each of the 21 matrices, and one feature with the maximum variance reduction is selected using the previously calculated depth difference value as a threshold. This characteristic corresponds to the position and threshold value of the offset, which are parameters of the node.

그리고, 임계값을 기준으로 학습영상을 좌측 자식노드와 우측 자식노드로 나눈 뒤 앞의 노드 학습과정을 반복한다. 마지막으로, 영상의 개수가 10개 이하에 해당하는 경우 해당하는 노드를 리프노드로 설정하고 해당하는 관절 잔차의 평균값을 추정된 잔차 값으로 사용한다. 이 과정이 본 발명에서 랜덤 포레스트를 구성하는 한 개의 트리를 학습하는 과정이다. Then, based on the threshold, the learning image is divided into a left child node and a right child node, and then the previous node learning process is repeated. Finally, when the number of images is 10 or less, the corresponding node is set as a leaf node, and the average value of the corresponding joint residual is used as the estimated residual value. This process is a process of learning one tree constituting a random forest in the present invention.
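The node training procedure above can be sketched as follows. This is a simplified illustration rather than the patent's implementation: it assumes variance is measured as the sum of per-coordinate variances, that candidate thresholds are the observed depth-difference values, and that images with a feature value at or above the threshold go to the right child. The function names and the dictionary layout of the tree are hypothetical.

```python
import numpy as np

def total_variance(res):
    """Sum of per-coordinate variances of residuals with shape (n, joints, 3)."""
    return float(res.reshape(len(res), -1).var(axis=0).sum()) if len(res) else 0.0

def best_split(features, residuals):
    """Pick the (feature index, threshold) giving the largest variance reduction.

    features:  (n_features, n_images) depth-difference values, e.g. 512 x 1,000
    residuals: (n_images, n_joints, 3) residuals, e.g. 1,000 x 21 x 3
    """
    n = residuals.shape[0]
    parent = total_variance(residuals)
    best = (None, None, 0.0)
    for k, values in enumerate(features):
        for tau in np.unique(values):
            right = values >= tau
            if right.all() or not right.any():
                continue  # a split must send images to both children
            children = (right.sum() * total_variance(residuals[right])
                        + (~right).sum() * total_variance(residuals[~right])) / n
            gain = parent - children
            if gain > best[2]:
                best = (k, float(tau), gain)
    return best

def train_node(features, residuals, min_images=10):
    """Recursively build a tree; nodes with <= min_images images become leaves."""
    if residuals.shape[0] <= min_images:
        return {"leaf": residuals.mean(axis=0)}
    k, tau, _ = best_split(features, residuals)
    if k is None:  # no split reduces the variance
        return {"leaf": residuals.mean(axis=0)}
    right = features[k] >= tau
    return {"feature": k, "threshold": tau,
            "left": train_node(features[:, ~right], residuals[~right], min_images),
            "right": train_node(features[:, right], residuals[right], min_images)}
```

The exhaustive threshold search is written for clarity; with 512 features and 1,000 images a practical implementation would likely sample thresholds instead.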

추정부(400, 500) 학습된 랜덤 포레스트를 사용하여 현재 정렬되어 있는 손 자세와 다음에 정렬되어야 하는 손 자세 사이의 잔차를 추정한다. 랜덤 포레스트를 구성하는 트리 노드들은 분기를 통해 리프노드에 도달한다. 분기를 위해 노드에 할당되어 있는 3차원 오프셋 특징을 사용하여 입력 깊이영상에서 특징 값을 계산한다. The estimation units 400 and 500 estimate the residual between the currently aligned hand posture and the hand posture to be aligned next using the learned random forest. Tree nodes composing a random forest reach leaf nodes through branches. For branching, the feature value is calculated from the input depth image using the 3D offset feature assigned to the node.

그 후, 노드에 할당된 임계값과의 대소비교를 통해 자식노드로의 분기과정을 반복적으로 수행한다. 분기를 통해 리프노드에 도달하면 잔차 값이 추정된다. 이 과정을 랜덤 포레스트를 구성하는 모든 트리에 대해 수행하여 모든 도달된 리프노드가 추정한 잔차 값을 평균한 것이 최종적으로 추정된 잔차로 사용된다.After that, the branching process to the child node is repeatedly performed through comparison with the threshold value assigned to the node. When a leaf node is reached via branching, the residual value is estimated. This process is performed for all trees constituting the random forest, and the average of the residual values estimated by all reached leaf nodes is used as the final estimated residual.

모델갱신은 추정부(400, 500)에서 얻어진 잔차를 사용하여 손 모델을 변형하는 과정과 반복되는 계층적인 추정과정에 사용되는 변환행렬과 역변환행렬을 갱신하는 과정으로 구성된다. 모델갱신은 손바닥과 손가락에 대해 다른 방법으로 수행된다. 손 자세의 추정은 손바닥을 먼저 추정하고 그 결과에 의존적으로 손가락을 추정하는 계층적인 순서로 진행된다. 그러므로 먼저 손바닥 모델갱신 단계를 설명하고, 손가락 모델갱신 단계에 대해 설명한다.The model update consists of a process of transforming a hand model using the residuals obtained from the estimation units 400 and 500 and a process of updating a transformation matrix and an inverse transformation matrix used in the repeated hierarchical estimation process. The model update is done in different ways for the palm and fingers. The estimation of hand posture proceeds in a hierarchical order of estimating the palm first and estimating the fingers depending on the result. Therefore, the palm model update step will be described first, and the finger model update step will be described.

The residual of the palm model estimated by the random forest in the estimation step is added to each joint position of the aligned palm model, which deforms the joint positions of the palm model and yields the palm posture. Because the residual must be added in the same coordinate system as the point cloud of the aligned palm model, it is first multiplied by the inverse transformation matrix to convert it to the camera coordinate system.

Because the hierarchical estimation process is repeated for a user-specified number of iterations, the current palm model posture depends on the model posture of the previous stage. Accordingly, the current palm posture is updated by adding the residual to the palm posture of the previous stage, as in Equation 18 below.

[수학식 18][Equation 18]

The palm posture, which represents the joint positions of the current palm model obtained by adding the residual estimated by the random forest, is drawn as in FIG. 13, and the palm model is deformed each time the hierarchical estimation is repeated.
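The update of Equation 18 can be illustrated with a short sketch. It assumes that joints are stored as rows of an array, that the residual is expressed in the aligned frame, and that the inverse transformation matrix is a 3 x 3 matrix applied to each residual vector; the function name and argument layout are hypothetical.

```python
import numpy as np

def update_pose(prev_pose, residual, R_inv):
    """Add a forest-estimated residual to the pose of the previous stage.

    prev_pose: (n_joints, 3) joint positions in camera coordinates
    residual:  (n_joints, 3) residual expressed in the aligned frame
    R_inv:     (3, 3) inverse transformation matrix
    """
    # Multiplying each residual row by R_inv moves it into camera coordinates,
    # where it can be added to the previous joint positions.
    return prev_pose + residual @ R_inv.T
```

Calling this once per stage, with the matrix recomputed from the latest pose, reproduces the dependence of each stage on the previous one.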

The palm transformation matrix is updated using the previously estimated palm posture in order to estimate the current palm posture. The Y-axis vector is calculated as the difference between the middle-finger MCP joint and the wrist, as in Equation 19 below.

[수학식 19][Equation 19]

Z축 벡터는 MCP 관절들의 평균벡터와 Y축의 외적으로 계산된다. MCP의 평균벡터는 각 MCP관절과 손목관절의 차 연산으로 얻어진 다섯 개의 벡터를 합하고 평균하여 다음의 수학식 20과 같이 얻어진다.The Z-axis vector is calculated as the cross product of the average vector of the MCP joints and the Y-axis. The average vector of the MCP is obtained as in Equation 20 below by summing and averaging the five vectors obtained by the difference calculation between each MCP joint and the wrist joint.

[수학식 20][Equation 20]

Finally, the X-axis vector is calculated as the cross product of the Y-axis vector and the Z-axis vector, as in Equation 21 below.

[수학식 21][Equation 21]

The three axis vectors obtained in this way are normalized to yield the basis vectors. Arranging them in matrix form produces the transformation matrix. Because the three basis vectors are mutually orthogonal, the inverse transformation matrix is given by the transpose.
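The construction of the palm transformation matrix in Equations 19 to 21 can be sketched as follows. Several details are assumptions: the order of the operands in each cross product follows the wording of the text, the basis vectors are stacked as rows, and the middle finger is taken to be the third MCP joint. The function name is hypothetical.

```python
import numpy as np

def palm_transform(wrist, mcps, middle_index=2):
    """Build the palm transformation matrix from the wrist and five MCP joints.

    wrist: (3,) wrist position; mcps: (5, 3) MCP positions.
    middle_index (which row is the middle finger) is an assumed ordering.
    """
    y = mcps[middle_index] - wrist            # Y axis: middle MCP minus wrist
    mean_mcp = (mcps - wrist).mean(axis=0)    # average of the five wrist-to-MCP vectors
    z = np.cross(mean_mcp, y)                 # Z axis: mean MCP vector x Y axis
    x = np.cross(y, z)                        # X axis: Y axis x Z axis
    R = np.stack([v / np.linalg.norm(v) for v in (x, y, z)])  # rows are basis vectors
    return R, R.T                             # orthonormal rows, so inverse = transpose
```

Note that the sketch does not guard against the degenerate case in which the mean MCP vector is parallel to the Y axis, where the cross product vanishes and the normalization would divide by zero.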

The finger model is deformed in the same way as the palm model. The estimated residual of the finger model is converted using the inverse transformation matrix so that it can be added in the same coordinate system as the aligned finger model. It is then added to the finger posture of the previous step, and the resulting deformed finger posture is expressed as in Equation 22 below.

[수학식 22][Equation 22]

The finger transformation matrix is updated using the previously estimated finger posture for the next round of hierarchical estimation. The estimated finger posture is drawn as in FIG. 14, and the finger posture is deformed each time the hierarchical estimation is repeated. The finger posture consists of five fingers, and each finger consists of three joints: PIP, DIP, and TIP.

The Y-axis vector is calculated as the difference between each MCP joint and the corresponding PIP joint, as in Equation 23 below.

[수학식 23][Equation 23]

The X-axis vector is calculated as the cross product of the finger's mean vector and the Y-axis vector. The mean vector is obtained by summing and averaging the three vectors given by the differences between the three joints of the finger and the MCP joint, as in Equation 24 below.

[수학식 24][Equation 24]

The Z-axis vector is calculated as the cross product of the Y-axis vector and the X-axis vector, as in Equation 25 below.

[수학식 25][Equation 25]

그 후, 정규화과정을 통해 변환행렬을 생성하고, 역변환행렬은 전치행렬로 생성한다.After that, a transformation matrix is generated through a normalization process, and the inverse transformation matrix is created as a transpose matrix.

손자세 표현부(700)는 추정된 손바닥 자세와 손가락 자세를 합하여 최종적인 손 자세를 추정하여 출력한다.The hand posture expression unit 700 estimates and outputs the final hand posture by adding the estimated palm posture and finger posture.

이에 따라, 본 발명은, 역변환행렬을 사용하여 손바닥과 손가락을 개별적으로 다루는 계층적인 추정방법을 통해 손 자세의 고차원 자유도, 모양 변화, 폐색 문제를 해결할 수 있다. 또한, 단순한 특징을 사용하는 랜덤 포레스트를 통해 실시간조건 문제를 해결하여, 손 자세를 정확하고 빠르게 추정할 수 있다.Accordingly, the present invention can solve the high-order degree of freedom of hand posture, shape change, and occlusion problems through a hierarchical estimation method that separately handles palms and fingers using an inverse transformation matrix. In addition, it is possible to accurately and quickly estimate the hand posture by solving the real-time condition problem through the random forest using simple features.

도 15는 본 발명의 일 실시예에 따른 손 자세의 계층적 학습 방법의 흐름도이다. 도 16은 본 발명의 일 실시예에 따른 손 자세의 추정 방법의 흐름도이다.15 is a flowchart of a hierarchical method of learning a hand posture according to an embodiment of the present invention. 16 is a flowchart of a method for estimating a hand posture according to an embodiment of the present invention.

본 실시예에 따른 랜덤 포레스트를 사용한 손 자세의 계층적 추정 방법은, 도 1의 장치(10)와 실질적으로 동일한 구성에서 진행될 수 있다. 따라서, 도 1의 장치(10)와 동일한 구성요소는 동일한 도면부호를 부여하고, 반복되는 설명은 생략한다. The hierarchical method for estimating the hand posture using the random forest according to the present embodiment may proceed in substantially the same configuration as the apparatus 10 of FIG. 1 . Accordingly, the same components as those of the device 10 of FIG. 1 are given the same reference numerals, and repeated descriptions are omitted.

또한, 본 실시예에 따른 랜덤 포레스트를 사용한 손 자세의 계층적 추정 방법은 랜덤 포레스트를 사용한 손 자세의 계층적 추정을 수행하기 위한 소프트웨어(애플리케이션)에 의해 실행될 수 있다.In addition, the hierarchical method for estimating the hand posture using the random forest according to the present embodiment may be executed by software (application) for performing the hierarchical estimation of the hand posture using the random forest.

도 15를 참조하면, 본 실시예에 따른 랜덤 포레스트를 사용한 손 자세의 계층적 추정 방법은, 2차원 영상 좌표계에서 표현되는 깊이영상을 3차원 카메라 좌표계로 변환하여 전처리한다(단계 S10). 입력데이터는 깊이영상 및 손 모델일 수 있다.Referring to FIG. 15 , in the hierarchical estimation method of hand posture using a random forest according to the present embodiment, a depth image expressed in a two-dimensional image coordinate system is converted into a three-dimensional camera coordinate system and pre-processed (step S10). The input data may be a depth image and a hand model.

입력데이터로 손 모델과 깊이영상 및 GT자세를 사용하여 학습된 손바닥 자세에 대한 랜덤 포레스트를 생성하고, 학습된 손바닥 자세에 의존적으로 학습된 손가락 자세에 대한 랜덤 포레스트를 생성한다.A random forest for the learned palm posture is generated using the hand model, depth image, and GT posture as input data, and a random forest is generated for the learned finger posture depending on the learned palm posture.

Specifically, a random forest is first generated and trained for the palm. Using the transformation matrix, the palm model and the GT posture are aligned together with the depth image for training (step S21).

Spherical three-dimensional offset features are extracted, and the random forest for the palm is trained and output by selecting the features that minimize the variance of the residual between the aligned palm model and the GT posture (step S22). Using the output random forest, the spherical three-dimensional offset features are extracted and the residual of the palm model is estimated (step S23). The estimated residual is then applied to the palm model to deform it (model update), and the transformation matrix is updated for the (N+1)-th round of training (step S24).

손바닥 자세의 학습 절차는 사용자가 지정한 반복 횟수 N 만큼 반복되어 계층적으로 수행될 수 있다(단계 S25). 이에 따라, 손바닥에 대한 랜덤 포레스트가 N개 생성된다.The palm posture learning procedure may be performed hierarchically by repeating the number of repetitions N designated by the user (step S25). Accordingly, N random forests for the palm are generated.

손바닥에 대해 랜덤 포레스트의 학습이 완료되면, 그 결과에 의존적으로 손가락에 대한 랜덤 포레스트를 생성하여 학습한다. When the learning of the random forest for the palm is completed, a random forest for the finger is generated and learned depending on the result.

먼저, 변환행렬을 사용하여 학습을 위해 손가락모델과 GT자세를 깊이영상과 함께 정렬한다(단계 S31).First, the finger model and the GT posture are aligned with the depth image for learning using the transformation matrix (step S31).

Spherical three-dimensional offset features are extracted, and the random forest for the fingers is trained and output by selecting the features that minimize the variance of the residual between the aligned finger model and the GT posture (step S32). Using the output random forest, the spherical three-dimensional offset features are extracted and the residual of the finger model is estimated (step S33). The estimated residual is then applied to the finger model to deform it (model update), and the transformation matrix is updated for the (N+1)-th round of training (step S34).

손가락 자세의 학습 절차는 사용자가 지정한 반복 횟수 N 만큼 반복되어 계층적으로 수행될 수 있다(단계 S35). 이에 따라, 손가락에 대한 랜덤 포레스트가 N개 생성된다.The finger posture learning procedure may be performed hierarchically by repeating the number of repetitions N designated by the user (step S35). Accordingly, N random forests for the fingers are generated.

Once the random forests have been generated by training on the palm and fingers, the palm posture, which includes information on global rotation, is estimated based on the result, and the finger posture is estimated using the random forest for the finger posture that was trained dependent on the learned palm posture.

손바닥 자세의 추정과 손가락 자세의 추정에서, 전처리 단계를 제외한 모델정렬, 랜덤 포레스트 학습, 자세추정, 모델갱신 단계는 손바닥과 손가락에 대해 별도로 수행된다. 또한, 손가락에 대한 계층적 학습과정은 손바닥에 대한 학습결과에 의존적이게 된다. 추정 단계는 도 15의 학습 과정과 동일하게 처리되며 다만 랜덤 포레스트 학습 단계는 제외된다.In the estimation of palm posture and finger posture, model alignment, random forest learning, posture estimation, and model update steps except for the pre-processing step are performed separately for the palm and fingers. In addition, the hierarchical learning process for the fingers becomes dependent on the learning results for the palm. The estimation step is processed in the same way as the learning process of FIG. 15, except that the random forest learning step is excluded.

도 16을 참조하면, 깊이 영상과 손 모델을 포함하는 입력데이터의 전처리 후(단계 S11), 변환행렬을 사용하여 학습을 위해 손바닥 모델과 GT자세를 깊이영상과 함께 정렬한다(단계 S41). Referring to FIG. 16 , after preprocessing the input data including the depth image and the hand model (step S11), the palm model and the GT posture are aligned with the depth image for learning using the transformation matrix (step S41).

Spherical three-dimensional offset features are extracted using the learned N-th random forest, and the residual of the palm model is estimated (step S43). The palm model is then deformed by applying the estimated residual to it (step S44).

상기 자세추정 단계(단계 S43)는, 학습된 랜덤 포레스트를 사용하여 현재 정렬되어 있는 손 자세와 다음에 정렬되어야 하는 손 자세 사이의 잔차를 추정한다.The posture estimation step (step S43) estimates the residual between the currently aligned hand posture and the hand posture to be aligned next using the learned random forest.

구체적으로, 랜덤 포레스트를 구성하는 트리 노드들은 분기를 통해 리프노드에 도달하는 단계, 분기를 위해 노드에 할당되어 있는 3차원 오프셋 특징을 사용하여 입력 깊이영상에서 특징 값을 계산하는 단계, 노드에 할당된 임계값과의 대소비교를 통해 자식노드로의 분기과정을 반복적으로 수행하는 단계 및 분기를 통해 리프노드에 도달하면 잔차 값을 추정하는 단계를 포함할 수 있다.Specifically, the tree nodes constituting the random forest reach leaf nodes through branching, calculating feature values from the input depth image using the 3D offset feature assigned to the nodes for branching, and assigning them to the nodes. It may include repeatedly performing a branching process to a child node through comparison with a threshold value and estimating a residual value when a leaf node is reached through branching.

상기 자세추정 단계는, 랜덤 포레스트를 구성하는 모든 트리에 대해 수행하고, 모든 도달된 리프노드가 추정한 잔차 값을 평균하여 최종적으로 추정된 잔차로 사용할 수 있다.The posture estimation step may be performed on all trees constituting the random forest, and the residual values estimated by all reached leaf nodes may be averaged and used as the final estimated residual.

손가락 자세의 추정 절차는 사용자가 지정한 반복 횟수 N 만큼 반복되어 계층적으로 수행될 수 있다(단계 S45).The finger posture estimation procedure may be performed hierarchically by repeating the number of repetitions N designated by the user (step S45).

손바닥 자세의 추정이 완료되면, 추정된 손바닥 자세에 의존적으로 학습된 손가락 자세에 대해 추정한다. 손가락 자세의 추정 단계(단계 S51 내지 S55)는 손바닥 자세의 추정 과정과 동일하되, 별도로 수행된다.When estimation of the palm posture is completed, it is estimated for the learned finger posture depending on the estimated palm posture. The steps of estimating the finger posture (steps S51 to S55) are the same as the process of estimating the palm posture, but are performed separately.

추정된 손바닥 자세와 손가락 자세를 합하여 최종적인 손 자세를 추정한다.The final hand posture is estimated by adding the estimated palm posture and finger posture.
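The overall order of the estimation, palm first, then fingers conditioned on the palm, then the combination of the two, can be summarized in a short sketch. Everything here is hypothetical scaffolding: each stage is modeled as a callable that returns a residual for the current pose, alignment and coordinate transforms are omitted, and `make_finger_stages` stands in for whatever mechanism makes the finger forests depend on the palm result.

```python
import numpy as np

def refine(pose, stages):
    """Run the per-stage residual updates in order (one trained forest per stage)."""
    for estimate_residual in stages:
        pose = pose + estimate_residual(pose)
    return pose

def estimate_hand(palm_init, finger_init, palm_stages, make_finger_stages):
    """Estimate the palm, then the fingers given the palm, then stack both."""
    palm = refine(palm_init, palm_stages)
    fingers = refine(finger_init, make_finger_stages(palm))  # depends on the palm result
    return np.vstack([palm, fingers])
```

With a palm of six joints (wrist and five MCP joints) and fifteen finger joints (PIP, DIP, and TIP for each of five fingers), the stacked result has the 21 joints mentioned in the training example.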

도 17은 본 발명의 계층적 추정의 반복 횟수에 대한 오차를 나타내는 그래프이다.17 is a graph showing the error with respect to the number of iterations of the hierarchical estimation of the present invention.

Referring to FIG. 17, the error reduction achieved by iterating the hierarchical estimation is as follows. With the hand model merely aligned, the average errors were 24.22 for the palm, 23.99 for the index finger, 19.37 for the middle finger, 32.45 for the ring finger, 36.34 for the little finger, and 34.54 for the thumb.

As step 1 of the graph in FIG. 17 shows, the estimation error dropped most sharply at the first estimation, by 10 mm or more, and the amount of change shrank with each subsequent step. From the third regression onward, the error either did not change, changed only slightly, or in some cases even increased. The number of iterations for hierarchical estimation can therefore be set to 3 for the palm posture and 3 for the finger posture.

도 18은 본 발명과 종래기술의 평균 관절 오차에 대한 정량적 평가 결과를 보여주는 그래프이다.18 is a graph showing the results of quantitative evaluation of the average joint error of the present invention and the prior art.

Referring to FIG. 18, the horizontal axis of the graph represents the individual finger joints. The prior art labeled [37] is Non-Patent Document 3 of the prior art documents. The numbers 1, 2, 3, 4, and 5 denote the thumb, index finger, middle finger, ring finger, and little finger, respectively; MCP joints are denoted M, PIP joints P, DIP joints D, and the wrist joint Palm. AVERAGE denotes the overall average. The vertical axis of the graph represents the error in mm.

평균 관절 오차에 대한 실험결과는 본 발명에서 제안된 계층적인 손 자세 추정방법이 [37]에서 제안된 방법보다 전체적으로 뛰어남을 보인다. [37]은 모든 관절에 대해 평균 11.4mm 의 오차를 포함하며, 제안된 방법은 9.72mm의 오차를 포함한다.Experimental results on the mean joint error show that the hierarchical hand posture estimation method proposed in the present invention is superior to the method proposed in [37] overall. [37] contains an average error of 11.4 mm for all joints, and the proposed method contains an error of 9.72 mm.
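The size of the improvement follows directly from the two reported averages:

```latex
11.4\,\text{mm} - 9.72\,\text{mm} = 1.68\,\text{mm}, \qquad \frac{1.68}{11.4} \approx 14.7\%
```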

따라서, 본 발명을 종래 기술들과 정성적, 정량적으로 평가한 실험결과는 본 발명에서 제안된 방법이 평균오차 10mm, 속도 28FPS로 손 자세를 정확하고 빠르게 추정하는 것을 보여준다.Therefore, the experimental results of qualitative and quantitative evaluation of the present invention with the prior art show that the method proposed in the present invention accurately and quickly estimates the hand posture with an average error of 10 mm and a speed of 28 FPS.

이와 같은, 랜덤 포레스트를 사용한 손 자세의 계층적 추정 방법은 애플리케이션으로 구현되거나 다양한 컴퓨터 구성요소를 통하여 수행될 수 있는 프로그램 명령어의 형태로 구현되어 컴퓨터 판독 가능한 기록 매체에 기록될 수 있다. 상기 컴퓨터 판독 가능한 기록 매체는 프로그램 명령어, 데이터 파일, 데이터 구조 등을 단독으로 또는 조합하여 포함할 수 있다. Such a hierarchical method for estimating hand posture using a random forest may be implemented as an application or implemented in the form of program instructions that may be executed through various computer components and recorded in a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, etc. alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.

Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

프로그램 명령어의 예에는, 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드도 포함된다. 상기 하드웨어 장치는 본 발명에 따른 처리를 수행하기 위해 하나 이상의 소프트웨어 모듈로서 작동하도록 구성될 수 있으며, 그 역도 마찬가지이다.Examples of program instructions include not only machine language codes such as those generated by a compiler, but also high-level language codes that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules for carrying out the processing according to the present invention, and vice versa.

이상에서는 실시예들을 참조하여 설명하였지만, 해당 기술 분야의 숙련된 당업자는 하기의 특허 청구의 범위에 기재된 본 발명의 사상 및 영역으로부터 벗어나지 않는 범위 내에서 본 발명을 다양하게 수정 및 변경시킬 수 있음을 이해할 수 있을 것이다.Although the above has been described with reference to the embodiments, those skilled in the art can variously modify and change the present invention within the scope without departing from the spirit and scope of the present invention described in the claims below You will understand.

본 발명은 키넥트v2와 소프트키넥 장비의 변경에 따른 영향을 받지 않고 우수한 성능을 보였기에 다양한 장비에서의 사용에 대한 가능성도 확인되었다. 또한, 추정된 손 자세정보를 사용하여 특정한 손 자세를 인식하는 방법에 대한 연구도 수행될 수 있다. Since the present invention showed excellent performance without being affected by changes in Kinect v2 and Soft Kinect equipment, the possibility of use in various equipment was also confirmed. In addition, a study on a method of recognizing a specific hand posture using the estimated hand posture information may also be performed.

이에 따라, 인간과 컴퓨터의 상호작용에서 손을 사용하는 인터페이스는 수화인식, 게임, 가상현실에서의 객체조작, 원격 수술 등의 다양한 분야에서 활용될 수 있다. Accordingly, the interface using the hand in human-computer interaction can be utilized in various fields such as sign language recognition, games, object manipulation in virtual reality, and remote surgery.

10: apparatus for hierarchical estimation of hand posture using a random forest
100: preprocessing unit
200: palm posture learning unit
300: finger posture learning unit
400: palm posture estimation unit
500: finger posture estimation unit
700: hand posture expression unit

Claims

A hierarchical method for estimating hand posture using a random forest, the method comprising:
pre-processing a depth image expressed in a two-dimensional image coordinate system by converting it into a three-dimensional camera coordinate system;
generating a trained random forest for palm posture using a hand model, a depth image, and a GT posture as input data;
generating a trained random forest for finger posture dependent on the learned palm posture;
estimating a palm posture including information on global rotation using the trained random forest for palm posture;
estimating a finger posture using the random forest for finger posture trained dependent on the learned palm posture; and
estimating a final hand posture by combining the estimated palm posture and finger posture.

The method of claim 1, wherein generating the random forest for palm posture, generating the random forest for finger posture, estimating the palm posture, and estimating the finger posture are each performed hierarchically by being repeated N times (where N is a natural number).

3. The method of claim 2, wherein the generating of the random forest for the palm posture and the generating of the random forest for the finger posture each comprise:
a model alignment step of aligning the hand model and the GT posture with the depth image for learning, using a transformation matrix;
a random forest learning step of extracting spherical three-dimensional offset features and training and outputting a random forest by selecting the feature that minimizes the variance of the residual between the aligned hand model and the GT posture;
a posture estimation step of extracting the spherical three-dimensional offset features using the learned N-th random forest and estimating the residual of the hand model; and
a model update step of transforming the hand model by reflecting the estimated residual in the hand model.
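As a non-limiting illustration of the random forest learning step, the following Python sketch selects, at a single tree node, the candidate feature and threshold that minimize the size-weighted variance of the residuals in the two child nodes. It assumes that feature values have already been computed for each training sample; the names `total_variance` and `best_split`, the candidate threshold list, and the "less than goes left" convention are hypothetical.

```python
from typing import Sequence, Tuple


def total_variance(residuals: Sequence[Sequence[float]]) -> float:
    """Sum over dimensions of the population variance (0.0 for an empty set)."""
    n = len(residuals)
    if n == 0:
        return 0.0
    total = 0.0
    for d in range(len(residuals[0])):
        mean = sum(r[d] for r in residuals) / n
        total += sum((r[d] - mean) ** 2 for r in residuals) / n
    return total


def best_split(features: Sequence[Sequence[float]],
               residuals: Sequence[Sequence[float]],
               thresholds: Sequence[float]) -> Tuple[int, float, float]:
    """Return (feature_index, threshold, cost) with the lowest child variance.

    features[i][k] is the value of candidate feature k on training sample i;
    residuals[i] is the residual vector of sample i. Splits that leave one
    child empty are ignored.
    """
    best = (-1, 0.0, float("inf"))
    n = len(residuals)
    for k in range(len(features[0])):
        for t in thresholds:
            left = [r for f, r in zip(features, residuals) if f[k] < t]
            right = [r for f, r in zip(features, residuals) if f[k] >= t]
            if not left or not right:
                continue
            cost = (len(left) * total_variance(left)
                    + len(right) * total_variance(right)) / n
            if cost < best[2]:
                best = (k, t, cost)
    return best


# Usage: feature 0 separates the two residual groups cleanly, feature 1 does not.
feats = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.5], [0.8, 0.5]]
resid = [(1.0,), (1.0,), (3.0,), (3.0,)]
choice = best_split(feats, resid, thresholds=[0.5, 3.0])
```

In a full training procedure this selection would be repeated recursively for each child node until a stopping criterion is met; that criterion is not stated in the claims and is left out here.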

4. The method of claim 3, wherein the model update step of transforming the hand model by reflecting the estimated residual in the hand model further comprises
updating the transformation matrix for the (N+1)-th learning.

5. The method of claim 2, wherein the estimating of the palm posture and the estimating of the finger posture each comprise:
a model alignment step of aligning the hand model with the depth image for learning, using a transformation matrix;
a posture estimation step of extracting spherical three-dimensional offset features using the learned N-th random forest and estimating the residual of the hand model; and
a model update step of transforming the hand model by reflecting the estimated residual in the hand model.
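As a non-limiting illustration, the estimation and update steps repeated over N stages can be read as an iterative refinement loop. The Python sketch below assumes that the posture is represented as a flat list of parameters and that each stage is a callable standing in for the N-th learned random forest; both are hypothetical simplifications, and the handling of the transformation matrix is omitted.

```python
from typing import Callable, List, Sequence

Pose = List[float]


def refine_pose(initial: Pose,
                stages: Sequence[Callable[[Pose], Pose]]) -> Pose:
    """Apply each stage in order: estimate a residual, then update the pose.

    Each stage receives the current pose and returns a residual of the same
    length. In practice a stage would also use the depth image; here that is
    assumed to be captured inside the callable.
    """
    pose = list(initial)
    for estimate_residual in stages:
        residual = estimate_residual(pose)               # posture estimation step
        pose = [p + r for p, r in zip(pose, residual)]   # model update step
    return pose


# Usage: toy stages that each recover half of the remaining error.
target = [1.0, 2.0]


def half_step(pose: Pose) -> Pose:
    return [(t - p) * 0.5 for t, p in zip(target, pose)]


refined = refine_pose([0.0, 0.0], [half_step, half_step])
```

Under the same assumptions, the hierarchy of the method would correspond to running such a loop first for the palm parameters and then for the finger parameters, starting from the estimated palm posture.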

6. The method of claim 5, wherein the posture estimation step
estimates, using the learned random forest, the residual between the currently aligned hand posture and the hand posture to be aligned next.

7. The method of claim 5, wherein the posture estimation step comprises:
descending through the tree nodes constituting the random forest by branching until a leaf node is reached;
calculating, for branching, a feature value from the input depth image using the three-dimensional offset feature assigned to a node;
repeatedly branching to a child node through comparison with a threshold value assigned to the node; and
estimating a residual value when a leaf node is reached through branching.

8. The method of claim 7, wherein the posture estimation step further comprises:
performing the branching on all trees constituting the random forest; and
averaging the residual values estimated at all reached leaf nodes and using the average as the final estimated residual.
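As a non-limiting illustration of the branching and averaging described above, the following Python sketch descends each tree by comparing a node's feature value with its threshold, reads the residual stored at the reached leaf, and averages the residuals over all trees. The `Node` layout, the feature callables, and the convention that values below the threshold branch left are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence

Vector = List[float]


@dataclass
class Node:
    # A leaf stores a residual; a split node stores a feature function,
    # a threshold, and two children.
    residual: Optional[Vector] = None
    feature: Optional[Callable[[Any], float]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict_tree(root: Node, sample: Any) -> Vector:
    """Branch from the root until a leaf is reached and return its residual."""
    node = root
    while node.residual is None:
        value = node.feature(sample)   # feature value computed from the input
        node = node.left if value < node.threshold else node.right
    return node.residual


def predict_forest(trees: Sequence[Node], sample: Any) -> Vector:
    """Average, per dimension, the leaf residuals reached in every tree."""
    outputs = [predict_tree(tree, sample) for tree in trees]
    dims = len(outputs[0])
    return [sum(o[d] for o in outputs) / len(outputs) for d in range(dims)]


# Usage: one depth-1 tree and one single-leaf tree.
tree_a = Node(feature=lambda s: s["d"], threshold=1.0,
              left=Node(residual=[0.0, 2.0]), right=Node(residual=[4.0, 0.0]))
tree_b = Node(residual=[2.0, 2.0])
final_residual = predict_forest([tree_a, tree_b], {"d": 0.5})
```

Here the sample is a plain dictionary; in the claimed method the feature value would instead be computed from the input depth image using the three-dimensional offset feature assigned to the node.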

9. The method of claim 1,
wherein the transformation matrix aligns the hand model with the point cloud, aligns the spherical three-dimensional offset features with the camera coordinate system, and aligns the residual estimated in the posture estimation step with the camera coordinate system so that the hand model is transformed in the model update step.

10. A computer-readable storage medium having recorded thereon a computer program for performing the hierarchical method for estimating a hand posture using a random forest according to claim 1.

11. A hierarchical apparatus for estimating a hand posture using a random forest, the apparatus comprising:
a preprocessor for converting a depth image expressed in a two-dimensional image coordinate system into a three-dimensional camera coordinate system;
a palm posture learning unit that outputs a random forest for the palm posture trained using a hand model, the depth image, and a ground-truth (GT) posture as input data;
a finger posture learning unit that outputs a random forest for the finger posture trained dependent on the learned palm posture;
a palm posture estimation unit that estimates a palm posture, including information on global rotation, using the trained random forest for the palm posture;
a finger posture estimation unit that estimates a finger posture using the random forest for the finger posture trained dependent on the learned palm posture; and
a hand posture expression unit that estimates a final hand posture by combining the estimated palm posture and the estimated finger posture.

12. The apparatus of claim 11,
wherein the learning and estimation of the palm posture learning unit, the finger posture learning unit, the palm posture estimation unit, and the finger posture estimation unit are each performed hierarchically by repeating N times (where N is a natural number).

13. The apparatus of claim 12, wherein the palm posture learning unit and the finger posture learning unit each comprise:
a model alignment unit that aligns the hand model and the GT posture with the depth image for learning, using a transformation matrix;
a random forest learning unit that extracts spherical three-dimensional offset features and trains and outputs a random forest by selecting the feature that minimizes the variance of the residual between the aligned hand model and the GT posture;
a posture estimator that extracts the spherical three-dimensional offset features using the learned N-th random forest and estimates the residual of the hand model; and
a model update unit that transforms the hand model by reflecting the estimated residual in the hand model and updates the transformation matrix for the (N+1)-th learning.

14. The apparatus of claim 12, wherein the palm posture estimation unit and the finger posture estimation unit each comprise:
a model alignment unit that aligns the hand model with the depth image for learning, using a transformation matrix;
a posture estimator that extracts spherical three-dimensional offset features using the learned N-th random forest and estimates the residual of the hand model; and
a model update unit that transforms the hand model by reflecting the estimated residual in the hand model.

15. The apparatus of claim 14, wherein the posture estimator
performs, for all trees constituting the random forest, a process in which the tree nodes reach a leaf node through branching, a feature value is calculated from the input depth image using the three-dimensional offset feature assigned to a node for branching, branching to a child node is repeated through comparison with the threshold value assigned to the node, and a residual value is estimated when a leaf node is reached; and
averages the residual values estimated at all reached leaf nodes and uses the average as the final estimated residual.