KR101741671B1

KR101741671B1 - 3D Motion-Based Presentation Apparatus and Method

Info

Publication number: KR101741671B1
Application number: KR1020160024402A
Authority: KR
Inventors: 임정모; 조현수; 이준범; 김동철
Original assignee: 중앙대학교 산학협력단
Priority date: 2016-02-29
Filing date: 2016-02-29
Publication date: 2017-06-16

Abstract

The present invention relates to a 3D motion-based presentation apparatus and method that combine a special camera with an up-sampling technique so that a presenter can control a slide show through motion recognition and present more efficiently without an assistant. To this end, the invention provides: image collection, which photographs the presenter's hand gesture as one or more images over a predetermined period of time; image correction, which corrects the photographed images through calibration; target extraction, which extracts the image corresponding to the hand region from the corrected image; resolution determination, which determines the resolution of the extracted hand region image; command behavior generation, which combines the one or more determined hand region images to generate a command behavior; command signal generation, which converts the generated command behavior into a command signal; and command execution, which performs the corresponding command according to the command signal. The invention can thereby provide improved motion recognition. Such motion recognition is not limited to presentation apparatuses: as a convenient means of operating other objects by human motion alone, without any tool, it can become a useful technology applicable to the Internet of Things (IoT), which is currently attracting attention.

Description

The present invention relates to a 3D motion-based presentation apparatus and method.

The present invention relates to a 3D motion-based presentation apparatus and method, and more particularly to an apparatus and method that let a presenter deliver a presentation efficiently without assistance by controlling a slide show through recognition of the presenter's motion. In particular, as a 3D motion-based presentation apparatus and method, the invention provides improved recognition of the presenter's motion.

Existing presentation tools generally operate a slide show with a computer mouse or keyboard, or use a smart pointer. A keyboard is accurate but requires an assistant; a smart pointer can accurately indicate the position being pointed at, but the set of commands it can issue is limited. Research has also been conducted on using a color camera to detect and interpret specific body parts and motions of a person for use in presentations. However, a color camera has limitations in extracting the object of interest accurately and in real time and in providing a variety of presentation controls.

In addition, with the development of depth sensing technology in recent years, much research has been conducted on computer control based on pose tracking and gesture recognition using real-time depth cameras. In particular, Microsoft's Kinect, released at the end of 2010, and the Asus Xtion series have become widely available, and much research has been done on presentation control software that uses depth cameras.

There are two types of depth camera: the time-of-flight (TOF) type and the stereo matching type. Of these, the stereo matching type, which uses the phase difference between the emitted wave and the reflected wave, is the most widely used. In the Kinect and Xtion series, the transmitter that emits light and the receiver that senses the reflected wave are separated by a fixed distance. Infrared structured light with a specific pattern is therefore emitted, and depth information is extracted by stereo matching the wave reflected back from the object against the emitted wave. Whichever wave is taken as the reference, however, an error inevitably occurs on the left or right side of the object.

Korean Patent Registration No. 1407249 (2014.06.09)

Accordingly, the present invention has been proposed to solve the problems of the related art described above, and an object of the invention is to provide an apparatus and method that allow a presenter to present more efficiently without assistance by operating a slide show through motion recognition.

To achieve the above object, a 3D motion-based presentation apparatus according to the technical idea of the present invention is characterized in that it includes an image collection unit, an image correction unit, a target extraction unit, a resolution determination unit, a command behavior generation unit, a command signal generation unit, and a command execution unit. The image collection unit photographs the presenter's hand gesture as one or more images over a predetermined period of time; the image correction unit corrects the photographed images through calibration; the target extraction unit extracts the image corresponding to the hand region from the corrected image; the resolution determination unit determines the resolution of the extracted hand region image; the command behavior generation unit combines the one or more determined hand region images to generate a command behavior; the command signal generation unit converts the generated command behavior into a command signal; and the command execution unit performs the corresponding command according to the command signal.

In the present invention, the image correction unit converts the photographed hand gesture image into OpenCV data in order to correct it.

To extract the hand region image, the target extraction unit uses the binarization condition defined by Equation 1 to detect only the pixels whose depth value lies within a certain range of the depth value of the hand center pixel.

[Equation 1]

(The equation is not recoverable from the source text. Its terms are the depth value of the image pixel currently being scanned and the depth value of the hand center pixel.)

The center point of the hand region is detected using the distance transform algorithm defined by Equation 2: each pixel value is set to L, the value L_max is found, and L is divided by L_max, from which the hand center point, the wrist detection circle, and the wrist detection points are extracted.

[Equation 2]

$$\frac{L}{L_{\max}}$$

($L$ is the distance from the hand center point to the nearest outline, and $L_{\max}$ is the largest value of $L$ among all pixels)

When the hand region image has a high resolution of 1920x1080 or more, at which detailed elements can be recognized, the resolution determination unit performs detailed element recognition of the hand region. The detailed element recognition uses the convex hull, which connects the outermost points of the hand region with straight lines so as to enclose all points, and the convexity defect, which is the point on the outline farthest from the straight line between two convex hull points.

When the hand region image has a low resolution of less than 1920x1080, at which detailed elements cannot be recognized, the resolution determination unit generates a high-resolution hand region image of 1920x1080 or more. If the low resolution of the hand region image causes errors in generating the high-resolution image, the YCrCb image conversion defined by Equation 3 is additionally applied, with the color values preset to those that best represent Asian skin color.

[Equation 3]

$$(\mathrm{minCr} \le Cr \le \mathrm{maxCr}) \;\&\&\; (\mathrm{minCb} \le Cb \le \mathrm{maxCb})$$

($Cr$ is the luminance and red component, $Cb$ is the luminance and blue component; minCr is 130, maxCr is 160, minCb is 110, and maxCb is 140)

To achieve the above object, a 3D motion-based presentation method according to the technical idea of the present invention is characterized in that it includes an image collection step, an image correction step, a target extraction step, a resolution determination step, a command behavior generation step, a command signal generation step, and a command execution step. The image collection step photographs the presenter's hand gesture as one or more images over a predetermined period of time; the image correction step corrects the photographed images through calibration; the target extraction step extracts the image corresponding to the hand region from the corrected image; the resolution determination step determines the resolution of the extracted hand region image; the command behavior generation step combines the one or more determined hand region images to generate a command behavior; the command signal generation step converts the generated command behavior into a command signal; and the command execution step performs the corresponding command according to the command signal.

In the present invention, the image correction step converts the photographed hand gesture image into OpenCV data in order to correct it.

To extract the hand region image, the target extraction step uses the binarization condition defined by Equation 1 to detect only the pixels whose depth value lies within a certain range of the depth value of the hand center pixel.

[Equation 1]

(The equation is not recoverable from the source text. Its terms are the depth value of the image pixel currently being scanned and the depth value of the hand center pixel.)

[Equation 2]

$$\frac{L}{L_{\max}}$$

When the hand region image has a high resolution of 1920x1080 or more, at which detailed elements can be recognized, the resolution determination step performs detailed element recognition of the hand region. The detailed element recognition uses the convex hull, which connects the outermost points of the hand region with straight lines so as to enclose all points, and the convexity defect, which is the point on the outline farthest from the straight line between two convex hull points.

When the hand region image has a low resolution of less than 1920x1080, at which detailed elements cannot be recognized, the resolution determination step generates a high-resolution hand region image of 1920x1080 or more. If the low resolution of the hand region image causes errors in generating the high-resolution image, the YCrCb image conversion defined by Equation 3 is additionally applied, with the color values preset to those that best represent Asian skin color.

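[Equation 3]

$$(\mathrm{minCr} \le Cr \le \mathrm{maxCr}) \;\&\&\; (\mathrm{minCb} \le Cb \le \mathrm{maxCb})$$

($Cr$ is the luminance and red component, $Cb$ is the luminance and blue component; minCr is 130, maxCr is 160, minCb is 110, and maxCb is 140)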

The 3D motion-based presentation apparatus and method according to the present invention have the following effects.

First, a presentation tool that uses motion recognition can provide improved motion recognition.

Second, such motion recognition is not limited to presentation apparatuses; as a convenient means of operating other objects by human motion alone, without any tool, it can become a useful technology applicable to the IoT, which is currently attracting attention.

FIG. 1 is a block diagram of a 3D motion-based presentation apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of a 3D motion-based presentation method according to an embodiment of the present invention.
FIG. 3 shows an example of a color image and a depth image before calibration according to an embodiment of the present invention.
FIG. 4 shows an example of a binarized preprocessed image according to an embodiment of the present invention.
FIG. 5 shows an example of wrist center point and hand center point detection according to an embodiment of the present invention.
FIG. 6 shows an example of a wrist removal method according to an embodiment of the present invention.
FIG. 7 shows an example of hand region layer separation according to an embodiment of the present invention.
FIG. 8 shows an example of an up-sampling result according to an embodiment of the present invention (left: up-sampled depth image; right: RGB image).
FIG. 9 shows an example of a convex hull and convexity defects according to an embodiment of the present invention.
FIG. 10 shows an example of a skin detection result using the YCrCb channels according to an embodiment of the present invention.

A 3D motion-based presentation apparatus and method according to embodiments of the present invention will be described in detail with reference to the accompanying drawings. The present invention may be modified in various ways and may take various forms; specific embodiments are illustrated in the drawings and described in detail in the text. This is not intended to limit the invention to the particular forms disclosed, and the invention should be understood to cover all modifications, equivalents, and alternatives falling within its spirit and technical scope. Like reference numerals are used for like elements in describing the drawings. In the accompanying drawings, the dimensions of structures are shown enlarged relative to the actual dimensions for clarity, or reduced relative to the actual dimensions to convey the schematic configuration.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component. Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless expressly defined in this application, are not to be interpreted in an ideal or excessively formal sense.

Referring to FIGS. 1 and 2, a 3D motion-based presentation apparatus according to an embodiment of the present invention includes an image collection unit (100), an image correction unit (200), a target extraction unit (300), a resolution determination unit (400), a command behavior generation unit (500), a command signal generation unit (600), and a command execution unit (700).

The image collection unit (100) photographs the presenter's hand gesture as one or more images over a predetermined period of time.

The image correction unit (200) corrects the photographed hand gesture image through calibration. To do so, the image correction unit (200) converts the photographed hand gesture image into OpenCV data.

The target extraction unit (300) extracts the image corresponding to the hand region from the corrected image. To extract only the hand region image from the hand gesture image, the target extraction unit (300) uses the binarization condition defined by Equation 1 to detect only the pixels whose depth value lies within a certain range of the depth value of the hand center pixel.

[Equation 1]

(The equation is not recoverable from the source text. Its terms are the depth value of the image pixel currently being scanned and the depth value of the hand center pixel.)
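The binarization step described above can be sketched as follows. This is a minimal illustration, not the patent's exact condition: the symmetric window `abs(d - center_depth) <= depth_range`, the inclusive bound, the 0/255 output values, and the names `binarize_by_depth`, `center_depth`, and `depth_range` are all assumptions, since Equation 1 itself is not legible in the source.

```python
from typing import List


def binarize_by_depth(
    depth: List[List[int]], center_depth: int, depth_range: int
) -> List[List[int]]:
    """Return a 0/255 mask of pixels whose depth is close to the hand center.

    Assumption: a pixel is kept when its depth differs from center_depth
    by at most depth_range (symmetric, inclusive window).
    """
    return [
        [255 if abs(d - center_depth) <= depth_range else 0 for d in row]
        for row in depth
    ]


# Toy depth map in millimetres: a hand near 800 mm against a 2000 mm background.
depth_map = [
    [2000, 2000, 2000, 2000],
    [2000, 810, 800, 2000],
    [2000, 790, 805, 2000],
    [2000, 2000, 900, 2000],
]
mask = binarize_by_depth(depth_map, center_depth=800, depth_range=50)
```

In this toy input the 900 mm pixel falls outside the 50 mm window and is dropped, which is how a forearm or body part slightly behind the hand would be excluded.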

The center point of the hand region is detected using the distance transform algorithm defined by Equation 2: each pixel value is set to L, the value L_max is then found, and L is divided by L_max.

[Equation 2]

$$\frac{L}{L_{\max}}$$

($L$ is the distance from the hand center point to the nearest outline, and $L_{\max}$ is the largest value of $L$ among all pixels)
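The normalization in Equation 2 can be illustrated with a small pure-Python distance transform. This is a sketch under stated assumptions: it takes L for each foreground pixel to be the Euclidean distance to the nearest background pixel, uses a brute-force search that is only practical for tiny masks, and the function name `normalized_distance_map` is hypothetical. A real implementation would more likely use a library distance transform.

```python
import math
from typing import List, Tuple


def normalized_distance_map(
    mask: List[List[int]],
) -> Tuple[List[List[float]], Tuple[int, int], float]:
    """Compute L / L_max for a binary mask.

    Returns the normalized map, the (row, col) of the first pixel where
    L == L_max (taken here as the hand center), and L_max itself.
    """
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if mask[y][x] == 0]
    if not background:
        raise ValueError("mask needs at least one background pixel")

    # L: distance from each foreground pixel to the nearest background pixel.
    dist = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dist[y][x] = min(math.hypot(y - by, x - bx) for by, bx in background)

    l_max = max(max(row) for row in dist)
    if l_max == 0:
        raise ValueError("mask has no foreground pixels")

    center = next(
        (y, x) for y in range(h) for x in range(w) if dist[y][x] == l_max
    )
    normalized = [[v / l_max for v in row] for row in dist]
    return normalized, center, l_max


# 3x3 foreground block inside a 5x5 mask.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
norm_map, hand_center, l_max = normalized_distance_map(toy_mask)
```

The pixel with normalized value 1 is the one deepest inside the region, which matches the description of the hand center point as the L_max pixel.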

The resolution determination unit (400) determines the resolution of the extracted hand region image.

When the hand region image has a high resolution of 1920x1080 or more, at which detailed elements can be recognized, the resolution determination unit (400) performs detailed element recognition of the hand region. The detailed element recognition uses the convex hull, which connects the outermost points of the hand region with straight lines so as to enclose all points, and the convexity defect, which is the point on the outline farthest from the straight line between two convex hull points.

When the hand region image has a low resolution of less than 1920x1080, at which detailed elements cannot be recognized, the resolution determination unit (400) generates a 1920x1080 high-resolution hand region image. If the low resolution of the hand region image causes errors in generating the high-resolution image, the YCrCb image conversion defined by Equation 3 is additionally applied, with the color values preset to those that best represent Asian skin color.

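[Equation 3]

$$(\mathrm{minCr} \le Cr \le \mathrm{maxCr}) \;\&\&\; (\mathrm{minCb} \le Cb \le \mathrm{maxCb})$$

($Cr$ is the luminance and red component, $Cb$ is the luminance and blue component; minCr is 130, maxCr is 160, minCb is 110, and maxCb is 140)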

The command behavior generation unit (500) combines the one or more determined hand region images to generate a command behavior.

For example, a gesture with all five fingers extended may be a command to turn on the presentation apparatus screen in order to start the presentation; a gesture with one finger extended, a command to advance to the next presentation slide; a gesture with two fingers extended, a command to go back to the previous slide; and a fist, with no fingers extended, a command to turn off the presentation apparatus screen in order to end the presentation.
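The example mapping above can be written as a small lookup table. This is only a sketch of the four gestures named in the text; the `Command` enum, its member names, and `command_for` are hypothetical, and returning `None` for an unlisted finger count is an assumption about how unmapped gestures might be handled.

```python
from enum import Enum
from typing import Optional


class Command(Enum):
    START = "turn the presentation screen on"
    NEXT_SLIDE = "advance to the next slide"
    PREVIOUS_SLIDE = "go back to the previous slide"
    END = "turn the presentation screen off"


# Number of extended fingers -> command, following the example in the text.
GESTURE_TO_COMMAND = {
    5: Command.START,
    1: Command.NEXT_SLIDE,
    2: Command.PREVIOUS_SLIDE,
    0: Command.END,
}


def command_for(finger_count: int) -> Optional[Command]:
    """Look up the command for a finger count; None if no command is mapped."""
    return GESTURE_TO_COMMAND.get(finger_count)
```

Keeping the mapping in a table rather than in branching code makes it easy to extend the command set, which is the limitation the background section attributes to smart pointers.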

The command signal generation unit (600) converts the generated command behavior into a command signal.

The command execution unit (700) performs the corresponding command according to the command signal.

Referring to FIG. 3, which shows an example according to an embodiment of the present invention, a Kinect, in which a color camera and a depth camera are integrated in a single device, was used. Because the two cameras are at physically different positions, the images (201) taken by the two cameras are distorted. Calibration was performed to correct this distortion, yielding the corrected image (202).

Referring to FIG. 4, which shows an example according to an embodiment of the present invention, the figure is a preprocessed image binarized using the depth value of the hand center point provided by the Kinect and a range value set in the program.

Referring to FIG. 5, which shows an example according to an embodiment of the present invention, the figure shows the hand center point, the wrist detection circle, and the wrist detection points obtained by applying the distance transform algorithm defined by Equation 2 to the binarized hand region candidate image. Specifically, the distance transform algorithm sets each pixel value to L, finds the value L_max, and divides L by L_max.

[Equation 2]

$$\frac{L}{L_{\max}}$$

($L$ is the distance from the hand center point to the nearest outline, and $L_{\max}$ is the largest value of $L$ among all pixels)

The hand center point (301) is the L_max pixel found above. The circle (302) centered on the hand center point has a radius equal to the distance from the hand center point to the nearest outline, and the other circle (303) centered on the hand center point has a radius equal to the radius of the circle (302) multiplied by the constant 1.5.

To detect the wrist point, the other circle (303) centered on the hand center point is divided into 360 parts of 1 degree each; when non-zero values appear 8 times in succession (8 degrees) in the distance transform matrix, the point at the middle angle of the consecutive angles is detected as a secondary wrist candidate (304). The secondary wrist candidate (304) closest to the straight line connecting the hand center point (301) and the elbow point (305) is detected as the final wrist candidate.
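The circular scan for secondary wrist candidates can be sketched as follows. Several details are assumptions rather than statements from the text: a run is accepted when it has at least 8 consecutive non-zero samples, runs that wrap past 359 degrees are joined, circle points are sampled by rounding to the nearest pixel, and the names `sample_circle` and `wrist_candidate_angles` are hypothetical. In the text the scan radius is 1.5 times the radius of circle (302).

```python
import math
from typing import List, Sequence, Tuple


def sample_circle(
    dist_map: Sequence[Sequence[float]], center: Tuple[int, int], radius: float
) -> List[float]:
    """Read the distance-transform value at each of 360 one-degree steps."""
    cy, cx = center
    h, w = len(dist_map), len(dist_map[0])
    samples = []
    for deg in range(360):
        angle = math.radians(deg)
        y = int(round(cy + radius * math.sin(angle)))
        x = int(round(cx + radius * math.cos(angle)))
        inside = 0 <= y < h and 0 <= x < w
        samples.append(dist_map[y][x] if inside else 0)
    return samples


def wrist_candidate_angles(samples: Sequence[float], min_run: int = 8) -> List[int]:
    """Return the middle angle of every run of >= min_run non-zero samples."""
    n = len(samples)
    if n == 0 or all(v != 0 for v in samples):
        return []  # no gap on the circle, so no distinct arc to report
    # Start just after a zero sample so a run crossing 359 -> 0 stays whole.
    start = next(i for i, v in enumerate(samples) if v == 0)
    candidates = []
    run_start, run_len = 0, 0
    for k in range(1, n + 1):
        i = (start + k) % n
        if samples[i] != 0:
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if run_len >= min_run:
                candidates.append((run_start + (run_len - 1) // 2) % n)
            run_len = 0
    return candidates


# Synthetic scan: a 10-degree arc at 10..19, a wrapped arc at 356..5,
# and a 5-degree arc at 100..104 that is too short to count.
scan = [0] * 360
for d in list(range(10, 20)) + list(range(356, 360)) + list(range(0, 6)):
    scan[d] = 1
for d in range(100, 105):
    scan[d] = 1
angles = wrist_candidate_angles(scan)
```

Choosing the final wrist point among these candidates would then use the line from the hand center point (301) to the elbow point (305), which is not shown here.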

Referring to FIG. 6, the smallest value among the neighboring pixels of the secondary wrist candidate (304) detected above is set as starting point 1, and the smallest of the four pixels on the opposite side of point 1 is set as the next starting point. This process is repeated until the starting point becomes 0, that is, until a point that is no longer in the hand candidate region is reached; the starting points passed through are detected as the wrist, and the arm region, which is not part of the hand, is removed.

Referring to FIG. 7, the figure shows the result when, during up-sampling, the two depth layers of the background and a finger are mixed at a depth boundary and the boundary becomes blurred.

Referring to FIG. 8, the figure shows that, because the resolution of the Kinect color image is about 4 times that of the depth image, two rounds of up-sampling (4 times in width and height) make it possible to up-sample while reflecting all of the information in the color image.
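The arithmetic behind "two rounds of up-sampling" can be shown with a toy example: doubling width and height twice gives a factor of 4 in each dimension. The nearest-neighbour replication used here is only a stand-in chosen for brevity; the text does not say which interpolation the invention uses, and the names `upsample_2x` and `upsample_4x` are hypothetical.

```python
from typing import List


def upsample_2x(img: List[List[int]]) -> List[List[int]]:
    """Double width and height by repeating each pixel (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so output rows do not share storage
    return out


def upsample_4x(img: List[List[int]]) -> List[List[int]]:
    """Two 2x passes, giving 4x in each dimension."""
    return upsample_2x(upsample_2x(img))


small = [[1, 2], [3, 4]]
large = upsample_4x(small)
```

Plain replication like this does not look at the color image at all, so it would not by itself avoid the blurred depth boundaries described for FIG. 7.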

Referring to FIG. 9, which illustrates the finger detection algorithm, FIG. 9a shows the convex hull, an algorithm that connects the outermost points of the hand region with straight lines so as to enclose all points. Here, the inside of the outline is excluded because the outline already encloses it, and the convex hull is computed only for the outline detected earlier. FIG. 9b shows the use of the convexity defect, which is the point on the outline farthest from the straight line between two convex hull points. The red dots are convexity defects and the green dots are convex hull points.
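The two geometric ideas in FIG. 9 can be sketched in pure Python. This is an illustrative stand-in and not the embodiment's code: it uses Andrew's monotone chain for the hull, assumes the outline is an ordered list of distinct integer points, and the `min_depth` threshold and function names are hypothetical. OpenCV offers `cv2.convexHull` and `cv2.convexityDefects` for the same purpose.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    return abs(cross(a, b, p)) / math.hypot(b[0] - a[0], b[1] - a[1])


def convexity_defects(contour: List[Point], min_depth: float = 1.0):
    """For each hull edge, find the outline point farthest from that edge."""
    hull = set(convex_hull(contour))
    hull_idx = [i for i, p in enumerate(contour) if p in hull]
    if len(hull_idx) < 3:
        return []
    n = len(contour)
    defects = []
    for k, a in enumerate(hull_idx):
        b = hull_idx[(k + 1) % len(hull_idx)]
        between = []
        i = (a + 1) % n
        while i != b:  # outline points lying between two hull vertices
            between.append(i)
            i = (i + 1) % n
        if not between:
            continue
        far = max(between, key=lambda j: line_distance(contour[j], contour[a], contour[b]))
        depth = line_distance(contour[far], contour[a], contour[b])
        if depth >= min_depth:
            defects.append((contour[a], contour[b], contour[far], depth))
    return defects


# A square outline with one V-shaped notch, like the valley between two fingers.
outline = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
hull_points = convex_hull(outline)
found = convexity_defects(outline)
```

In this toy outline the notch vertex is the only defect, and the two hull vertices on either side of it play the role of fingertips, so counting defects gives a handle on the number of extended fingers.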

Referring to FIG. 10, which shows an example of detecting the hand region from the color image: because extracting the hand region from the depth image produces many errors at the up-sampling stage owing to low resolution, hand region detection using skin color was additionally implemented. For hand region extraction using skin color, the YCrCb image conversion defined by Equation 3 is additionally applied, with the color values preset to those that best represent Asian skin color.

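[Equation 3]

$$(\mathrm{minCr} \le Cr \le \mathrm{maxCr}) \;\&\&\; (\mathrm{minCb} \le Cb \le \mathrm{maxCb})$$

($Cr$ is the luminance and red component, $Cb$ is the luminance and blue component; minCr is 130, maxCr is 160, minCb is 110, and maxCb is 140)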
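The skin-color test of Equation 3 can be sketched per pixel as below. The thresholds 130, 160, 110, and 140 come from the text; everything else is an assumption: the inclusive bounds, the 8-bit RGB input, the function names, and the conversion coefficients, which follow the commonly used BT.601-style RGB to YCrCb formula with a 128 offset.

```python
from typing import Tuple

# Thresholds stated in the description of Equation 3.
MIN_CR, MAX_CR = 130, 160
MIN_CB, MAX_CB = 110, 140


def rgb_to_ycrcb(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return y, cr, cb


def is_skin(r: int, g: int, b: int) -> bool:
    """True when both chroma components fall inside the preset ranges."""
    _, cr, cb = rgb_to_ycrcb(r, g, b)
    return MIN_CR <= cr <= MAX_CR and MIN_CB <= cb <= MAX_CB


skin_like = is_skin(190, 150, 140)   # Cr about 148.8, Cb about 116.3
pure_blue = is_skin(0, 0, 255)       # Cr about 107.3, below minCr
mid_gray = is_skin(128, 128, 128)    # Cr exactly 128, below minCr
```

Because the test ignores the Y channel, it is less sensitive to overall brightness than a threshold on RGB values would be, which is a common reason for doing skin detection in YCrCb.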

Although preferred embodiments of the present invention have been described above, various changes, modifications, and equivalents may be used. It is clear that the above embodiments can be suitably modified and applied in the same manner. The above description therefore does not limit the scope of the invention, which is defined by the following claims.

100: image collection unit
200: image correction unit
201: images taken by the two cameras
202: corrected image
300: target extraction unit
301: hand center point
302: circle centered on the hand center point
303: other circle centered on the hand center point
400: resolution determination unit
500: command behavior generation unit
600: command signal generation unit
700: command execution unit

Claims

A 3D motion-based presentation device, comprising:
an image collecting unit that photographs a hand gesture of a presenter as one or more images over a predetermined period of time;
an image correcting unit that corrects the photographed images through calibration;
a target extracting unit that extracts an image corresponding to a hand region from the corrected images;
a resolution discrimination unit that determines the resolution of the extracted hand region image;
a command behavior generating unit that generates a command action by combining the one or more hand region images so determined;
a command signal generating unit that converts the generated command action into a command signal; and
a command execution unit that executes the corresponding command according to the command signal,
wherein the target extracting unit extracts the hand region image using the binarization conditional expression defined by Equation 1, detecting only pixels whose depth values fall within a certain range.
[Equation 1]

[equation image not reproduced]

(where the two symbols denote the depth value of the currently scanned image pixel and the depth value of the hand center pixel, respectively)
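
The depth-based binarization recited above can be sketched as follows. Equation 1 itself is not reproduced in this text, so the condition used here (keep a pixel when its depth differs from the hand center depth by at most `tolerance`) is an assumed reading of "a certain range of depth values"; the function name, the `tolerance` parameter, and its units are hypothetical.

```python
from typing import List, Tuple


def binarize_by_depth(
    depth: List[List[float]],
    center: Tuple[int, int],
    tolerance: float,
) -> List[List[int]]:
    """Return a binary mask of pixels whose depth is close to the hand center.

    depth     : 2D grid of depth values (row-major).
    center    : (row, column) of the hand center pixel.
    tolerance : ASSUMED half-width of the accepted depth range.
    """
    row, col = center
    center_depth = depth[row][col]
    return [
        [1 if abs(value - center_depth) <= tolerance else 0 for value in line]
        for line in depth
    ]


# Example: the hand is about 800 units away, the background much farther.
example_depth = [
    [800, 810, 1500],
    [805, 2000, 795],
]
example_mask = binarize_by_depth(example_depth, center=(0, 0), tolerance=20)
```

A symmetric range around the hand center depth is only one possible form of the condition; a one-sided range would be implemented the same way with a different comparison.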

The device according to claim 1,
wherein, to correct the photographed hand gesture image, the image is converted into OpenCV data.

delete

The device according to claim 1,
wherein the hand center point is detected by setting each pixel value to L using the distance transform algorithm defined by Equation 2, finding L_max, and then dividing L by L_max.
[Equation 2]
L / L_max
(where L is the distance from the hand center point to the nearest outline, and L_max is the largest value of L over all pixels)
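
A minimal sketch of the normalization in Equation 2 follows. It assumes the usual reading of a distance transform, in which L is computed for every hand pixel as its distance to the nearest non-hand pixel, and it takes the pixel with L = L_max as the hand center. The brute-force search and the function name are choices made for brevity here; a library distance transform would normally replace them.

```python
import math
from typing import List, Tuple


def normalized_distance_map(
    mask: List[List[int]],
) -> Tuple[List[List[float]], Tuple[int, int]]:
    """Return (L / L_max for every pixel, (row, col) of the pixel with L_max).

    mask: binary grid, 1 for hand pixels and 0 for background.
    """
    height, width = len(mask), len(mask[0])
    background = [
        (y, x) for y in range(height) for x in range(width) if mask[y][x] == 0
    ]
    if not background:
        raise ValueError("mask needs at least one background pixel")

    # L: distance from each hand pixel to the nearest background pixel.
    dist = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                dist[y][x] = min(math.hypot(y - by, x - bx) for by, bx in background)

    l_max = max(max(row) for row in dist)
    if l_max == 0:
        raise ValueError("mask has no hand pixels")

    # The first pixel that attains L_max is taken as the hand center.
    center = next(
        (y, x) for y in range(height) for x in range(width) if dist[y][x] == l_max
    )
    normalized = [[value / l_max for value in row] for row in dist]
    return normalized, center
```

After normalization every value lies in [0, 1], so thresholds on the map no longer depend on how large the hand appears in the image.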

The device according to claim 1,
wherein detailed element recognition of the hand region is performed when the hand region image has a high resolution of 1920x1080 or more, at which detailed elements can be recognized,
and a high-resolution hand region image of 1920x1080 or more is generated when the hand region image has a low resolution of less than 1920x1080, at which detailed elements cannot be recognized.

The device according to claim 5,
wherein the detailed element recognition of the hand region image uses a convex hull, which connects the outermost points of the hand region with straight lines so as to enclose all of its points,
and a convexity defect, which is the point on the outline farthest from the straight line between two points of the convex hull.
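
The two geometric elements named above can be sketched in plain Python. The hull is built with Andrew's monotone chain algorithm, and each convexity defect is taken as the outline point farthest from the hull edge that spans it. The choice of algorithm, the `min_depth` filter, and all function names are assumptions for illustration; the source does not say how these are computed.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; collinear points are dropped from the hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convexity_defects(contour: List[Point], min_depth: float = 0.0):
    """Return (start, end, farthest_point, depth) for each hull edge.

    contour: outline points in traversal order, without repeated points.
    A defect is reported only if its depth is positive and >= min_depth.
    """
    hull = set(convex_hull(contour))
    hull_idx = [i for i, p in enumerate(contour) if p in hull]
    defects = []
    for k in range(len(hull_idx)):
        start = hull_idx[k]
        end = hull_idx[(k + 1) % len(hull_idx)]
        a, b = contour[start], contour[end]
        if a == b:
            continue  # degenerate edge, nothing to measure
        edge_length = math.hypot(b[0] - a[0], b[1] - a[1])
        best_point, best_depth = None, 0.0
        i = (start + 1) % len(contour)
        while i != end:  # outline points strictly between the hull vertices
            depth = abs(cross(a, b, contour[i])) / edge_length
            if depth > best_depth:
                best_point, best_depth = contour[i], depth
            i = (i + 1) % len(contour)
        if best_point is not None and best_depth >= min_depth:
            defects.append((a, b, best_point, best_depth))
    return defects
```

For a hand outline, deep defects typically fall in the valleys between extended fingers, so counting defects above a depth threshold is one common way to estimate how many fingers are raised.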

The device according to claim 5,
wherein, when an error occurs in generating the high-resolution hand region image because of the low resolution, the YCrCb image conversion defined by Equation 3 is additionally applied, with the color values preset to those that best represent East Asian skin color.
[Equation 3]

[equation image not reproduced; its conditions are joined by &&]

(where the symbols denote the luminance and the red component, ...)

A 3D motion-based presentation method, comprising:
an image collecting step of photographing a hand gesture of a presenter as one or more images over a predetermined period of time;
an image correction step of correcting the photographed images through calibration;
a target extraction step of extracting an image corresponding to a hand region from the corrected images;
a resolution discrimination step of determining the resolution of the extracted hand region image;
a command behavior generating step of generating a command action by combining the one or more hand region images so determined;
a command signal generating step of converting the generated command action into a command signal; and
a command execution step of executing the corresponding command according to the command signal,
wherein, in the target extraction step, the hand region image is extracted using the binarization conditional expression defined by Equation 1, detecting only pixels whose depth values fall within a certain range.
[Equation 1]

[equation image not reproduced]

(where the two symbols denote the depth value of the currently scanned image pixel and the depth value of the hand center pixel, respectively)

The method according to claim 8,
wherein, to correct the photographed hand gesture image, the image is converted into OpenCV data.

delete

The method according to claim 8,
wherein the hand center point is detected by setting each pixel value to L using the distance transform algorithm defined by Equation 2, finding L_max, and then dividing L by L_max.
[Equation 2]
L / L_max
(where L is the distance from the hand center point to the nearest outline, and L_max is the largest value of L over all pixels)

The method according to claim 8,
wherein detailed element recognition of the hand region is performed when the hand region image has a high resolution of 1920x1080 or more, at which detailed elements can be recognized,
and a high-resolution hand region image of 1920x1080 or more is generated when the hand region image has a low resolution of less than 1920x1080, at which detailed elements cannot be recognized.

The method according to claim 12,
wherein the detailed element recognition of the hand region image uses a convex hull, which connects the outermost points of the hand region with straight lines so as to enclose all of its points,
and a convexity defect, which is the point on the outline farthest from the straight line between two points of the convex hull.

The method according to claim 12,
wherein, when an error occurs in generating the high-resolution hand region image because of the low resolution, the YCrCb image conversion defined by Equation 3 is additionally applied, with the color values preset to those that best represent East Asian skin color.
[Equation 3]

[equation image not reproduced; its conditions are joined by &&]

(where the symbols denote the luminance and the red component, ...)