KR20140035244A

KR20140035244A - Apparatus and method for user interfacing, and terminal apparatus using the method

Info

Publication number: KR20140035244A
Application number: KR1020130090587A
Authority: KR
Inventors: 남승우
Original assignee: 한국전자통신연구원
Priority date: 2012-09-10
Filing date: 2013-07-31
Publication date: 2014-03-21
Also published as: KR101748126B1

Abstract

Disclosed are an apparatus and method for user interfacing, and a terminal device using the same. A user interfacing method according to one aspect of the present invention includes: setting a reference image for an object to be used for user interfacing; recognizing the object from an input user-related image; comparing the recognized object with the reference image to determine a depth-related motion of the object; and running an application according to the depth-related motion of the object. The terminal device can thus be controlled by a user motion based on the distance between the user and the terminal device the user uses. [Reference numerals] (110) Camera receiver; (120) Feature extractor; (130) Motion recognition unit; (140) Content driver; (150) Display unit

Description

Apparatus and method for user interfacing, and terminal device using the same

The present invention relates to user interfacing, and more particularly, to an apparatus and method for providing user interfacing to a portable terminal device by using a user's motion, and a terminal device using the same.

A user interface is a device or software that helps interaction between a user and a device proceed smoothly. User interfaces are mainly used in computers, electronic devices, industrial equipment, and home appliances so that the user can communicate with the device.

As examples of typical user interfaces, operating a program by typing commands on a keyboard is called a command-line interface, operating it through commands selected from menus is called a menu-driven interface, and operating a graphical display program with a pointing device such as a light pen, mouse, control ball, or joystick is called a graphical user interface.

With advances in technology, user interfaces are moving away from these typical forms of the past toward natural and intuitive interfaces between users and devices. A representative example is the 3D user interface.

Microsoft's Kinect™, a kind of 3D interface, provides game and entertainment services by recognizing the user's motion without a controller. Kinect uses whole-body motion for interaction between the content and the user, and is configured so that the user takes an initial posture, for example raising both arms, before the content starts.

However, user interfaces used for mobile terminal devices, Kinect included, are limited in how they let the user interact with three-dimensional content (applications) through motion. In particular, the limited recognition range for the user and the limited display size of a mobile terminal can make them inconvenient to use.

Accordingly, there is a need for a user interface that is better suited to portable terminal devices and gives the user substantially more freedom.

An object of the present invention, devised to overcome the above problems, is to provide a method of interfacing between a user and the terminal device the user uses.

Another object of the present invention is to provide a user interface apparatus using the interfacing method.

Still another object of the present invention is to provide a terminal device including the user interface.

To achieve the above object, a user interfacing method according to one aspect of the present invention includes: setting a reference image for an object to be used for user interfacing; recognizing the object to be used for user interfacing from an input user-related image; comparing the recognized object with the reference image to determine a depth-related motion of the object; and running an application according to the depth-related motion of the object.

The object to be used for user interfacing is a part of the user's body.

The part of the user's body may be at least one of the user's hand, finger, palm, face, lips, nose, eye, and head.

Determining the depth-related motion of the object by comparing the recognized object with the reference image includes comparing the size of the object image recognized in the input user-related image with the size of the reference image to determine a depth-related position of the object image.

The size of the object image and of the reference image may be defined as the width, length, or area of the image.
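
The size comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_depth_motion`, the `tolerance` dead band, and the string labels are assumptions introduced here. It encodes only the stated idea that an object appearing larger than the reference is nearer to the camera and one appearing smaller is farther away.

```python
def classify_depth_motion(ref_size: float, observed_size: float,
                          tolerance: float = 0.05) -> str:
    """Compare an observed object size with the reference size.

    Sizes may be a width, a length, or an area, as long as both
    arguments use the same measure. `tolerance` is a hypothetical
    dead band that suppresses jitter around the reference position.
    """
    if ref_size <= 0 or observed_size <= 0:
        raise ValueError("sizes must be positive")
    ratio = observed_size / ref_size
    if ratio > 1.0 + tolerance:
        return "closer"        # object looks larger than the reference
    if ratio < 1.0 - tolerance:
        return "farther"       # object looks smaller than the reference
    return "at_reference"


if __name__ == "__main__":
    for size in (120.0, 102.0, 80.0):
        print(size, classify_depth_motion(100.0, size))
```

The dead band is a design choice for stability; the source does not specify one.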

Setting the reference image for the object to be used for user interfacing includes: selecting a part of the user's body to be the object of the reference image; compositing and displaying the image of the object input through a camera together with a graphic related to the selected object; matching the image of the object input through the camera to the graphic; and storing the image of the object matched to the graphic as the reference image.
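
The matching and storing steps above might look like the sketch below. The source says only that the camera image is matched to the graphic; the use of bounding boxes, intersection over union (IoU) as the match criterion, the 0.8 threshold, and the names `Box`, `iou`, and `try_set_reference` are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned bounding box in pixels (hypothetical representation)."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def try_set_reference(guide: Box, detected: Box,
                      threshold: float = 0.8) -> Optional[Box]:
    """Return the detected box as the reference once it matches the guide.

    `guide` stands for the on-screen graphic and `detected` for the body
    part found in the camera image. Returns None while they do not match.
    """
    return detected if iou(guide, detected) >= threshold else None


if __name__ == "__main__":
    guide = Box(100, 100, 40, 120)
    print(try_set_reference(guide, Box(140, 100, 40, 120)))  # no overlap yet
    print(try_set_reference(guide, Box(102, 101, 40, 120)))  # aligned
```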

Recognizing the motion of the object to be used for user interfacing from the input user-related image may include extracting an object-related feature from the entire image input through the camera.

The depth-related motion is a motion measured with respect to the distance from the camera to the user's body part set as the object.

A user interfacing method according to another aspect of the present invention reflects in the user interfacing not only the user's depth-related motion but also motion in the planar (or horizontal) direction relative to the camera.

To achieve another object of the present invention, a user interface apparatus according to one aspect of the present invention includes: a receiver that receives a user-related image; a feature extractor that extracts, from the input user-related image, an object-related image to be used for user interfacing; a motion recognition unit that compares the extracted object-related image with the reference image to determine the motion of the object; and a content driver that drives content according to the motion of the object.

Here, the motion of the object includes depth-related motion and motion in the planar direction.

The user interface apparatus may further include a display unit that composites and displays the extracted object-related image and the reference image provided by the motion recognition unit.

The object to be used for user interfacing is a part of the user's body.

The motion recognition unit determines the depth-related position of the object image by comparing the size of the object image recognized in the input user-related image with the size of the reference image.

Here, the size of the object image and of the reference image may be defined as the width, length, or area of the image.

The reference image is the image of the object that is input to the camera when the object to be used for user interfacing is located at a reference point.

To achieve still another object of the present invention, a terminal device according to one aspect of the present invention includes: a user interface unit that extracts, from an input user-related image, an object-related image to be used for user interfacing, compares the extracted object-related image with the reference image to determine a depth-related motion of the object, and drives content according to the depth-related motion of the object; and a data storage unit that stores the object-related reference image to be used for user interfacing.

The user interface unit also composites and displays a graphic related to the part of the user's body to be the object of the reference image together with the actual image of the object input through the camera, and, at the moment the image of the object input through the camera is matched to the graphic, sets the image of the object matched to the graphic as the reference image.

According to the present invention as described above, the terminal device can be controlled by user motion measured with respect to the distance between the user and the terminal device the user uses, so the user can use the electronic device more freely.

FIG. 1 is a block diagram of a user interface apparatus according to the present invention.
FIG. 2 is a conceptual diagram illustrating the operation of a user interfacing method according to the present invention.
FIG. 3 is a flowchart of a method of setting a reference image for motion recognition according to the present invention.
FIG. 4 is a flowchart of a user interfacing method according to the present invention.
FIG. 5 is a block diagram of a terminal device according to the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to the specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless explicitly so defined in this application.

The terms described below are defined in consideration of their functions in the present invention and may be referred to differently depending on the intention of a client, operator, or user, or on precedent. Their definitions should therefore be based on the content of this specification as a whole.

The term "terminal" used in this application may also be referred to as a mobile station (MS), user equipment (UE), user terminal (UT), wireless terminal, access terminal (AT), terminal, subscriber unit, subscriber station (SS), wireless device, wireless communication device, wireless transmit/receive unit (WTRU), mobile node, mobile, or by other terms.

Various embodiments of the terminal may include, but are not limited to, a cellular telephone, a smartphone with wireless communication capability, a personal digital assistant (PDA) with wireless communication capability, a wireless modem, a portable computer with wireless communication capability, an imaging device such as a digital camera with wireless communication capability, a gaming device with wireless communication capability, a music storage and playback appliance with wireless communication capability, and an Internet appliance capable of wireless Internet access and browsing, as well as portable units or terminals that integrate combinations of such functions.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. In describing the present invention, to facilitate overall understanding, the same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.

The present invention relates to a technique for providing a user interface using depth information between a terminal device and a user, and in particular presents a method of acquiring depth information through calibration that uses features of a part of the user's body before content or an application starts.

In addition, three-dimensional interaction with content (an application) running on a portable terminal device requires initialization before the content starts. Suppose, for example, that the user interacts with content using a finger captured by a camera mounted on the portable terminal. If a depth value is to be derived from the size of the finger and used for the user interface, information about the finger size must be initialized before the content starts (much as Kinect recognizes the user's initial posture with both hands raised before driving content), because finger size differs from user to user and the apparent size varies with the distance between the camera mounted on the portable terminal and the user's finger.

Extending this, the interface may operate in the Z direction (the direction of the distance between the camera and the user) using not only the user's finger but also other parts of the body, for example the face or the pupils. In this case as well, initialization is needed for the size of the face, the spacing between the eyes, the initial position of the pupils, and so on.

Accordingly, the present invention also provides an initialization method for user interfacing.

FIG. 1 is a block diagram of a user interface apparatus according to the present invention.

The user interface apparatus according to the present invention can be built into or mounted on the various terminal devices that users use.

The components described below are defined by functional rather than physical division, and each may be defined by the functions it performs. Each component may be implemented as hardware and/or as program code and a processing unit that perform its functions, and the functions of two or more components may be implemented in a single component. Accordingly, the names given to the components in the following embodiments are intended not to distinguish the components physically but to suggest the representative function each performs, and it should be noted that the technical idea of the present invention is not limited by the component names.

The user interface apparatus according to an embodiment of the present invention shown in FIG. 1 may include a camera receiver 110, a feature extractor 120, a motion recognition unit 130, a content driver 140, and a display unit 150.

The image input to the camera receiver 110 may include any image coming from an image sensor or a time-of-flight (TOF) sensor. The images received by the camera receiver 110 also include RGB images, depth maps, and infrared images.

Here, the image sensor is an image detection device typified by the charge-coupled device (CCD). A CCD has 100,000 or more detection elements on a coin-sized chip, and an image focused on the chip surface is accumulated as charge packets on the individual elements. These packets are read out at high speed by a charge transfer mechanism, converted, and then displayed as an image. The elements of a CCD form a detection array that is divided into regions for accumulation, output, and so on.

The time-of-flight (TOF) sensor uses a method widely used for depth measurement in 3D cameras: it transmits light (at infrared wavelengths) and measures the time until the signal reflected from the subject is received, from which the distance to the subject is calculated. Because pulses of light are difficult to generate and because of the high speeds involved, TOF-based depth cameras measure depth by having the sensor detect the phase difference of the reflected wave.
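
The two TOF measurement styles mentioned above reduce to short formulas, sketched here for orientation. The function names and the 20 MHz modulation frequency in the example are assumptions, not values from the source. The relations themselves are the standard ones: distance is half the round-trip path, and for continuous-wave modulation at frequency f a phase shift φ corresponds to a distance of c·φ / (4π·f).

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_distance_from_time(round_trip_s: float) -> float:
    """Distance in meters from a measured round-trip time in seconds."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def tof_distance_from_phase(phase_rad: float, mod_freq_hz: float) -> float:
    """Distance in meters from the phase shift of a modulated signal.

    Valid only within the unambiguous range c / (2 * f); beyond that
    the phase wraps around and the result is aliased.
    """
    return SPEED_OF_LIGHT_M_S * phase_rad / (4.0 * math.pi * mod_freq_hz)


if __name__ == "__main__":
    # A 2 ns round trip corresponds to roughly 0.3 m.
    print(tof_distance_from_time(2e-9))
    # A quarter-cycle phase shift at a hypothetical 20 MHz modulation.
    print(tof_distance_from_phase(math.pi / 2, 20e6))
```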

Next, the feature extractor 120 extracts, from the image input from the camera receiver 110, an image related to the part of the user's body to be used for motion recognition according to the present invention, for example the user's eyes, head, face, hand, or finger.

According to an embodiment of the present invention, the feature extractor 120 detects both of the user's eyes in the input image and extracts the distance between the eyes (including the pupils) as the feature of the image. The feature extractor 120 may also detect a finger region in the input image, extract information on the length and thickness of the finger, and pass it to the motion recognition unit 130 so that the motion recognition unit 130 can use that feature.

Before the user starts the content or application in earnest, the motion recognition unit 130 provides the display unit 150 with graphic information for a virtual reference image related to the image input from the camera. The motion recognition unit 130 also provides the display unit 150 with the actual color or black-and-white image input from the camera, or with an image of the feature portion extracted from the actual image. The black-and-white or color image input from the camera may be a depth map, an infrared image, or the like.

In the step of setting the reference image for calibration, the motion recognition unit 130 also automatically adjusts the camera zoom so that, at the position of the user's actual eyes, the virtual graphic outline matches the outline of the eyes that are the object in the input image. Conversely, when the user adjusts his or her distance to fit the position of the virtual pair of eyes, the user's feature portion to be used as the reference image can be recognized when the user or the part of the user's body is located at the reference point. The reference point here is the adjusted zoom, or the position of the user or of the part of the user's body. When zoom is not used, the user aligns the body part with the virtual graphic outline, that position is set as the reference point of the object to be recognized as a feature, and the object image is stored.

As the user moves away from the camera, the distance between the user's eyes in the image becomes relatively narrower, and as the user approaches the camera, it becomes relatively wider; this can be used for interfacing between the content and the user.

In another embodiment of the present invention, the two corner points of the user's lips may be used. In this case, the distance from the camera to the user can be estimated from the distance between the first point and the second point: the longer the distance between the two points, the shorter the distance from the camera to the user, and this property can be used for user interfacing.

According to yet another embodiment of the present invention, a finger may be used as the feature. In this case, the image received by the camera and a graphic indicating the reference finger thickness or length are composited and displayed.

As described in detail below with reference to FIG. 2, the user matches the length or thickness of his or her finger, that is, of the actual finger image, to the position of the virtual graphic. When the user's actual finger image and the virtual graphic position match, the position of the user's finger becomes the reference position, and the distance from the camera to the user's finger becomes the reference distance. Calibration data may be stored in advance according to the characteristics of the camera.

When the finger moves from the reference distance closer to the camera, the finger in the image input to the camera becomes thicker and longer. Conversely, when the user's finger moves farther from the camera than the reference distance, the finger becomes thinner and shorter.

Using this principle, the distance between the user and the camera can be calculated from the image of the user's body part, such as a finger or eyes, input to the camera.
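
One way to turn this principle into a number is sketched below, assuming an ideal pinhole camera with fixed zoom, in which the apparent linear size of an object is inversely proportional to its distance. The source does not give a formula; the function name `estimate_distance`, its parameters, and the sample values are illustrative assumptions.

```python
def estimate_distance(ref_size_px: float, ref_distance_cm: float,
                      observed_size_px: float) -> float:
    """Estimate camera-to-object distance from apparent size.

    Assumes a pinhole camera with unchanged zoom, so that
    size * distance stays constant for a given object. Sizes are
    linear measures (finger thickness, finger length, eye spacing).
    """
    if ref_size_px <= 0 or observed_size_px <= 0:
        raise ValueError("sizes must be positive")
    if ref_distance_cm <= 0:
        raise ValueError("reference distance must be positive")
    return ref_distance_cm * ref_size_px / observed_size_px


if __name__ == "__main__":
    # Hypothetical calibration: finger is 80 px thick at 30 cm.
    print(estimate_distance(80.0, 30.0, 160.0))  # twice as thick
    print(estimate_distance(80.0, 30.0, 40.0))   # half as thick
```

If the size measure is an area rather than a length, the ratio would need a square root, since area scales with the square of linear size.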

The display unit 150 composites the actual image input through the camera and the virtual graphic for the reference image, both provided by the motion recognition unit 130, and displays them together.

Here, the display unit may display 3D images using a stereoscopic method, in which different images are presented to the left and right eyes so that the viewer perceives depth, or a motion parallax method, in which the perspective of an object and the amount of left/right image shift change according to the viewer's viewpoint.

A depth map, one of the important elements in representing a 3D image, expresses the distance between an object located in three-dimensional space and the camera photographing it in grayscale or color. For example, when the depth map is rendered in black and white, nearer objects may be shown closer to white and farther objects closer to black. In general, when a person views a three-dimensional object, the left and right eyes observe it from slightly different positions, so slightly different image information is observed through the left and right eyes. The observer combines this slightly different image information to obtain depth information about the object and thereby perceives depth.
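
The black-and-white convention described above (nearer is whiter, farther is blacker) can be illustrated with a linear mapping to 8-bit gray levels. The linear form, the near/far clipping range, and the name `depth_to_gray` are assumptions chosen for the sketch; real depth cameras may use other encodings.

```python
def depth_to_gray(depth_m: float, near_m: float, far_m: float) -> int:
    """Map a depth in meters to a gray level: 255 at near_m, 0 at far_m.

    Depths outside [near_m, far_m] are clamped to the range ends.
    """
    if far_m <= near_m:
        raise ValueError("far_m must be greater than near_m")
    clamped = min(max(depth_m, near_m), far_m)
    t = (clamped - near_m) / (far_m - near_m)  # 0.0 near, 1.0 far
    return int(round(255 * (1.0 - t)))


if __name__ == "__main__":
    for d in (0.5, 1.0, 3.0, 4.0):
        print(d, depth_to_gray(d, near_m=0.5, far_m=4.0))
```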

또한, 디스플레이부(150)가 실제 영상에 더하여 기준 영상을 그래픽으로 합성하여 디스플레이하는 경우 AR(Augmented Reality: 증강 현실) 등의 기법을 활용하여 디스플레이할 수 있다. In addition, when the display unit 150 displays a graphic synthesized with the reference image in addition to the actual image, it may be displayed using a technique such as Augmented Reality (AR).

AR 기법은 사용자의 현실 세계에 3차원 가상 물체를 겹쳐 보여주는 기술로, 현실 세계에 컴퓨터 기술로 만든 가상물체 및 정보를 융합, 보완해 주는 기술을 말한다. 현실 세계에서 실시간으로 부가정보를 갖는 가상세계를 더해서 하나의 영상으로 보여주므로 혼합현실이라고도 한다. 가상현실 기술의 경우에는 컴퓨터 그래픽이 만든 가상 환경에 사용자를 몰입하도록 함으로써 실제 환경을 볼 수 없지만, 증강현실 기술은 실제 환경에 가상의 객체를 혼합하여 사용자가 실제 환경에서 보다 실감나는 부가 정보를 제공받을 수 있다는 장점이 있다. The AR technique is a technique of overlaying three-dimensional virtual objects on the user's real world, and is a technology that fuses and complements virtual objects and information made by computer technology in the real world. In the real world, it is also called mixed reality because it shows a virtual world with additional information in real time. In the case of virtual reality technology, the real world cannot be seen by immersing the user in the virtual environment created by computer graphics, but augmented reality technology provides the user with more realistic information by mixing virtual objects in the real environment. The advantage is that you can get.

The content driving unit 140 determines the user's intention from the distance value to the user, which is among the information output by the motion recognition unit 130, and controls the corresponding content according to that intention.

That is, calibration is completed either by the user aligning a hand or face with the reference position or by the camera zoom adjusting automatically, and the reference image is then set. After that, when the body part corresponding to the reference image moves away from the reference position, the apparent thickness or length of the hand becomes relatively smaller, so the apparatus can detect that the hand is moving away from the camera and control the content accordingly.

For example, when the user's hand moves away from the camera, the icon of the corresponding content may be displayed at a reduced size. Conversely, when the user's hand moves from the reference position toward the camera, the hand appears thicker or longer, and this change can be used to control the content. For example, when the user's hand approaches the camera, the icon of the corresponding content may be displayed at an enlarged size.
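The icon behavior described above can be sketched as a scale factor derived from the apparent hand thickness. This is a hedged illustration: the direct use of the size ratio as the scale, the clamping limits, and the name `icon_scale` are assumptions, not details given in the specification.

```python
def icon_scale(ref_thickness_px, cur_thickness_px, min_scale=0.5, max_scale=2.0):
    """Return a display scale for an icon from the apparent hand thickness.

    A hand nearer than the reference looks thicker (ratio > 1, icon enlarged);
    a hand farther away looks thinner (ratio < 1, icon reduced).
    The clamping limits are illustrative assumptions.
    """
    if ref_thickness_px <= 0 or cur_thickness_px <= 0:
        raise ValueError("thickness values must be positive")
    ratio = cur_thickness_px / ref_thickness_px
    # Clamp so the icon never becomes unusably small or large.
    return min(max(ratio, min_scale), max_scale)
```

With a reference thickness of 40 pixels, a measured thickness of 60 pixels gives a scale of 1.5, and a measured thickness of 20 pixels gives a scale of 0.5.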

FIG. 2 is a conceptual diagram illustrating the operation of a user interfacing method according to the present invention.

According to the concept shown in FIG. 2, for stereoscopic 3D content or holographic content, the distance from the camera to the user, for example the distance from the camera to the user's finger, is measured in real time to control stereoscopic content or an application that has depth.

In other words, the present invention enables interfacing in the user's x and y directions, which lie in the plane perpendicular to the camera axis, and in the z direction, which is the distance direction from the camera.

In FIG. 2, the camera receiving unit 110 is located on the left, and positions further to the right represent the user's finger moving farther from the camera receiving unit 110.

FIG. 2 shows cases in which the user's finger is located at three points, d1, d2, and d3. The distance from the camera receiving unit 110 increases from d1 to d3.

Above the three points d1, d2, and d3, the image of the user's finger at each point is shown. As illustrated, a1 is the image when the finger is at d1, a2 is the image when the finger is at d2, and a3 is the image when the finger is at d3.

The red dotted line in images a1, a2, and a3 is a graphic representing a virtual finger. It is the virtual graphic for the reference image that serves as the basis for calculating the distance (z) from the camera receiving unit 110 to the user's finger.

In FIG. 2, position d2 is set as the reference position, or calibration position.

After the user aligns the finger with the graphic at position d2 and this position is set as the reference position of the object, the following can be observed relative to d2. In image a1, captured when the finger is at d1, which is closer than d2, the actual finger appears larger than the reference image. In image a3, captured when the finger is at d3, which is farther than d2, the actual finger appears smaller than the reference image.

Approaching the concept of FIG. 2 mathematically, the distance from the camera to the user's hand is a function of the thickness or length of the finger.

That is, according to a preferred embodiment of the present invention, the distance from the camera to the user can be expressed as a function of the thickness of the finger, as defined in Equation 1 below. Here, the distance from the camera to the user is inversely proportional to the thickness of the finger.

Here, z denotes the distance from the camera to the user's finger, and Δx denotes the thickness of the finger.

According to another preferred embodiment of the present invention, the distance from the camera to the user's finger can be expressed as a function of the length of the finger, as defined in Equation 2 below. Here, the distance from the camera to the user is inversely proportional to the length of the finger.

Here, z denotes the distance from the camera to the user's finger, and Δy denotes the length of the finger.
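The description states only that z is inversely proportional to Δx (or Δy). One relation consistent with that statement, assuming a pinhole camera model and a known calibration distance (both assumptions not taken from the specification), is z = z_ref · Δx_ref / Δx, where z_ref and Δx_ref are the distance and apparent size recorded at the reference position. A minimal sketch, with the hypothetical name `estimate_distance`:

```python
def estimate_distance(ref_distance, ref_size_px, cur_size_px):
    """Estimate camera-to-finger distance z from apparent finger size.

    Assumes z is inversely proportional to the apparent size (thickness or
    length in pixels), scaled by the values recorded at the reference
    position: z = ref_distance * ref_size_px / cur_size_px.
    """
    if ref_distance <= 0 or ref_size_px <= 0 or cur_size_px <= 0:
        raise ValueError("all arguments must be positive")
    return ref_distance * ref_size_px / cur_size_px
```

If the finger measured 40 pixels across at a reference distance of 30 cm, a measurement of 80 pixels would correspond to about 15 cm and a measurement of 20 pixels to about 60 cm under this assumption. The same function applies to finger length Δy.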

The examples above show that the size of an image according to the present invention can be defined by its width or length. More generally, the size of the image recognized for user interfacing according to the present invention may be defined by the width, length, or area of the image.
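When area is used as the size measure, the relation to distance differs from the width and length cases. A hedged sketch follows, assuming a pinhole camera model in which linear dimensions scale with 1/z and area therefore scales with 1/z²; this model and the name `depth_ratio` are assumptions, not part of the specification.

```python
import math


def depth_ratio(ref_size, cur_size, measure="width"):
    """Return z / z_ref, the current distance relative to the reference distance.

    For "width" or "length" the apparent size is assumed to scale with 1/z.
    For "area" it is assumed to scale with 1/z**2, so a square root is taken.
    """
    if ref_size <= 0 or cur_size <= 0:
        raise ValueError("sizes must be positive")
    if measure in ("width", "length"):
        return ref_size / cur_size
    if measure == "area":
        return math.sqrt(ref_size / cur_size)
    raise ValueError("measure must be 'width', 'length', or 'area'")
```

Under these assumptions, a width that halves and an area that drops to one quarter both indicate that the object is twice as far away as at the reference position.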

FIG. 3 is a flowchart of a method of setting a reference image for motion recognition according to the present invention.

FIG. 3 describes a method of setting the reference image that serves as the basis for recognizing the user's motion, for use in user interfacing according to the present invention.

The image serving as the reference for motion recognition according to the present invention may be an image of any of various parts of the user's body whose movement allows nearness or farness to be sensed, such as the user's finger, both eyes, mouth, or head.

For example, the reference image may be an image of a part of the user's body such as a finger, a hand, a palm, the face, the lips, the nose, both eyes, one eye (for example, the length of one eye can be used as a feature), or the head.

To set the reference image according to the present invention, an image of the object to be used as the reference is first received (S210). Here, the object may be the user or a part of the user's body.

Next, the object to be used as the reference is extracted from the input image. The object may have been determined in advance by the application or by the user. The extraction may be performed by extracting features from the entire input image and then extracting an object image centered on the object.

The user interfacing method according to the present invention displays the input image and overlays on it a virtual graphic corresponding to the reference image (S220). So that the user or the terminal can recognize it easily, the graphic may be displayed as a dotted-line guide, as shown in FIG. 2, or as a transparent image superimposed on the input image.

Depth adjustment of the object is then performed (S230). The user may adjust the depth by looking at the displayed virtual graphic and moving the object to be used for user interfacing toward or away from the camera, that is, along the direction perpendicular to the camera. Alternatively, the terminal device that provides the user interfacing may adjust the depth automatically using the zoom of its built-in camera.

The terminal device determines whether the object in the input image matches, in size or area, the image shape defined by the virtual graphic guide (S240).

The virtual graphic may be rendered as a dotted line surrounding the outline of the image, as a transparent image shape superimposed on the object in the input image, or in various other ways.

The determination of whether the input object image matches the size or area of the image shape defined by the virtual graphic may be made by the user or by the terminal. For example, the user may press an OK button once the user judges that the two images are sufficiently matched, or the terminal device may automatically issue an OK signal when a predetermined condition is satisfied.

When the input object image is determined to match the image shape defined by the virtual graphic, the object image itself, or information about the object such as its size, area, length, or width, is stored (S250). This completes the setting of the reference image to be used for user interfacing according to the present invention.
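The automatic variant of steps S240 and S250 can be sketched as below. This is an illustration under stated assumptions: the "predetermined condition" is taken to be a relative size tolerance, the area is approximated by a bounding box (width times length), and the names `sizes_match` and `try_set_reference` are hypothetical.

```python
def sizes_match(object_size, guide_size, tolerance=0.05):
    """S240 (assumed condition): sizes agree within a relative tolerance."""
    return abs(object_size - guide_size) <= tolerance * guide_size


def try_set_reference(object_width, object_length, guide_width, guide_length,
                      tolerance=0.05):
    """Return reference information if the object matches the guide, else None.

    On a match (S250) the object's width, length, and bounding-box area are
    returned for storage; the bounding-box area is an illustrative choice.
    """
    if (sizes_match(object_width, guide_width, tolerance)
            and sizes_match(object_length, guide_length, tolerance)):
        return {
            "width": object_width,
            "length": object_length,
            "area": object_width * object_length,
        }
    return None
```

A caller would invoke `try_set_reference` on each frame during calibration and stop once it returns a non-`None` value, which plays the role of the automatically issued OK signal.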

FIG. 4 is a flowchart of a user interfacing method according to the present invention.

In the following description of the embodiment, each step of the method may be understood as an operation performed by the corresponding component of the user interface apparatus described with reference to FIG. 1. However, each step is to be limited only by the function that defines it. That is, the entity performing each step is not limited by the name of the component given as an example of performing it.

To perform user interfacing according to the present invention, the reference image for interfacing must first be set through the steps described with reference to FIG. 3.

Once the reference image has been set, an image input through the camera is received for user interfacing (S310).

From the input image, the feature portion used for the user interface is extracted (S320). Here, the feature portion is the part of the user's body that is the subject of the reference image, and may be any of various body parts usable for user interfacing, such as the user's finger, both eyes, or head.

The extracted feature portion is compared with the reference image that was fixed during the reference image setting process and stored by the terminal (S340).

The terminal device or user interface apparatus according to the present invention drives the content using the calibration data derived from comparing the actually extracted feature portion with the reference image (S350). For example, if the reference image is of the user's finger, the case in which the thickness of the extracted finger portion is smaller than that of the reference image can be distinguished from the case in which it is larger, and this distinction can be used to drive the content.
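The distinction between the "smaller than reference" and "larger than reference" cases can be sketched as a three-way classification. The dead zone around the reference size, which suppresses jitter from small measurement noise, and the name `classify_depth_motion` are assumptions added for illustration.

```python
def classify_depth_motion(ref_thickness, cur_thickness, dead_zone=0.1):
    """Classify the finger position relative to the reference position.

    Returns "nearer" when the finger looks thicker than the reference,
    "farther" when it looks thinner, and "at_reference" when the relative
    difference stays inside the (assumed) dead zone.
    """
    if ref_thickness <= 0 or cur_thickness <= 0:
        raise ValueError("thickness values must be positive")
    ratio = cur_thickness / ref_thickness
    if ratio > 1 + dead_zone:
        return "nearer"
    if ratio < 1 - dead_zone:
        return "farther"
    return "at_reference"
```

The returned label could then be mapped to a content action, for example enlarging an icon on "nearer" and reducing it on "farther".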

If two or more reference images have been set for user interfacing, the feature extraction step (S320) extracts two or more partial images, one per reference image. In the comparison step (S340), each extracted partial image is compared with its reference image, and the two or more comparison results are evaluated together and used to drive the content or application.

This case requires a procedure for setting two or more reference images, and the movements observed against the two or more reference images are evaluated jointly, in an integrated manner, and used for user interfacing.
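The specification does not say how the multiple comparison results are combined. One simple possibility, shown here purely as an assumption, is to take the median of the per-object size ratios so that a single badly tracked object does not dominate. The name `combined_size_ratio` and the input layout are hypothetical.

```python
from statistics import median


def combined_size_ratio(measurements):
    """Combine size comparisons for several reference objects into one ratio.

    `measurements` maps an object name to (reference_size, current_size).
    The median of current/reference ratios is returned; a value above 1
    suggests the user moved nearer, below 1 farther. Using the median is an
    illustrative choice, not the method stated in the specification.
    """
    if not measurements:
        raise ValueError("at least one measurement is required")
    ratios = [cur / ref for ref, cur in measurements.values()]
    return median(ratios)
```

For example, `combined_size_ratio({"finger": (40, 60), "face": (200, 300)})` evaluates both objects as 1.5 times their reference size and returns 1.5.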

Meanwhile, as mentioned above, when the user wishes to change the reference image used for interfacing (S360), the reference image for user interfacing is set again (S200).

Thereafter, when an image is input from the camera, the user interface according to the present invention extracts the feature portion of the input image according to the newly set reference image, performs calibration on that feature portion, and drives the corresponding content according to the calibrated data.

FIG. 4 has been described with a focus on depth-related movement of the object. In a user interfacing method according to another aspect of the present invention, however, movement in the plane direction (or horizontal direction) relative to the camera is also reflected in the user interfacing, in addition to the user's depth-related movement.
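Combining planar and depth-related movement can be sketched as below. The `Observation` structure, the use of the object's image-space center for x and y, and the size ratio as the depth indicator are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-frame measurement of the tracked object."""
    cx: float    # horizontal center of the object in the image, in pixels
    cy: float    # vertical center of the object in the image, in pixels
    size: float  # apparent size (e.g. finger thickness), in pixels


def motion_3d(ref: Observation, cur: Observation):
    """Return (dx, dy, z_ratio) of the current frame relative to the reference.

    dx and dy are planar offsets in pixels. z_ratio is the assumed relative
    distance z / z_ref, taken as inversely proportional to apparent size.
    """
    if ref.size <= 0 or cur.size <= 0:
        raise ValueError("sizes must be positive")
    dx = cur.cx - ref.cx
    dy = cur.cy - ref.cy
    z_ratio = ref.size / cur.size
    return dx, dy, z_ratio
```

For a reference of `Observation(100, 100, 40)` and a current frame of `Observation(120, 90, 80)`, the result is an offset of 20 pixels in x, -10 pixels in y, and a relative distance of 0.5, meaning the object is at half the reference distance under the stated assumption.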

FIG. 5 is a block diagram of a terminal device according to the present invention.

The terminal device exemplified in the present invention may be a portable communication terminal, for example a smartphone.

As shown in FIG. 5, the portable communication terminal device according to the present invention may include a user interface unit 100, a communication data transceiver unit 200, a wireless communication processor 300, and a data storage unit 400.

The user interface unit 100 according to the present invention extracts an object-related image to be used for user interfacing from the input user-related image, compares the extracted object-related image with the reference image to determine the movement of the object, and drives content or an application according to that movement.

The user interface unit 100 also composites and displays a virtual graphic for the part of the user's body that is to be the subject of the reference image together with the actual image of the object input through the camera. At the moment the input image of the object is matched to the graphic, it sets the matched image of the object as the reference image.

The communication data transceiver unit 200 transmits and receives the data inherent to the role of a wireless communication terminal, that is, wireless communication data. This data includes the user's voice calls as well as non-voice data. The communication data transceiver unit 200 transmits communication data to a base station according to the standard supported by the terminal device, and receives communication data transmitted from the base station to the terminal. The terminal device according to the present invention, and the mobile communication system communicating with it, may conform to various communication standards such as those of 3GPP and IEEE.

The wireless communication processor 300 performs reception processing on the data received by the communication data transceiver unit 200 and provides it to the user in the form of voice, text, or video, or stores the received data in the data storage unit 400 according to the user's selection.

The wireless communication processor 300 also performs transmission processing on voice call data input from the user and passes it to the communication data transceiver unit 200.

The data storage unit 400 stores the object-related reference image to be used for user interfacing according to the present invention, together with information related to the reference image.

The data storage unit 400 also stores various data generated through the terminal's wireless communication. The stored data may include text messages and images exchanged by the user terminal, and contact information such as telephone numbers. The data storage unit 400 also stores various content and application programs executable on the user terminal device.

The various data stored in the data storage unit 400 may be stored in the form of a database. The term "database" as used in the present invention does not mean a database in the strict sense, such as a relational or object-oriented database, but a functional component that stores information, which may be implemented in various forms.

For example, it may be configured as a simple file-based information storage component.
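A file-based store for the reference image information could look like the following sketch. The JSON format, the field names, and the functions `save_reference` and `load_reference` are assumptions chosen for illustration; the specification does not prescribe a file format.

```python
import json
from pathlib import Path


def save_reference(path, info):
    """Write reference-image information (a JSON-serializable dict) to a file."""
    Path(path).write_text(json.dumps(info), encoding="utf-8")


def load_reference(path):
    """Read reference-image information back, or return None if none is stored."""
    file = Path(path)
    if not file.exists():
        return None
    return json.loads(file.read_text(encoding="utf-8"))
```

A terminal could call `save_reference("reference.json", {"target": "finger", "width": 40})` at the end of calibration and `load_reference("reference.json")` when interfacing starts; a `None` result would indicate that the reference image still needs to be set.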

According to the present invention as described above through the embodiments, the terminal device can be controlled using the user's movement relative to the distance between the user and the terminal device, that is, the depth information of the user's movement.

Accordingly, interfacing according to the present invention allows the user to use electronic devices more freely.

Although the invention has been described with reference to the above embodiments, those skilled in the art will understand that various modifications and changes may be made without departing from the spirit and scope of the present invention as set forth in the claims below.

100: user interface unit 110: camera receiving unit
120: feature extraction unit 130: motion recognition unit
140: content driving unit 150: display unit
200: communication data transceiver unit 300: wireless communication processor
400: data storage unit

Claims

A user interfacing method comprising:
setting a reference image of an object to be used for user interfacing;
recognizing the object to be used for the user interfacing from an input user-related image;
comparing the recognized object with the reference image to determine a depth-related movement of the object; and
driving an application according to the depth-related movement of the object.

The method according to claim 1,
wherein the object to be used for the user interfacing is a part of the user's body.

The method according to claim 2,
wherein the part of the user's body is at least one of the user's hand, finger, palm, face, lips, nose, eyes, and head.

The method according to claim 1,
wherein determining the depth-related movement of the object by comparing the recognized object with the reference image comprises
determining a depth-related position of the object image by comparing the size of the object image recognized in the input user-related image with the size of the reference image.

The method according to claim 4,
wherein the size of the object image recognized in the input user-related image is defined by the width, length, or area of the image.

The method according to claim 1,
wherein setting the reference image of the object to be used for the user interfacing comprises:
setting a part of the user's body to be the subject of the reference image;
compositing and displaying a virtual graphic related to the reference image together with an image of the set body part input through a camera;
matching the image of the set body part input through the camera to the virtual graphic; and
storing the image of the body part matched to the virtual graphic as the reference image.

The method according to claim 1,
wherein recognizing the object to be used for the user interfacing from the input user-related image comprises
extracting object-related features from the entire image input through the camera.

The method according to claim 1,
wherein the depth-related movement is
a movement based on the distance from a camera to the part of the user's body set as the object.

A user interfacing apparatus comprising:
a receiving unit that receives a user-related image;
a feature extraction unit that extracts, from the input user-related image, an object-related image to be used for user interfacing;
a motion recognition unit that compares the extracted object-related image with a reference image to determine a depth-related movement of the object; and
a content driving unit that drives content according to the depth-related movement of the object.

The apparatus according to claim 9,
further comprising a display unit that composites and displays the extracted object-related image and the reference image provided by the motion recognition unit.

The apparatus according to claim 9,
wherein the object to be used for the user interfacing is a part of the user's body.

The apparatus according to claim 11,
wherein the part of the user's body is at least one of the user's hand, finger, palm, face, lips, nose, eyes, and head.

The apparatus according to claim 11,
wherein the motion recognition unit
determines a depth-related position of the object image by comparing the size of the object image recognized in the input user-related image with the size of the reference image.

The apparatus according to claim 13,
wherein the size of the object image recognized in the input user-related image is defined by the width, length, or area of the image.

The apparatus according to claim 9,
wherein the depth-related movement is
a movement based on the distance from a camera to the part of the user's body set as the object.

The apparatus according to claim 9,
wherein the reference image is
an image of the object input to the camera when the object to be used for user interfacing is located at a reference point.

A terminal device comprising:
a user interface unit that extracts, from an input user-related image, an object-related image to be used for user interfacing, compares the extracted object-related image with a reference image to determine a depth-related movement of the object, and drives content according to the depth-related movement of the object; and
a data storage unit that stores the object-related reference image used for the user interfacing.

The terminal device according to claim 17,
wherein the user interface unit
composites and displays a virtual graphic for the part of the user's body that is to be the subject of the reference image together with the actual image of the object input through a camera, and, when the image of the object input through the camera is matched to the graphic, sets the matched image of the object as the reference image.

The terminal device according to claim 17,
wherein the depth-related movement is
a movement based on the distance from a camera to the part of the user's body set as the object.