KR20110032429A

KR20110032429A - Apparatus and method for recognizing gesture in mobile device

Info

Publication number: KR20110032429A
Application number: KR1020090089906A
Authority: KR
Inventors: 송희준; 윤제한; 심현식; 박영희; 이칠우; 김민욱; 오치민; 디 아후라흐만
Original assignee: 삼성전자주식회사; 전남대학교산학협력단
Priority date: 2009-09-23
Filing date: 2009-09-23
Publication date: 2011-03-30
Also published as: KR101622197B1

Abstract

PURPOSE: An apparatus and method for improving the gesture recognition rate of a mobile device are provided, which generate integrated feature information from a plurality of input data and recognize a gesture from it. CONSTITUTION: A gesture recognition unit (102) comprises a data acquisition unit (104), a feature information extraction unit (106), and an integrated information generation unit (108). The data acquisition unit acquires a plurality of input data through cameras. The feature information extraction unit preprocesses the plurality of input data, classifies the regions in which gesture recognition is possible, and extracts feature information from those regions. The integrated information generation unit combines the extracted feature information into a single piece of integrated feature information.

Description

Apparatus and method for recognizing a gesture in a mobile device {APPARATUS AND METHOD FOR RECOGNZING GESTURE IN MOBILE DEVICE}

The present invention relates to an apparatus and method for gesture recognition in a portable terminal, and more particularly to an apparatus and method that increase the gesture recognition rate of a portable terminal by using integrated feature information derived from a plurality of input data.

With the rapid development of portable terminals, terminals capable of wireless voice calls and information exchange have become a necessity. Early portable terminals were regarded simply as devices that could be carried and used for wireless calls. As the technology has advanced and wireless Internet has been introduced, however, their range of use has steadily expanded beyond telephone calls to include games, satellite broadcast viewing, remote control over short-range communication, image capture with a built-in digital camera, and schedule management, thereby meeting users' needs.

The digital camera function captures not only still pictures but also moving pictures of moving subjects, and portable image capturing devices in particular are widely used to photograph people.

When a still image of a person is captured with such a portable terminal, positioning the person is not easy. If the user wants to capture a frontal face while the person keeps moving, it is difficult to catch the moment the frontal face enters the screen and press the shutter at exactly that moment.

To solve this problem, facial expression recognition systems and user-aware/adaptive HCI systems have recently been studied actively. Expression recognition, however, presupposes accurate face detection, and accurately detecting a face and a gesture of a specific region from a still image or a video is far from easy.

To detect a gesture, a typical portable terminal analyzes an image input through a camera and recognizes a specific region. The usual method extracts feature information from the image and recognizes it with a time-series data recognition algorithm, most commonly the hidden Markov model (HMM).

Assuming that the specific region consists of a finite set of states in a probability space, the hidden Markov model determines the gesture from the transition probabilities between these states and from the observation that represents each state once that state is determined.

This method can be carried out through repeated training, but noise present in the signal makes it difficult to obtain stable observations. A method using the covariance matrix of the image has been proposed to improve on this, but it is less stable for gestures with complex shape changes, and its recognition rate is low because it does not include information of a different dimension, such as motion data. Moreover, because it uses the covariance matrix only as simple pattern information, it is sensitive to noise, which further lowers the recognition rate.

The present invention has been made to solve the problems described above, and an object of the present invention is to provide an apparatus and method for improving the gesture recognition rate of a portable terminal.

Another object of the present invention is to provide an apparatus and method that increase the gesture recognition rate of a portable terminal by using integrated feature information derived from a plurality of input data.

Still another object of the present invention is to provide an apparatus and method for accurately classifying observations in a portable terminal by mapping the integrated feature information into a probability space and measuring distances effectively and stably using probability density parameters.

According to a first aspect of the present invention for achieving the above objects, an apparatus for recognizing a gesture in a portable terminal includes a gesture recognition unit that receives a plurality of input data, extracts a plurality of pieces of feature information, and generates integrated information by combining the extracted feature information into one.

According to a second aspect of the present invention for achieving the above objects, a method for recognizing a gesture in a portable terminal includes receiving a plurality of input data, extracting a plurality of pieces of feature information from the received input data, and generating integrated information by combining the extracted feature information into one.

As described above, the present invention relates to an apparatus and method for improving the gesture recognition rate of a portable terminal. By generating a single piece of integrated feature information from the feature information extracted from a plurality of input data and recognizing the gesture from it, the invention achieves a higher gesture recognition rate than existing portable terminals and makes recognition of complex gestures possible.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the invention.

The following description covers an apparatus and method for generating a single piece of integrated feature information from the feature information extracted from a plurality of input data in order to improve the gesture recognition rate of a portable terminal. In the following description, a gesture means a continuous motion of an object, or a pose (a specific posture of the human body), that carries an abstract meaning.

FIG. 1 is a block diagram illustrating the configuration of a portable terminal for improving the gesture recognition rate according to the present invention.

Referring to FIG. 1, the portable terminal may include a control unit 100, a gesture recognition unit 102, a memory unit 110, an input unit 112, a display unit 114, and a communication unit 116. The gesture recognition unit 102 may include a data acquisition unit 104, a feature information extraction unit 106, and an integrated information generation unit 108.

The control unit 100 controls the overall operation of the portable terminal. For example, it performs processing and control for voice calls and data communication. In addition to these usual functions, according to the present invention, when the portable terminal performs gesture recognition the control unit 100 combines the feature information extracted from a plurality of input data into a single piece of feature information so as to increase the gesture recognition rate.

The control unit 100 then has the gesture recognized using the single piece of integrated feature information, which makes recognition of complex gestures possible.

The gesture recognition unit 102 extracts the feature information required for gesture recognition under the control of the control unit 100. According to the present invention, it combines the feature information extracted from a plurality of input data into a single piece of feature information in order to increase the gesture recognition rate.

To do so, the gesture recognition unit 102 first causes the data acquisition unit 104 to acquire a plurality of input data containing feature information.

The gesture recognition unit 102 then causes the feature information extraction unit 106 to extract feature information from the plurality of input data acquired by the data acquisition unit 104.

Finally, the gesture recognition unit 102 causes the integrated information generation unit 108 to combine the pieces of feature information obtained by the feature information extraction unit 106 into a single piece of feature information.

The data acquisition unit 104 of the gesture recognition unit 102 is a block that acquires a plurality of input data. It consists of, for example, a general 2D camera and a 3D camera (a stereo camera or a camera based on the time-of-flight (TOF) principle) and acquires two-dimensional image data, three-dimensional image data, and the like.

The feature information extraction unit 106 of the gesture recognition unit 102 performs preprocessing on the plurality of input data acquired by the data acquisition unit 104, such as Gaussian filtering, smoothing, gamma correction, image equalization, and image recovery or image correction, so that the signals can be processed stably, and then classifies the regions in which gesture recognition is possible. That is, the feature information extraction unit 106 uses color information, distance information, and the like in the preprocessed data to classify specific parts such as a hand region, a face region, and a body region, and performs masking on the classified parts. The feature information extraction unit 106 sets the mask in order to limit the calculation area of the specific part, and extracts feature information from the masked area.
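The preprocessing and masking described above can be illustrated with a minimal sketch. It is not the implementation of the invention: the function names, the thresholds, the use of a 3x3 mean filter in place of Gaussian filtering, and the use of grayscale intensity in place of color information are all assumptions made for illustration.

```python
def gamma_correct(image, gamma):
    """Apply gamma correction to a grayscale image with values in [0, 255]."""
    return [[255.0 * (p / 255.0) ** gamma for p in row] for row in image]


def box_smooth(image):
    """3x3 mean smoothing (a simple stand-in for Gaussian filtering).

    Border pixels are averaged over the neighbours that exist.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def region_mask(intensity, depth, min_intensity, max_depth):
    """Mark a pixel 1 when it is bright enough AND close enough, else 0.

    `intensity` stands in for the color cue and `depth` for the distance cue;
    both thresholds are hypothetical parameters.
    """
    return [[1 if intensity[y][x] >= min_intensity and depth[y][x] <= max_depth else 0
             for x in range(len(intensity[0]))]
            for y in range(len(intensity))]
```

Later feature extraction would then visit only the pixels whose mask value is 1, which is how the mask limits the calculation area.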

The integrated information generation unit 108 combines the pieces of feature information extracted by the feature information extraction unit 106 into a single piece of integrated feature information. In doing so, the integrated information generation unit 108 combines the feature information on the basis of the information that is used to express motion and is needed for gesture recognition.

The memory unit 110 of the portable terminal consists of a read-only memory (ROM), a random access memory (RAM), and a flash ROM. The ROM stores the microcode of programs for the processing and control performed by the control unit 100 and the gesture recognition unit 102, together with various reference data.

The RAM is the working memory of the control unit 100 and stores temporary data generated while programs run. The flash ROM stores a phone book, outgoing messages, and incoming messages.

The input unit 112 receives still image or video data. It also has a number of function keys, including numeric keys 0 to 9, a menu button, a cancel (delete) button, a confirm button, a call button (TALK), an end button (END), an Internet access button, navigation (direction) keys, and character input keys, and provides the control unit 100 with key input data corresponding to the key pressed by the user.

The display unit 114 displays status information generated during the operation of the portable terminal, a limited number of characters, and large amounts of video and still images. The display unit 114 may be a color liquid crystal display (LCD). When the display unit 114 is provided with a touch input device and applied to a touch-input portable terminal, it can also serve as an input device.

The communication unit 116 transmits and receives the radio signals of data input and output through an antenna (not shown). For transmission, it channel-codes and spreads the data to be transmitted, performs RF processing, and transmits the result. For reception, it converts the received RF signal into a baseband signal, then de-spreads and channel-decodes the baseband signal to restore the data.

The role of the gesture recognition unit 102 can be performed by the control unit 100 of the portable terminal. It is shown as a separate component in the present invention only as an example for convenience of description, and this is in no way intended to limit the scope of the present invention. Those skilled in the art will recognize that various modified configurations are possible within the scope of the present invention. For example, all of these functions may be handled by the control unit 100.

FIG. 2 is a block diagram illustrating in detail the configuration of the feature information extraction unit according to a preferred embodiment of the present invention.

Referring to FIG. 2, the feature information extraction unit 200 may include a mask generation unit 202, a feature extraction unit 204, and a feature combination unit 206.

To limit the calculation area in the plurality of input data preprocessed by the gesture recognition unit 102, the mask generation unit 202 classifies specific parts (for example, the body, a hand, or the face), separates out the region that contains the feature information needed for gesture recognition, and generates a mask for it.

The feature extraction unit 204 identifies the mask area set by the mask generation unit 202 and extracts the feature information contained in the masked area. The feature combination unit 206 identifies the information needed to combine the pieces of feature information extracted by the feature extraction unit 204 into a single piece of integrated feature information.

The apparatus for improving the gesture recognition rate of a portable terminal has been described above. The following description covers a method that uses this apparatus according to the present invention to generate a single piece of integrated feature information from the feature information extracted from a plurality of input data and thereby improve the gesture recognition rate.

FIG. 3 is a diagram illustrating the process by which the integrated information generation unit combines the feature information extracted from a plurality of input data, according to a preferred embodiment of the present invention.

Referring to FIG. 3, the integrated information generation unit 108 generates (301) time-series information from the plurality of pieces of feature information (300) extracted by the feature information extraction unit 106, and then identifies the data needed to express motion.

Here, the data needed to express motion is the data, within the extracted feature information, that is used to recognize a gesture. For one-dimensional data, it may be the local maxima of the signal or the amount of change in the power spectrum. For two-dimensional data, it may be the bin values of the histogram of image gradients or the gradient values in the x and y directions for pixels on the object boundary. For three-dimensional data, it may be the normal vectors of the 3D surface (normal vectors of the surface polygons) or the tangent vectors at vertex regions. For motion information, it may be the displacement vector from the previous position to the new position, the acceleration, the consistency of the motion vectors among the pixels in the region, or the change in area.

The integrated information generation unit 108 then uses the identified data (for example, the acceleration (303), the gradient (305), and the change in the maxima (307)) to combine (309) the feature information extracted from the plurality of input data into a single piece of integrated feature information.
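As a rough illustration of this combination step, the sketch below computes two of the quantities named above (local maxima of a one-dimensional signal and acceleration) and joins several feature groups into one vector. Concatenation is an assumption: the description does not state how the combination (309) is carried out, and all function names are hypothetical.

```python
def local_maxima(signal):
    """Indices of the strict local maxima of a 1-D signal."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]


def acceleration(positions):
    """Second finite difference of a 1-D sequence of positions."""
    return [positions[i + 1] - 2 * positions[i] + positions[i - 1]
            for i in range(1, len(positions) - 1)]


def integrate_features(*feature_groups):
    """Concatenate several feature groups into one integrated feature vector."""
    combined = []
    for group in feature_groups:
        combined.extend(group)
    return combined
```

For example, `integrate_features(acceleration(xs), gradient_values, maxima_changes)` would produce one vector per frame window, where `gradient_values` and `maxima_changes` are placeholders for features computed elsewhere.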

FIG. 4 is a diagram illustrating the process by which the integrated information generation unit combines the feature information extracted from motion information, according to a preferred embodiment of the present invention.

Referring to FIG. 4, the integrated information generation unit 108 selects a specific part in the acquired image data (401) and performs masking (402).

The integrated information generation unit 108 then extracts feature information using the data used to recognize a gesture (the data needed to express motion).

In doing so, the integrated information generation unit 108 uses the data used to recognize the gesture to extract (403) feature information from the pixels of the masked region (404).

If color, edge, and gradient are used as the data for recognizing the gesture, the integrated information generation unit 108 extracts (403) the color, edge, and gradient elements for every pixel in the mask area of the image data and generates (405) a single piece of integrated information from the extracted elements.
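A minimal sketch of per-pixel extraction inside a mask follows. It assumes a grayscale image, so intensity stands in for color; the edge element is taken to be the gradient magnitude and the gradient element the pair of x and y differences. These mappings and the function name are illustrative assumptions, not details given in the description.

```python
import math


def masked_pixel_features(image, mask):
    """Return one (intensity, edge, gx, gy) tuple per pixel where mask is nonzero."""
    h, w = len(image), len(image[0])
    features = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # pixels outside the mask are never touched
            # Central differences; indices are clamped at the image border.
            gx = (image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]) / 2.0
            gy = (image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]) / 2.0
            edge = math.hypot(gx, gy)  # gradient magnitude as the edge strength
            features.append((image[y][x], edge, gx, gy))
    return features
```

The resulting list of tuples could then be flattened or summarized (for example, as a covariance matrix) to form the single piece of integrated information.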

FIG. 5 is a flowchart illustrating the process of recognizing a gesture in a portable terminal according to the present invention.

Referring to FIG. 5, the portable terminal first acquires a plurality of input data in step 501. Here, the plurality of input data means the two-dimensional image data, three-dimensional image data, and the like acquired with a general 2D camera and a 3D camera (a stereo camera or a camera based on the time-of-flight (TOF) principle). Unlike existing portable terminals, the portable terminal according to the present invention increases the gesture recognition rate by using the features extracted from the plurality of input data as a single piece of integrated feature information.

The portable terminal then proceeds to step 503 and extracts feature information from the plurality of input data acquired in step 501. To extract the feature information contained in a specific region, the portable terminal identifies the region needed for gesture recognition, applies a mask to that region, and extracts the feature information.

In order to use feature information based on both shape information and motion information according to the present invention, the portable terminal calculates the motion information from the previous frame information and the current frame information.
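One simple way to derive motion information from two consecutive frames is sketched below, assuming each frame has already been reduced to a binary mask of the tracked part. The displacement of the mask centroid and the change in mask area correspond to two of the motion quantities listed earlier; the function names and the dictionary layout are hypothetical.

```python
def centroid(mask):
    """Centroid (x, y) of the nonzero pixels of a binary mask, or None if empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def motion_between(prev_mask, curr_mask):
    """Displacement vector and area change between the previous and current frame."""
    c0, c1 = centroid(prev_mask), centroid(curr_mask)
    if c0 is None or c1 is None:
        return None  # the part is missing in one of the frames
    area0 = sum(1 for row in prev_mask for v in row if v)
    area1 = sum(1 for row in curr_mask for v in row if v)
    return {"displacement": (c1[0] - c0[0], c1[1] - c0[1]),
            "area_change": area1 - area0}
```

Applying the same differencing to successive displacement vectors would give the acceleration.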

The portable terminal then proceeds to step 505 and combines the pieces of feature information extracted in the preceding step into a single piece of feature information.

Here, the portable terminal combines the feature information extracted from the plurality of input data into one in order to increase the gesture recognition rate. This is described in detail with reference to FIGS. 6 and 7 below.

The portable terminal then proceeds to step 507 and classifies patterns through a learning process.

Here, the portable terminal performs a general learning process that models the parameters of the single piece of feature information, obtained by combining the feature information extracted from the plurality of input data, under the assumption of a Gaussian distribution.

To carry out the learning process, the portable terminal obtains the principal components of the combined feature information by eigenvalue decomposition and classifies them into categories by a clustering method. The portable terminal then obtains parameters for the images belonging to each category under the assumption of a Gaussian distribution and records them as a state model. If the covariance matrix of the principal components obtained by eigenvalue decomposition is computed and used here, the calculation becomes simpler and can be carried out by mapping the matrix into log-Euclidean space.
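The log-Euclidean mapping mentioned above can be sketched for the smallest non-trivial case, a 2x2 covariance matrix of two-dimensional feature vectors. The sketch assumes the standard log-Euclidean distance for symmetric positive-definite matrices (the Frobenius norm of the difference of their matrix logarithms); the description does not give the exact formula used, and the clustering and principal component steps are omitted. Matrices are stored as hypothetical `(a, b, c)` tuples for `[[a, b], [b, c]]`.

```python
import math


def covariance_2d(samples):
    """Sample covariance (sxx, sxy, syy) of a list of 2-D feature vectors."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (sxx, sxy, syy)


def log_spd_2x2(m):
    """Matrix logarithm of a symmetric positive-definite matrix [[a, b], [b, c]]."""
    a, b, c = m
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    l1, l2 = mean + radius, mean - radius  # eigenvalues, l1 >= l2
    if l2 <= 0.0:
        raise ValueError("matrix must be positive definite")
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # angle of the eigenvector for l1
    cs, sn = math.cos(theta), math.sin(theta)
    g1, g2 = math.log(l1), math.log(l2)
    # Rotate diag(g1, g2) back into the original coordinates.
    return (g1 * cs * cs + g2 * sn * sn,
            (g1 - g2) * cs * sn,
            g1 * sn * sn + g2 * cs * cs)


def log_euclidean_distance(m1, m2):
    """Frobenius norm of log(m1) - log(m2); the off-diagonal term counts twice."""
    a1, b1, c1 = log_spd_2x2(m1)
    a2, b2, c2 = log_spd_2x2(m2)
    return math.sqrt((a1 - a2) ** 2 + 2.0 * (b1 - b2) ** 2 + (c1 - c2) ** 2)
```

After the logarithm, covariance matrices behave like ordinary vectors, so distances and Gaussian parameters can be computed with plain Euclidean arithmetic, which is presumably the simplification the paragraph above refers to.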

Having performed this learning process, the portable terminal stores the patterns of motion using the parameters obtained through learning as models. When a new signal to be recognized is input, the portable terminal calculates its parameters in the same way, computes the distance between these values and the parameter values of the stored training data, and classifies the input signal as the pattern of the model that is closest in pattern space. This is called the pattern classification process. If the pattern classification process has difficulty finding a closest model, the signal is either optionally learned again or left unclassified.
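The pattern classification process can be sketched as a nearest-model search with a rejection rule. The sketch assumes that each model is a plain parameter vector compared by Euclidean distance and that "difficulty finding a closest model" is expressed as a distance threshold; `reject_distance` and the model labels are hypothetical.

```python
import math


def classify(params, models, reject_distance):
    """Return (label, distance) of the nearest stored model.

    The label is None when even the nearest model is farther away than
    reject_distance, i.e. the input is left unclassified.
    """
    best_label, best_dist = None, math.inf
    for label, reference in models.items():
        d = math.dist(params, reference)  # Euclidean distance in pattern space
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > reject_distance:
        return None, best_dist
    return best_label, best_dist
```

A rejected input could be queued for retraining or simply discarded, matching the two outcomes described above.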

The portable terminal then proceeds to step 509 and recognizes the corresponding gesture using the classified patterns.

Here, the portable terminal may use any of a number of methods for interpreting time-series information, such as the HMM or an improved HMM, to store the parameters for each gesture through learning, measure the parameter values for a newly input signal, and select the model with the most similar values as the gesture of the input signal.
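A minimal sketch of HMM-based selection follows. It assumes one discrete HMM per gesture, observations that have already been quantized to symbol indices (for instance, the pattern classes from step 507), and a non-empty observation sequence. The forward algorithm is the standard way to score a sequence under an HMM; the model layout `(start, trans, emit)` and the function names are assumptions for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of observing symbol o in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


def recognize(obs, gesture_models):
    """Pick the gesture whose HMM assigns the observation sequence the highest likelihood."""
    return max(gesture_models,
               key=lambda g: forward_likelihood(obs, *gesture_models[g]))
```

For long sequences the products underflow, so a practical version would work with scaled or log probabilities.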

이후, 상기 휴대용 단말기는 본 알고리즘을 종료한다.The portable terminal then terminates this algorithm.

도 6은 본 발명에 따른 휴대용 단말기에서 다수의 입력데이터에서 특징 정보를 추출하는 과정을 도시한 흐름도이다.6 is a flowchart illustrating a process of extracting feature information from a plurality of input data in a portable terminal according to the present invention.

상기 도 6을 참조하면, 상기 특징 정보를 추출하는 과정은 앞서 설명한 도 5의 503단계에 대한 것으로, 상기 휴대용 단말기는 특징 정보를 추출하기 위하여 먼저 601단계로 진행하여 다수의 입력 데이터에 대한 전처리 작업을 수행한다.Referring to FIG. 6, the process of extracting the feature information is for step 503 of FIG. 5 described above, and the portable terminal proceeds to step 601 in order to extract feature information. Do this.

여기에서, 상기 휴대용 단말기는 앞서 설명한 바와 같이 기존의 휴대용 단말기와 다르게 일반 2D 카메라와 3D 카메라(스테레오 카메라 또는 TOF (Time of Flight) 원리를 이용한 카메라) 등을 이용하여 2차원 영상 데이터, 3차원 영상 데 이터 등의 다수의 입력데이터를 입력받아 가우시안 필터링(Gaussian filtering), smoothing, 감마보정(gamma correction), 영상 평활화(image equalization), 영상복원 또는 보정(image recover or image correction) 등과 같은 전처리 작업을 수행하여 안정된 신호 처리를 가능하도록 한다.Herein, the portable terminal is different from the conventional portable terminal as described above by using a general 2D camera and a 3D camera (stereo camera or a camera using a TOF (Time of Flight) principle), etc., two-dimensional image data, three-dimensional image Preprocessing tasks such as Gaussian filtering, smoothing, gamma correction, image equalization, image recovering or image correction are performed by receiving a plurality of input data such as data. To enable stable signal processing.

이후, 상기 휴대용 단말기는 603단계로 진행하여 전처리한 다수의 입력 데이터에서 제스처 인식이 가능한 영역을 분류한다. 즉, 상기 휴대용 단말기는 전처리된 데이터에서 칼라정보, 거리정보 등을 이용하여 손 영역, 얼굴 영역, 몸 영역과 같은 특정 부위를 분류하고, 605단계로 진행하여 상기 603단게에서 분류한 특정 부위에 대하여 마스킹 작업을 수행한다. 여기에서, 상기 휴대용 단말기는 상기 603단계에서 분류한 특정 부위의 계산 영역을 제한하기 위하여 상기 605단계에서 특정 부위에 마스크를 설정하는 것이다.In step 603, the portable terminal classifies a region in which gesture recognition is possible from a plurality of preprocessed input data. That is, the portable terminal classifies specific areas such as hand area, face area, and body area by using color information and distance information from the preprocessed data, and proceeds to step 605 for the specific areas classified in step 603. Perform masking work. Herein, in step 605, the portable terminal sets a mask on a specific part in order to limit the calculation area of the specific part classified in step 603.

Thereafter, the portable terminal proceeds to step 607, extracts feature information from the mask region set in step 605, and then performs step 505 of FIG. 5.

In addition, because the present invention integrates shape information and motion information at the same time, the portable terminal calculates the motion information from previous-frame information and current-frame information.
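One simple way to derive motion information from two consecutive frames is to compare the masked region in each frame. The sketch below computes the displacement of the region centroid and the change in area; representing each frame as a binary mask, and the names `centroid_and_area` and `motion_features`, are assumptions for illustration only.

```python
def centroid_and_area(mask):
    """Return ((cx, cy), area) of the truthy pixels, or (None, 0) if empty."""
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    area = len(points)
    if area == 0:
        return None, 0
    cx = sum(x for x, _ in points) / area
    cy = sum(y for _, y in points) / area
    return (cx, cy), area


def motion_features(prev_mask, curr_mask):
    """Movement vector and area change between the previous and current frame.

    Returns None when the region is missing in either frame.
    """
    prev_center, prev_area = centroid_and_area(prev_mask)
    curr_center, curr_area = centroid_and_area(curr_mask)
    if prev_center is None or curr_center is None:
        return None
    return {
        "dx": curr_center[0] - prev_center[0],
        "dy": curr_center[1] - prev_center[1],
        "area_change": curr_area - prev_area,
    }
```

Applying the same difference to two successive movement vectors would give an acceleration estimate, which requires a third frame.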

FIG. 7 is a flowchart illustrating a process in which a portable terminal according to the present invention combines, in an integrated manner, the feature information extracted from a plurality of input data.

Referring to FIG. 7, the process of combining the feature information in an integrated manner corresponds to step 505 of FIG. 5 described above. To combine the feature information, the portable terminal first, in step 701, generates time-series information from the feature information extracted in step 607 of FIG. 6, and then proceeds to step 703 to identify the data needed to represent motion.

Here, the data needed to represent motion is the data, among the extracted feature information, that is used to recognize a gesture. For one-dimensional data, it may be the local maxima of the signal or the amount of change in the power spectrum. For two-dimensional data, it may be the bin values of a histogram of image gradients or the gradient values in the x and y directions for pixels on an object boundary. For three-dimensional data, it may be the normal vector of a 3D surface (the normal vector of a surface polygon) or the tangent vector of a vertex region. For motion information, it may be the movement vector from the previous position to the new position, the acceleration, the consistency of the motion vectors among the pixels in a region, or the change in area. To improve gesture-recognition performance, the portable terminal may use only specific data, chosen through a number of experiments, as the data needed to represent motion, or it may take all available features as input and select the most influential components by a statistical method such as principal component analysis.
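As an example of one of the two-dimensional features listed above, the following sketch computes the bin values of a histogram of image gradients. It assumes central differences on interior pixels, orientation bins covering the full 0 to 2π range, and magnitude-weighted voting; these choices and the function name are assumptions for illustration, since the patent does not fix a particular formulation.

```python
import math


def gradient_histogram(image, bins=8):
    """Magnitude-weighted histogram of gradient orientations.

    image: grayscale image as a list of rows; border pixels are skipped
    because central differences need both neighbors.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2  # x-direction gradient
            gy = (image[y + 1][x] - image[y - 1][x]) / 2  # y-direction gradient
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixel, no orientation to vote for
            angle = math.atan2(gy, gx) % (2 * math.pi)
            index = min(int(angle / (2 * math.pi) * bins), bins - 1)
            hist[index] += magnitude
    return hist
```

For an image whose brightness increases only from left to right, every vote falls into the first bin, so the histogram summarizes edge direction in a fixed-length vector regardless of the region size.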

Thereafter, the portable terminal proceeds to step 705, uses the identified data to combine the feature information extracted from the plurality of input data into one integrated piece of feature information, and then proceeds to step 507 of FIG. 5 to perform pattern classification through a learning process.
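The combination in steps 701 to 705 can be sketched as a per-frame concatenation of the selected features. The example assumes that shape and motion features arrive as one dictionary per frame and that the selection made in step 703 is a list of keys; the function name, the dictionary layout, and the key names in the usage example are hypothetical.

```python
def integrate_features(shape_series, motion_series,
                       selected_shape, selected_motion):
    """Combine per-frame shape and motion features into one time series.

    shape_series, motion_series: lists of dicts, one dict per frame.
    selected_shape, selected_motion: keys identified as needed to
    represent motion (the step-703 selection, assumed to be given).
    Returns one integrated feature vector (list of numbers) per frame.
    """
    if len(shape_series) != len(motion_series):
        raise ValueError("both series must cover the same frames")
    integrated = []
    for shape, motion in zip(shape_series, motion_series):
        vector = [shape[key] for key in selected_shape]
        vector += [motion[key] for key in selected_motion]
        integrated.append(vector)
    return integrated


# Usage with hypothetical feature names:
frames = integrate_features(
    [{"grad_bin0": 20.0, "grad_bin1": 0.0}, {"grad_bin0": 18.0, "grad_bin1": 2.0}],
    [{"dx": 0.0, "dy": 0.0, "area_change": 0}, {"dx": 2.0, "dy": 0.0, "area_change": 1}],
    selected_shape=["grad_bin0"],
    selected_motion=["dx", "area_change"],
)
```

Each resulting vector holds shape and motion values for the same frame, so the sequence can be passed as a single input to whatever pattern classifier is trained in step 507.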

Although specific embodiments have been described in the detailed description of the present invention, various modifications are of course possible without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and by their equivalents.

FIG. 1 is a block diagram showing the configuration of a portable terminal for improving the gesture recognition rate according to the present invention;

FIG. 2 is a block diagram showing in detail the configuration of the feature information extraction unit according to a preferred embodiment of the present invention;

FIG. 3 is a diagram illustrating a process in which the integrated information generator combines feature information extracted from a plurality of input data according to a preferred embodiment of the present invention;

FIG. 4 is a diagram illustrating a process in which the integrated information generator combines feature information extracted from motion information according to a preferred embodiment of the present invention;

FIG. 5 is a flowchart illustrating a process of recognizing a gesture in a portable terminal according to the present invention;

FIG. 6 is a flowchart illustrating a process of extracting feature information from a plurality of input data in a portable terminal according to the present invention; and

FIG. 7 is a flowchart illustrating a process in which a portable terminal according to the present invention combines, in an integrated manner, the feature information extracted from a plurality of input data.

Claims

An apparatus for recognizing a gesture in a portable terminal, comprising:

a gesture recognition unit configured to receive a plurality of input data, extract a plurality of pieces of feature information, and generate a single piece of integrated information from the extracted feature information.

The apparatus of claim 1,

wherein the gesture recognition unit

receives a plurality of input data including one-dimensional data, two-dimensional data, shape information of three-dimensional data, or image information.

The apparatus of claim 1,

wherein the gesture recognition unit

selects a specific region from the input data, sets a mask on the selected region to limit the calculation region, and then extracts the plurality of pieces of feature information from the limited region.

The apparatus of claim 3,

wherein the gesture recognition unit

extracts the plurality of pieces of feature information from the input data by extracting specific information on motion information using previous-frame information and current-frame information.

The apparatus of claim 1,

wherein the gesture recognition unit

generates time-series information from the extracted feature information, and then combines, from the generated time-series information, the information corresponding to data required for motion representation, thereby generating the integrated information.

The apparatus of claim 5,

wherein the data required for the motion representation

is data used to recognize a gesture and includes at least one of: for one-dimensional data, a local maximum of the signal or an amount of change in the power spectrum; for two-dimensional data, a bin value of a histogram of image gradients or a gradient value in the x and y directions for pixels on an object boundary; for three-dimensional data, a normal vector of a 3D surface or a tangent vector of a vertex region; and for motion information, a movement vector from a previous position to a new position, an acceleration, a consistency of the motion vectors among the pixels in a region, or a change in area.

A method for recognizing a gesture in a portable terminal, the method comprising:

receiving a plurality of input data;

extracting a plurality of pieces of feature information from the input data; and

generating a single piece of integrated information into which the extracted feature information is integrated.

The method of claim 7,

wherein the plurality of input data

includes one-dimensional data, two-dimensional data, shape information of three-dimensional data, or image information.

The method of claim 7,

wherein extracting the plurality of pieces of feature information from the received input data comprises:

selecting a specific region from the input data;

limiting the calculation region by setting a mask on the selected region; and

extracting feature information from the limited region.

The method of claim 9,

further comprising extracting specific information on motion information by using previous-frame information and current-frame information.

The method of claim 7,

wherein generating the single piece of integrated information from the extracted feature information comprises:

generating time-series information from the extracted feature information; and

combining, from the generated time-series information, the information corresponding to data required for motion representation.

The method of claim 11,

wherein the data required for the motion representation

is data used to recognize a gesture and includes at least one of: for one-dimensional data, a local maximum of the signal or an amount of change in the power spectrum; for two-dimensional data, a bin value of a histogram of image gradients or a gradient value in the x and y directions for pixels on an object boundary; for three-dimensional data, a normal vector of a 3D surface or a tangent vector of a vertex region; and for motion information, a movement vector from a previous position to a new position, an acceleration, a consistency of the motion vectors among the pixels in a region, or a change in area.