KR102664254B1

KR102664254B1 - An apparatus for recognizing hand signals based on vision artificial intelligence and a method for recognizing hand signals using the same

Info

Publication number: KR102664254B1
Application number: KR1020220041264A
Authority: KR
Inventors: 김병학; 원홍인; 정승현; 이준하
Original assignee: 한국생산기술연구원
Priority date: 2022-04-01
Filing date: 2022-04-01
Publication date: 2024-05-08
Also published as: KR20230142258A

Abstract

One embodiment of the present invention provides a technology for implementing a hand signal recognition device with a simple configuration by minimizing components other than an imaging device such as a vision sensor. A vision artificial intelligence-based hand signal recognition device according to an embodiment of the present invention includes: an imaging unit that captures a motion of a user's hand and generates a captured image; a shape detection module that receives the captured image from the imaging unit and generates shape detection data, which is data generated by recognizing the shape of the user's hand and changes in that shape; a trajectory detection module that receives the captured image from the imaging unit and generates trajectory detection data, which is data generated by recognizing the trajectory of the user's hand; and a learning module that receives the shape detection data from the shape detection module, receives the trajectory detection data from the trajectory detection module, and performs learning on the shape detection data or the trajectory detection data.

Description

Vision artificial intelligence-based hand signal recognition device and hand signal recognition method using the same {AN APPARATUS FOR RECOGNIZING HAND SIGNALS BASED ON VISION ARTIFICIAL INTELLIGENCE AND A METHOD FOR RECOGNIZING HAND SIGNALS USING THE SAME}

The present invention relates to a vision artificial intelligence-based hand signal recognition device and a hand signal recognition method using the same, and more specifically, to a technology for implementing a hand signal recognition device with a simple configuration by minimizing components other than an imaging device such as a vision sensor.

As a recent example of research on hand signal recognition, UCLA researchers announced a sign-to-speech technology based on ASL (American Sign Language) recognition. By attaching sensors to acquire hand signal recognition signals, the sign-to-speech study distinguished complex finger shapes and demonstrated a high ASL recognition rate of over 95%. However, because it requires wearing a glove-type sensor that is difficult to carry at all times, its scope of use is limited. Intel's Touchless technology implements a touch function without a separate sensor, but it is limited to replacing touches and clicks at a position close to the monitor.

In addition, as a domestic commercialization example, the technology held by VTOUCH provides a remote-click function for in-vehicle control displays or kiosks, but its functions are limited to mouse operation and clicking, volume control gesture recognition, and the like. To input complex text information, the user must accurately click the corresponding position on a virtual keyboard displayed on the screen using mouse-style control, so typing errors occur frequently because of position errors in mouse cursor tracking.

Republic of Korea Patent No. 10-2121654 (title of the invention: Deep learning-based automatic gesture recognition method and system) discloses a method comprising: a step in which a gesture recognition system extracts a plurality of contours from an input image; a step in which the gesture recognition system generates training data by normalizing the contour information constituting each of the contours; and a step in which the gesture recognition system trains an artificial intelligence model for gesture recognition using the generated training data, wherein the contours may overlap.

Republic of Korea Patent No. 10-2121654

An object of the present invention, devised to solve the above problems, is to implement a hand signal recognition device with a simple configuration by minimizing components other than an imaging device such as a vision sensor.

The technical problems to be solved by the present invention are not limited to the technical problem mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

The configuration of the present invention for achieving the above object includes: an imaging unit that captures a motion of a user's hand and generates a captured image; a shape detection module that receives the captured image from the imaging unit and generates shape detection data, which is data generated by recognizing the shape of the user's hand and changes in that shape; a trajectory detection module that receives the captured image from the imaging unit and generates trajectory detection data, which is data generated by recognizing the trajectory of the user's hand; and a learning module that receives the shape detection data from the shape detection module, receives the trajectory detection data from the trajectory detection module, and performs learning on the shape detection data or the trajectory detection data.

In an embodiment of the present invention, a plurality of the imaging units are provided, and the captured image may include a visible light image obtained by visible light imaging, an infrared image obtained by infrared imaging, or a three-dimensional image.

In an embodiment of the present invention, the shape detection module may include: a shape tracking unit that receives the captured image from the imaging unit and generates data by recognizing the shape of the user's hand and changes in that shape; and a shape determination unit that receives the data from the shape tracking unit, determines whether the user's hand movement corresponds to a hand signal (gesture), classifies it, and generates data.

In an embodiment of the present invention, the shape detection module may further include a shape detection unit that receives the data from the shape determination unit and performs learning on that data to generate the shape detection data.

In an embodiment of the present invention, the shape tracking unit may be provided with a convex-hull algorithm and perform convex tracking.

In an embodiment of the present invention, the learning module may include: a tracking unit that performs real-time learning on images of the shape detection data or the trajectory detection data to infer the tracking coordinates of a response map; and a data accumulation unit that accumulates the data received from the tracking unit.

In an embodiment of the present invention, the tracking unit may include: a feature extraction unit that extracts features from an image of the shape detection data or the trajectory detection data; and a learning unit that learns the data transmitted from the feature extraction unit using an artificial neural network and generates the response map for the image.

In an embodiment of the present invention, the tracking unit may further include: a correction unit that performs correction on the response map transmitted to the learning unit; and a feature enhancement unit that performs feature enhancement processing on an image object using the response map transmitted from the correction unit.

In an embodiment of the present invention, the trajectory detection module may include: a trajectory tracking unit that receives the captured image from the imaging unit and generates data by recognizing the movement trajectory of the user's hand formed in a virtual canvas, which is a predetermined three-dimensional region; and a trajectory detection unit that receives the data from the trajectory tracking unit and performs learning on that data to generate the trajectory detection data.

In an embodiment of the present invention, the device may further include a hand signal display unit that displays, on a screen, an image based on the generated shape detection data or trajectory detection data.

The configuration of the present invention for achieving the above object includes: a first step in which a plurality of imaging units capture the user's hand movements and the captured images are collected; a second step in which the captured images are transmitted to the shape detection module and the trajectory detection module; a third step in which the shape detection data is generated in the shape detection module and the trajectory detection data is generated in the trajectory detection module; and a fourth step in which an image based on the shape detection data or an image based on the trajectory detection data is displayed on the hand signal display unit.

In an embodiment of the present invention, the method may further include a fifth step in which the shape detection module and the trajectory detection module perform learning using the learning data generated by the learning module.

An effect of the present invention according to the above configuration is that it is convenient and economical, because hand signals are recognized through imaging by an imaging device such as a vision sensor, without the need to attach a sensor-equipped input device or the like to the user's hand.

In addition, an effect of the present invention is increased utility, because the device can serve as an input device even at a distance from the imaging device and can be used as a control technology for large displays.

Furthermore, an effect of the present invention is that a wide variety of information can be input quickly, because there is no limit on the number of types of input signals, which range from simple gestures to complex characters or signature information.

The effects of the present invention are not limited to those described above, and should be understood to include all effects that can be inferred from the configuration of the invention described in the detailed description or the claims.

Figure 1 is a schematic diagram of a shape detection module according to an embodiment of the present invention.
Figure 2 is a schematic diagram of a shape tracking unit and a shape determination unit according to an embodiment of the present invention.
Figure 3 is a schematic diagram of a shape detection unit according to an embodiment of the present invention.
Figure 4 is a schematic diagram of a trajectory detection module according to an embodiment of the present invention.
Figure 5 is a schematic diagram of a trajectory tracking unit according to an embodiment of the present invention.
Figure 6 is a schematic diagram of a hand signal display unit according to an embodiment of the present invention.
Figure 7 is a schematic diagram of a trajectory determination unit according to an embodiment of the present invention.
Figure 8 is a schematic diagram of the configuration of a hand signal recognition device according to an embodiment of the present invention.
Figure 9 is a schematic diagram of a tracking unit according to an embodiment of the present invention.
Figure 10 is a conceptual diagram of hand trajectory tracking according to an embodiment of the present invention.
Figures 11 and 12 are images of a hand signal recognition test using a hand signal recognition device according to an embodiment of the present invention.

Hereinafter, the present invention will be described with reference to the attached drawings. However, the present invention may be implemented in various different forms and is therefore not limited to the embodiments described herein. To explain the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected (joined, in contact, or coupled)" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another member in between. In addition, when a part is said to "include" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, the present invention will be described in detail with reference to the attached drawings.

Figure 1 is a schematic diagram of the shape detection module 100 according to an embodiment of the present invention, Figure 2 is a schematic diagram of the shape tracking unit 110 and the shape determination unit 120 according to an embodiment of the present invention, and Figure 3 is a schematic diagram of the shape detection unit 130 according to an embodiment of the present invention.

In addition, Figure 4 is a schematic diagram of the trajectory detection module 200 according to an embodiment of the present invention, Figure 5 is a schematic diagram of the trajectory tracking unit 210 according to an embodiment of the present invention, and Figure 6 is a schematic diagram of the hand signal display unit 420 according to an embodiment of the present invention. Figure 7 is a schematic diagram of a trajectory determination unit according to an embodiment of the present invention.

As shown in Figures 1 to 7, the hand signal recognition device of the present invention includes: an imaging unit 410 that captures a motion of a user's hand and generates a captured image; a shape detection module 100 that receives the captured image from the imaging unit 410 and generates shape detection data, which is data generated by recognizing the shape of the user's hand and changes in that shape; a trajectory detection module 200 that receives the captured image from the imaging unit 410 and generates trajectory detection data, which is data generated by recognizing the trajectory of the user's hand; and a learning module that receives the shape detection data from the shape detection module 100, receives the trajectory detection data from the trajectory detection module 200, and performs learning on the shape detection data or the trajectory detection data.

A plurality of imaging units 410 are provided, and the captured image may include a visible light image obtained by visible light imaging, an infrared image obtained by infrared imaging, or a three-dimensional image.

To capture a visible light image, the imaging unit 410 may include a digital camera or an RGB camera sensor. To capture an infrared image, the imaging unit 410 may include an infrared (IR) camera. To capture a three-dimensional image, the imaging unit 410 may include a three-dimensional camera.

A single imaging unit 410 may capture a visible light image, an infrared image, and a three-dimensional image simultaneously or separately. Alternatively, among the plurality of imaging units 410, one imaging unit 410 may capture a visible light image, another imaging unit 410 may capture an infrared image, and yet another imaging unit 410 may capture a three-dimensional image.

Further, a plurality of such imaging units 410 may be provided, and the shape detection data generated by recognizing the shape of the user's hand and its changes in the image captured by each imaging unit 410, or the trajectory detection data generated by recognizing the trajectory of the user's hand, may be transmitted to the learning module and learned.

The image data captured in a plurality of domains may be referred to as multi-domain data, and the multi-domain data acquired by the plurality of imaging units 410 may be transmitted to the shape detection module 100 and the trajectory detection module 200.
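The multi-domain data flow described above can be illustrated with a minimal Python sketch. All names here (`MultiDomainSample`, `dispatch`, the domain keys, and the `Frame` type) are hypothetical and are not taken from the specification; the sketch only shows image sequences grouped by domain being handed to both modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for one image frame: rows of pixel values.
Frame = List[List[float]]


@dataclass
class MultiDomainSample:
    """Image sequences keyed by domain name, e.g. 'visible', 'infrared', 'depth'."""
    sequences: Dict[str, List[Frame]] = field(default_factory=dict)

    def add(self, domain: str, frame: Frame) -> None:
        # Append a frame to the image sequence of the given domain.
        self.sequences.setdefault(domain, []).append(frame)


def dispatch(sample: MultiDomainSample,
             shape_module: Callable[[MultiDomainSample], object],
             trajectory_module: Callable[[MultiDomainSample], object]) -> dict:
    """Pass the same multi-domain sample to both modules and collect their outputs."""
    return {
        "shape": shape_module(sample),
        "trajectory": trajectory_module(sample),
    }
```

In this sketch the two modules are plain callables, so any shape or trajectory detector with the same call signature could be substituted.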

The shape detection module 100 may include: a shape tracking unit 110 that receives the captured image from the imaging unit 410 and generates data by recognizing the shape of the user's hand and changes in that shape; and a shape determination unit 120 that receives the data from the shape tracking unit 110, determines whether the user's hand movement corresponds to a hand signal (gesture), classifies it, and generates data.

The shape detection module 100 may further include a shape detection unit 130 that receives the data from the shape determination unit 120 and performs learning on that data to generate the shape detection data.

The shape tracking unit 110 may be provided with a convex-hull algorithm and perform convex tracking. As shown in Figures 1 and 2, convex tracking may use the convex-hull algorithm to analyze the three-dimensional features of the information on the user's hand motion input from the plurality of imaging units 410 (multiple domains), and thereby compute the position and velocity of each point of the user's hand that produces the hand signal.

Specifically, each of the two-dimensional visible light image, the infrared image, and the three-dimensional image described above may be provided in units of an image sequence consisting of a combination of one or more frames. When such captured images are transmitted from the plurality of imaging units 410 to the shape tracking unit 110, the shape tracking unit 110 may obtain a hand region using each captured image.

In addition, the shape tracking unit 110 may obtain a convex hull that contains all of the hand regions obtained as above, extract boundary points from the convex hull of the hand region, and compute the three-dimensional position and velocity of each boundary point.

The shape tracking unit 110 may then track and recognize the user's hand movement using whether, and to what degree, the three-dimensional position and velocity of each boundary point change. The data on the hand movement recognized in this way may be transmitted to the shape determination unit 120.
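As an illustration of the convex tracking described above, the following Python sketch computes a convex hull of 2D hand-region points with Andrew's monotone chain algorithm and estimates per-point velocities by finite differences. The specification does not say which convex-hull algorithm is used or how boundary points are matched between frames, so the algorithm choice, the function names, and the assumption that boundary points are index-matched across frames are all assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def cross(o: Point2D, a: Point2D, b: Point2D) -> float:
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point2D]) -> List[Point2D]:
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point2D] = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point2D] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def point_velocities(prev_pts: Sequence[Point3D],
                     curr_pts: Sequence[Point3D],
                     dt: float) -> List[Point3D]:
    """Finite-difference velocity of each boundary point between two frames.

    Assumes prev_pts[i] and curr_pts[i] refer to the same boundary point.
    """
    return [
        tuple((c - p) / dt for p, c in zip(prev, curr))
        for prev, curr in zip(prev_pts, curr_pts)
    ]
```

For example, `convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])` discards the interior point `(1, 1)` and keeps the four corners. The third coordinate of each boundary point would come from a depth source such as the three-dimensional image, which this sketch does not model.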

The shape determination unit 120 may perform machine learning with an artificial neural network using the data transmitted from the shape tracking unit 110, and determine whether the user's hand movement corresponds to a hand signal that carries the user's intention. In one embodiment, when the user performs a motion such as pressing down toward the screen, it may be determined whether that hand movement corresponds to a hand signal.

Here, a CNN (Convolutional Neural Network), DNN (Deep Neural Network), RNN (Recurrent Neural Network), or the like may be used as the artificial neural network, but the artificial neural network is not limited thereto.

The data classified by the shape determination unit 120 as corresponding to a hand signal is then transmitted to the shape detection unit 130, which recognizes the hand signal shape using an artificial intelligence multi-stage detection/classification model and generates the resulting data to form the shape detection data.

Here, a two-stage detection model (2-stage detector) may be used in the shape detection unit 130; specifically, ResNet, the R-CNN family (R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, ...), and the like may be used. Figure 3 shows images of the shape detection data generated through detection/classification by the shape detection unit 130.

As described above, the method based on the shape detection module 100 is a hand signal shape recognition mode (model), in which numbers and characters can be input almost instantly once the user has memorized the forms of hand signals such as ASL (American Sign Language).

The hand signal recognition device of the present invention may further include a hand signal display unit 420 that displays, on a screen, an image based on the generated shape detection data or trajectory detection data. As shown in Figure 1, when the user performs an ASL hand signal as a shape hand signal, that is, a hand signal expressed by the shape of the hand, the number, character, or the like that the hand signal represents may be displayed on the screen of the hand signal display unit 420 together with the user's hand signal motion.

The screen display based on the trajectory detection data on the hand signal display unit 420 is described in detail below.

The trajectory detection module 200 may include: a trajectory tracking unit 210 that receives the captured image from the imaging unit 410 and generates data by recognizing the movement trajectory of the user's hand formed in a virtual canvas 201, which is a predetermined three-dimensional region; and a trajectory detection unit 220 that receives the data from the trajectory tracking unit 210 and performs learning on that data to generate the trajectory detection data.

The trajectory tracking unit 210 may create the virtual canvas 201 space, which is a virtual three-dimensional region. Taking a point on the imaging unit 410 close to the user as a reference, it may set a predetermined three-dimensional space in front of the imaging unit 410 as the virtual canvas 201.

As shown in Figures 4 and 6, one of the plurality of imaging units 410 may be combined with the hand signal display unit 420, and a predetermined three-dimensional space spaced apart from the front of the imaging unit 410, which is also the front of the hand signal display unit 420, may be set as the virtual canvas 201.

Each point of the virtual canvas 201 may be assigned three-dimensional coordinates. When the user's hand moves into the virtual canvas 201, the trajectory tracking unit 210 recognizes the hand, and when the hand moves, it may track and recognize the movement trajectory of the hand.
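A virtual canvas of this kind can be sketched as an axis-aligned three-dimensional box with a containment test. The specification does not define the shape, size, or coordinate convention of the canvas, so the box model, the `VirtualCanvas` name, and the normalized canvas coordinates below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class VirtualCanvas:
    """Hypothetical axis-aligned box in front of the imaging unit."""
    origin: Point3D  # corner of the box with the smallest x, y, z
    size: Point3D    # box extent along x, y, z (all positive)

    def contains(self, p: Point3D) -> bool:
        # True when the point lies inside the box on every axis.
        return all(o <= c <= o + s for c, o, s in zip(p, self.origin, self.size))

    def to_canvas_coords(self, p: Point3D) -> Point3D:
        # Map a point to normalized canvas coordinates in [0, 1] per axis.
        return tuple((c - o) / s for c, o, s in zip(p, self.origin, self.size))
```

With such a model, tracking could start when a detected hand point first satisfies `contains` and stop when it leaves the box.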

The trajectory tracking unit 210 may also be provided with a convex-hull algorithm. Like the shape tracking unit 110, it may obtain a hand region using each captured image, obtain a convex hull that contains all of the obtained hand regions, extract boundary points from the convex hull of the hand region, and compute the three-dimensional position and velocity of each boundary point.

Here, the user may create a trajectory by moving the hand while keeping its shape fixed. The hand signal display unit 420 may set, among the three-dimensional coordinates of the boundary points described above, the coordinate point closest to the hand signal display unit 420 as the trajectory center point. Accordingly, the part of the hand that the user extends furthest forward, such as a fingertip, is set as the trajectory center point, and the trajectory tracking unit 210 may recognize the trajectory of the trajectory center point and generate data.
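Selecting the trajectory center point amounts to a nearest-point search over the boundary points. The sketch below assumes Euclidean distance and places the display at the coordinate origin; both are illustrative assumptions, as is the function name.

```python
import math


def trajectory_center_point(boundary_points, display_position=(0.0, 0.0, 0.0)):
    """Return the boundary point nearest to the display (assumed Euclidean distance)."""
    return min(boundary_points, key=lambda p: math.dist(p, display_position))


# Hypothetical boundary points of one frame; the second is an extended fingertip.
boundary = [(0.05, 0.02, 0.60), (0.00, 0.08, 0.45), (-0.04, 0.01, 0.62)]
center = trajectory_center_point(boundary)
```

Repeating this per frame yields one center point per frame, and that sequence is the trajectory that is tracked.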

The trajectory tracking unit 210 recognizes the trajectory of the trajectory center point using whether, and to what degree, its three-dimensional position and velocity change. It may then transmit data on the hand signal trajectory, that is, the tracked and recognized trajectory of the user's hand, to the trajectory detection unit 220.

In the trajectory detection unit 220, the trajectory of a hand signal performed in the virtual canvas 201 may be processed into a form similar to point cloud data and input to a classification model. Based on previously learned hand signal trajectory information, the classification model may infer the meaning of the trajectory hand signal, that is, the hand signal drawn in the virtual canvas 201.
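One way to picture "a form similar to point cloud data" is a fixed-size set of normalized 3D points. The sketch below centers a trajectory, scales it into the range [-1, 1], and resamples it to a fixed number of points. The normalization, the point count, and the function name are assumptions; the specification does not describe the actual preprocessing.

```python
def to_point_cloud(trajectory, num_points=8):
    """Center, scale, and resample a 3D trajectory to a fixed-size point set."""
    n = len(trajectory)
    centroid = [sum(p[i] for p in trajectory) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in trajectory]
    # Scale by the largest absolute coordinate; fall back to 1.0 for a single point.
    scale = max(max(abs(c) for c in p) for p in centered) or 1.0
    normalized = [[c / scale for c in p] for p in centered]
    # Pick evenly spaced samples so every gesture has num_points rows (num_points >= 2).
    idx = [round(k * (n - 1) / (num_points - 1)) for k in range(num_points)]
    return [normalized[i] for i in idx]


# Hypothetical straight stroke drawn at constant depth.
stroke = [(0.01 * t, 0.02 * t, 0.5) for t in range(20)]
cloud = to_point_cloud(stroke)
```

A fixed row count makes strokes of different durations comparable before they reach a classifier.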

Specifically, the trajectory detection unit 220 may form the trajectory detection data by learning the data transmitted from the trajectory tracking unit 210 using an artificial neural network such as a CNN and recognizing it with a multi-stage detection/classification model.

FIG. 7 discloses the trajectory detection data generated through detection and classification by the trajectory detection unit 220, details of the images of the trajectory detection data, and the configuration in which the trajectory detection data is stored (56K images).

The hand signal display unit 420 displays the image of the user's hand captured by the imaging unit 410 and, at the same time, may render the virtual canvas 201 as an image in a single opaque color. Through the screen of the hand signal display unit 420, the user can check whether the hand is positioned inside the virtual canvas 201, and when the user makes a hand signal, the trajectory of the user's hand may be displayed on the screen.

As described above, when trajectory detection data is formed, the hand signal recognition device of the present invention operates in the hand signal trajectory tracking mode (model). The user does not need to memorize formal symbols such as the ASL signs described above and can instead draw the shapes of a commonly used language in the air, so the device can replace an input device such as a keyboard.

FIG. 8 is a schematic diagram of the configuration of the hand signal recognition device according to an embodiment of the present invention, FIG. 9 is a schematic diagram of the tracking unit 310 according to an embodiment of the present invention, and FIG. 10 is a conceptual diagram of hand trajectory tracking according to an embodiment of the present invention.

FIG. 8(a) shows the software (S/W) of the hand signal recognition device of the present invention when data has been accumulated using the initial learning data, and FIG. 8(b) shows the software (S/W) of the device in a developed form, in which the learning data has grown as the amounts of shape detection data and trajectory detection data have each increased.

FIG. 8(c) shows the learning performed by the learning module on the shape detection data, and FIG. 8(d) shows real-time hand signal tracking (Dynamic convex tracking) for the trajectory detection module 200.

As shown in FIG. 8, as time passes, the shape detection data and the trajectory detection data may be formed by the hand signal recognition device in the developed form shown in FIG. 8(b).

The learning module may include: a tracking unit 310 that performs real-time learning on images of the shape detection data or the trajectory detection data to infer the tracking coordinates of a response map; and a data accumulation unit 320 that accumulates the data received from the tracking unit 310.

The tracking unit 310 may include: a feature extraction unit 311 that extracts features from an image of the shape detection data or the trajectory detection data; a learning unit 312 that learns the data transmitted from the feature extraction unit 311 using an artificial neural network and generates a response map for the image; a correction unit 313 that performs correction on the response map transmitted to the learning unit 312; and a feature enhancement unit 314 that performs feature enhancement processing on an image object using the response map transmitted from the correction unit 313.

As shown in FIGS. 8 and 9, the shape detection data or the trajectory detection data is input to the feature extraction unit 311. The feature extraction unit 311 has a built-in DiMP (Discriminative Model Prediction) tracker, and the DiMP tracker may include a ResNet-based discriminative model prediction feature extraction model, that is, a pre-trained feature extractor based on the ResNet50 backbone (Backbone Feature).

The learning unit 312 may perform computation using an artificial neural network on the data transmitted from the feature extraction unit 311 and generate a response map (Response Map). The artificial neural network may be a CNN (Convolutional Neural Network), a DNN (Deep Neural Network), an RNN (Recurrent Neural Network), or the like, but is not limited to these.

The correction unit 313 may have a built-in IoU-Net (Intersection over Union NETwork) model. For the response map data it receives, the correction unit 313 may infer the maximum value of the response map and update the object tracking coordinates according to the coordinates of that maximum.
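Updating the tracking coordinates from the response map maximum can be sketched as a plain argmax over a 2D grid. This is only an illustration of that step with a hypothetical function name and a made-up map; it does not reproduce the IoU-Net model.

```python
def peak_coordinates(response_map):
    """Return (row, col, value) of the largest entry in a 2D response map."""
    best = max(
        ((value, row, col)
         for row, line in enumerate(response_map)
         for col, value in enumerate(line)),
        key=lambda item: item[0],
    )
    return best[1], best[2], best[0]


# Hypothetical 3x3 response map with a clear peak in the middle.
response = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.3],
    [0.1, 0.3, 0.2],
]
row, col, score = peak_coordinates(response)
```

The returned row and column would become the new tracking coordinates, and the value can serve as a rough confidence score.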

With the operation of the feature extraction unit 311 and the learning unit 312 alone, tracking performance may degrade depending on the conditions of the input image (video). For example, when the characteristic elements of the tracked object diminish in the image or feature information is insufficient, the coordinates of the response map maximum may scatter across multiple local coordinates, or the difference between the maximum and minimum values may shrink, and the confidence value may fall.
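The two symptoms named above, a shrinking gap between maximum and minimum and a peak that spreads over several locations, can be measured directly on a response map. The sketch below is one hypothetical way to do so; the 0.9 ratio and the function name are assumptions, not values from the specification.

```python
def response_quality(response_map, near_peak_ratio=0.9):
    """Return (contrast, near_peak_count) for a 2D response map.

    contrast is max minus min; near_peak_count is the number of cells whose
    value is at least near_peak_ratio times the maximum.
    """
    values = [v for row in response_map for v in row]
    peak, floor = max(values), min(values)
    near_peak = sum(1 for v in values if v >= near_peak_ratio * peak)
    return peak - floor, near_peak


# A sharp map (good tracking) and a flat, ambiguous map (degraded tracking).
sharp = [[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]]
flat = [[0.4, 0.5, 0.4], [0.5, 0.4, 0.5], [0.4, 0.5, 0.4]]
sharp_contrast, sharp_peaks = response_quality(sharp)
flat_contrast, flat_peaks = response_quality(flat)
```

The flat map has lower contrast and four competing peaks, which is the kind of situation in which argmax-based tracking becomes unreliable.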

Because of this problem, the error of the tracker in the feature extraction unit 311 grows during continuous object tracking. To overcome this limitation, the response map produced by the feature extraction unit 311 and the learning unit 312 is transmitted to the correction unit 313, where a network trained to predict the IoU (Intersection over Union) between the tracked bounding box and the ground truth performs fine correction of the tracking coordinates, so that a more accurate bounding box can be estimated.
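For reference, the quantity that the network is trained to predict, the IoU of two bounding boxes, can be computed directly when both boxes are known. The sketch below assumes boxes given as `(x1, y1, x2, y2)` corner coordinates; it shows the metric itself, not the prediction network.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped to zero when the boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))   # partial overlap
same = iou((0, 0, 2, 2), (0, 0, 2, 2))      # identical boxes
apart = iou((0, 0, 1, 1), (2, 2, 3, 3))     # disjoint boxes
```

The partial-overlap case shares one unit of area out of a union of seven, so its IoU is 1/7.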

As described above, the tracking unit 310 includes DDiMP (Depth enhanced DiMP), in which two sub-network models, the DiMP tracker and the IoU-Net model, operate in combination, so the inference accuracy for the tracking coordinates of the response map can be improved.

Data for which the IoU value between the bounding box tracked in the correction unit 313 and the ground truth satisfies the criterion is transmitted to the data accumulation unit 320, and data that does not is classified as a localization response and may be transmitted to the feature enhancement unit 314.
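The routing described above can be sketched as a simple threshold split. The specification does not state the IoU criterion, so the 0.5 threshold, the dictionary layout, and the function name below are purely illustrative assumptions.

```python
def route(samples, iou_threshold=0.5):
    """Split samples into (accumulate, reinforce) lists by an assumed IoU threshold."""
    accumulate, reinforce = [], []
    for sample in samples:
        target = accumulate if sample["iou"] >= iou_threshold else reinforce
        target.append(sample)
    return accumulate, reinforce


# Hypothetical tracking results with their IoU scores.
samples = [{"id": 0, "iou": 0.82}, {"id": 1, "iou": 0.31}, {"id": 2, "iou": 0.50}]
to_accumulate, to_reinforce = route(samples)
```

In this sketch the first list would go to data accumulation and the second to feature enhancement for another pass.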

The feature enhancement unit 314 performs dynamic search area feature reinforcement (RFDSA, Reinforced Feature with Dynamic Search Area). Specifically, the feature enhancement unit 314 may designate a region of interest (ROI, Region of Interest) from the output coordinates of the response map corrected by the correction unit 313 and adaptively enhance the image features in the search area where the object is likely to be present.

In this case, the shape information of the object may be distorted by the difference in image quality between the full image and the search-area image, so dynamic Gaussian masking (Dynamic masking) may be applied so that the feature-enhanced area of the object blends in naturally.
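A Gaussian mask gives full weight to the enhanced image at the ROI center and fades smoothly to the original image away from it, which avoids a visible seam. The sketch below shows this blending on small grayscale grids; treating "dynamic" as meaning that the center and width follow the ROI is an assumption, as are the function names.

```python
import math


def gaussian_mask(height, width, center, sigma):
    """2D Gaussian weights in (0, 1], equal to 1 at the given (row, col) center."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def blend(original, enhanced, mask):
    """Per-pixel mix: mask * enhanced + (1 - mask) * original."""
    return [
        [m * e + (1 - m) * o for o, e, m in zip(orow, erow, mrow)]
        for orow, erow, mrow in zip(original, enhanced, mask)
    ]


# Hypothetical 5x5 grayscale images: a flat original and a brighter enhanced version.
original = [[100.0] * 5 for _ in range(5)]
enhanced = [[200.0] * 5 for _ in range(5)]
mask = gaussian_mask(5, 5, center=(2, 2), sigma=1.0)
blended = blend(original, enhanced, mask)
```

The center pixel takes the enhanced value, while the corners stay close to the original.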

Next, the feature enhancement unit 314 performs adaptive image processing (Adaptive processing) on the data to generate a feature-reinforced sequence (Reinforced Feature Sequence) and transmits this sequence back to the feature extraction unit 311. The tracking unit 310 then operates on it, so the efficiency of extracting object features and generating tracking coordinates can improve even when object information is insufficient.

The data accumulation unit 320 automatically accumulates the data transmitted from the tracking unit 310. It checks the correlation between incoming data and existing data and forgets (forget) data whose similarity exceeds a reference similarity value, which prevents the accumulated data from saturating.
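The forgetting rule can be sketched as a near-duplicate filter on feature vectors. The specification does not say how similarity is measured or whether the incoming or the stored sample is forgotten; the sketch below assumes cosine similarity, discards the incoming sample, and uses a hypothetical threshold of 0.95.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DataAccumulator:
    """Stores feature vectors and skips ones too similar to what is already stored."""

    def __init__(self, similarity_threshold=0.95):
        self.similarity_threshold = similarity_threshold
        self.samples = []

    def add(self, feature):
        # Forget the incoming sample when it nearly duplicates a stored one.
        if any(cosine_similarity(feature, s) > self.similarity_threshold for s in self.samples):
            return False
        self.samples.append(feature)
        return True


store = DataAccumulator()
kept = [store.add([1.0, 0.0]), store.add([0.99, 0.01]), store.add([0.0, 1.0])]
```

The second vector points in almost the same direction as the first and is dropped, so the store grows only when genuinely new data arrives.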

FIG. 10(a) shows a screen on which a hand signal made by the user's hand trajectory is input to the trajectory tracking unit 210, FIG. 10(b) is an image of real-time recognition being performed in the trajectory tracking unit 210, and FIG. 10(c) shows the result of hand signal recognition by the trajectory tracking unit 210 displayed on the screen of the hand signal display unit 420.

As shown in FIG. 8(d) and FIG. 10, the trajectory detection data is learned by the learning module and used to generate learning data, which is the data produced by the learning module. At the same time, it is recognized in real time as a hand signal made by the user's hand trajectory and can perform an input function like keyboard input. The same function can of course be performed for ASL hand signals as well.

To implement the two modes proposed by the learning module as described above (the hand signal shape detection/recognition mode and the hand signal trajectory tracking/recognition mode), data learning of the multi-stage artificial intelligence detection/classification model and the trajectory signal classification model is performed. The learning data generated for this purpose is produced by the multi-domain precise tracking algorithm described above and can therefore be generated with minimal developer intervention.

This learning data is also continuously transmitted to the shape detection module 100 and the trajectory detection module 200. As these modules learn from it, the initial version of the hand signal recognition device shown in FIG. 8(a) can develop into the developed (Development) form of the device shown in FIG. 8(b).

Specifically, the learning data is mainly used while imaging by the imaging unit 410 is not being performed. The shape detection module 100 and the trajectory detection module 200 each perform learning with the learning data, which improves the accuracy of each module. Shape hand signal recognition by the shape detection module 100 and trajectory hand signal recognition by the trajectory detection module 200 both become more accurate, and so does input using these hand signals.

FIGS. 11 and 12 are images of a hand signal recognition test using the hand signal recognition device according to an embodiment of the present invention. Specifically, FIG. 11 shows the case in which the imaging unit 410 captures ASL hand signals, and FIG. 12 shows the case in which the imaging unit 410 captures hand signals made by hand trajectories.

FIG. 11(a) shows a shape hand signal being performed toward the imaging unit 410, and FIG. 11(b) shows the screen displayed on the hand signal display unit 420 for shape hand signal recognition. FIG. 12(a) shows a trajectory hand signal being performed toward the imaging unit 410, and FIG. 12(b) shows the screen displayed on the hand signal display unit 420 for trajectory hand signal recognition.

As shown in FIG. 11, the hand signal recognition device of the present invention recognizes each numeric hand signal and displays it accurately on the screen, and as shown in FIG. 12, it recognizes a hand signal made by a hand trajectory and displays the recognized form on the screen.

An indoor game system may be formed that includes: the hand signal recognition device of the present invention; and a game device that performs control by processing information on numbers, letters, or symbols transmitted from the vision artificial intelligence-based hand signal recognition device.

Specifically, the game device may run online tour riding (Tour Riding) and the like, and the hand signal recognition device of the present invention may be used when the user controls the game device by inputting such numbers, letters, or symbols.

Hereinafter, a hand signal recognition method using the hand signal recognition device of the present invention will be described.

First, in a first step, the plurality of imaging units 410 may capture the user's hand motion and collect captured images. In a second step, the captured images may be transmitted to the shape detection module 100 and the trajectory detection module 200.

Next, in a third step, shape detection data may be formed in the shape detection module 100, and trajectory detection data may be formed in the trajectory detection module 200. In a fourth step, an image based on the shape detection data or an image based on the trajectory detection data may be displayed on the hand signal display unit 420.

In the fourth step, the shape hand signal, which is the image based on the shape detection data, and the trajectory hand signal, which is the image based on the trajectory detection data, may be displayed on the screen of the hand signal display unit 420 simultaneously, or each in a separate area of the screen.

In a fifth step following the fourth step, the shape detection module 100 and the trajectory detection module 200 may perform learning using the learning data generated by the learning module. Specifically, among the components included in each module, those that learn by means of an artificial neural network may receive the learning data from the learning module and perform learning.
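The five steps can be summarized as a short control flow. The sketch below uses stub objects in place of the real modules, so the class, the function, and all names in it are hypothetical; it only shows the order in which data moves between the components.

```python
class StubModule:
    """Placeholder for a detection module; real modules would wrap neural networks."""

    def __init__(self, label):
        self.label = label
        self.learned = []

    def detect(self, frames):
        # Pretend each frame yields one detection result.
        return [f"{self.label}:{frame}" for frame in frames]

    def learn(self, learning_data):
        self.learned.extend(learning_data)


def recognize(cameras, shape_module, trajectory_module, display, learning_data):
    frames = [capture() for capture in cameras]          # step 1: collect captured images
    shape_data = shape_module.detect(frames)             # steps 2-3: shape detection data
    trajectory_data = trajectory_module.detect(frames)   # steps 2-3: trajectory detection data
    display(shape_data, trajectory_data)                 # step 4: show on the display unit
    shape_module.learn(learning_data)                    # step 5: learn from learning data
    trajectory_module.learn(learning_data)
    return shape_data, trajectory_data


shown = []
shape, trajectory = StubModule("shape"), StubModule("trajectory")
result = recognize(
    cameras=[lambda: "rgb_frame", lambda: "ir_frame"],
    shape_module=shape,
    trajectory_module=trajectory,
    display=lambda s, t: shown.append((s, t)),
    learning_data=["sample_0"],
)
```

In the described device the fifth step would mainly run while no imaging is taking place, rather than inline as in this sketch.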

The remaining details of the hand signal recognition method of the present invention are the same as those described above for the hand signal recognition device of the present invention.

When the hand signal recognition device of the present invention configured as described above is used, hand signals are recognized from images captured by an imaging device such as a vision sensor, without attaching a sensor-type input device to the user's hand, so the device can be convenient and economical.

In addition, the device can be used as an input device even at a distance from the imaging device and can serve as a control technology for large displays, which has the advantage of widening its range of use.

Furthermore, there is no limit on the number of types of input signals, from simple gestures to complex characters or signature information, so a wide variety of information can be input quickly.

The foregoing description of the present invention is for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is indicated by the claims set out below, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: Shape detection module
110: Shape tracking unit
120: Shape judgment unit
130: Shape detection unit
200: Trajectory detection module
201: Virtual canvas
210: Trajectory tracking unit
220: Trajectory detection unit
300: Learning module
310: Tracking unit
311: Feature extraction unit
312: Learning unit
313: Correction unit
314: Feature enhancement unit
320: Data accumulation unit
410: Imaging unit
420: Hand signal display unit

Claims

A vision artificial intelligence-based hand signal recognition device comprising:
an imaging unit that captures the motion of a user's hand and generates a captured image;
a shape detection module that receives the captured image from the imaging unit and generates shape detection data, which is data generated by recognizing the shape and change of the user's hand;
a trajectory detection module that receives the captured image from the imaging unit and generates trajectory detection data, which is data generated by recognizing the trajectory of the user's hand; and
a learning module that receives the shape detection data from the shape detection module, receives the trajectory detection data from the trajectory detection module, and performs learning on the shape detection data or the trajectory detection data,
wherein the learning module includes a tracking unit that performs real-time learning on an image of the shape detection data or the trajectory detection data to infer tracking coordinates of a response map, and
the tracking unit includes: a feature extraction unit that extracts features from an image of the shape detection data or the trajectory detection data; a learning unit that learns the data transmitted from the feature extraction unit using an artificial neural network and generates the response map for the image; a correction unit that performs correction on the response map transmitted to the learning unit; and a feature enhancement unit that performs feature enhancement processing on an image object using the response map transmitted from the correction unit.

In claim 1,
A vision artificial intelligence-based hand signal recognition device comprising a plurality of imaging units, wherein the captured image includes a visible light image that is an image by visible light imaging, an infrared image that is an image by infrared imaging, or a three-dimensional image.

In claim 1,
A vision artificial intelligence-based hand signal recognition device, wherein the shape detection module comprises:
a shape tracking unit that receives the captured image from the imaging unit and generates data by recognizing the shape and change of the user's hand; and
a shape determination unit that receives data from the shape tracking unit, determines whether the user's hand movement corresponds to a hand signal, classifies it, and generates data.

In claim 3,
The shape detection module is a vision artificial intelligence-based hand signal recognition device, characterized in that it further includes a shape detection unit that receives data from the shape determination unit and performs learning on the data to generate the shape detection data.

In claim 3,
The shape tracking unit is a vision artificial intelligence-based hand signal recognition device characterized in that it is provided with a convex-hull algorithm to perform convex tracking.

In claim 1,
The learning module is a vision artificial intelligence-based hand signal recognition device, characterized in that it further includes a data accumulation unit that accumulates data received from the tracking unit.

(Deleted)

In claim 1,
A vision artificial intelligence-based hand signal recognition device, wherein the trajectory detection module comprises:
a trajectory tracking unit that receives the captured image from the imaging unit and generates data by recognizing the movement trajectory of the user's hand formed in a virtual canvas, which is a predetermined three-dimensional region; and
a trajectory detection unit that receives data from the trajectory tracking unit, performs learning on the data, and generates the trajectory detection data.

In claim 1,
A vision artificial intelligence-based hand signal recognition device further comprising a hand signal display unit that displays an image using the generated shape detection data or the trajectory detection data on a screen.

A vision artificial intelligence-based hand signal recognition device according to any one of claims 1 to 6, 9, and 10; and
An indoor game system comprising a game device that performs control by processing information about numbers, letters, or symbols transmitted from the vision artificial intelligence-based hand signal recognition device.

In the hand signal recognition method using the vision artificial intelligence-based hand signal recognition device of claim 1,
A first step of collecting the captured images by capturing the user's hand movements in a plurality of imaging units;
A second step in which the captured image is transmitted to the shape detection module and the trajectory detection module;
A third step in which the shape detection data is formed in the shape detection module and the trajectory detection data is formed in the trajectory detection module; and
A hand signal recognition method comprising a fourth step in which an image based on the shape detection data or an image based on the trajectory detection data is displayed on a hand signal display unit.

In claim 12,
A hand signal recognition method further comprising a fifth step of performing learning in the shape detection module and the trajectory detection module using learning data generated by the learning module.