KR101986002B1

KR101986002B1 - Artificial agents and method for human intention understanding based on perception-action connected learning, recording medium for performing the method

Info

Publication number: KR101986002B1
Application number: KR1020170022051A
Authority: KR
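Inventors: Minho Lee (이민호); Sangwook Kim (김상욱)
Original assignee: Kyungpook National University Industry-Academic Cooperation Foundation (경북대학교 산학협력단)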
Priority date: 2017-01-17
Filing date: 2017-02-20
Publication date: 2019-06-04
Also published as: KR20180084576A

Abstract

An intention understanding apparatus based on perception-action connected learning includes: an input unit that detects, for every frame, joint information of the observed user behavior and object information of the user's surrounding environment; a preprocessing unit that preprocesses the joint information and the object information received from the input unit so that they can be processed by an artificial neural network; a behavior recognition processing unit that classifies the user's behavior based on the joint information and the object information output from the preprocessing unit; an object relation information processing unit that outputs a group of object candidates related to the user's behavior, using the object information output from the preprocessing unit and the behavior information output from the behavior recognition processing unit; and an intention output unit that outputs a user intention recognition result through an artificial neural network that takes as input the behavior information output from the behavior recognition processing unit and the object candidate group output from the object relation information processing unit. Accordingly, the user's intention can be accurately predicted from the user's behavior and the information on objects related to that behavior.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an intention understanding apparatus and method based on perception-action connected learning, and to a recording medium for performing the method, and more particularly to an apparatus that predicts a user's intention from the user's behavior and from information on objects related to that behavior.

2. Description of the Related Art

With recent advances in information technology, many services have been planned, such as natural interfaces based on user behavior, acquisition of daily-life information in smart homes, and management of users' health, condition, and exercise in healthcare. However, these services often fail to reach commercial products because they cannot yet guarantee reliable intention recognition based on the interaction between user behavior and surrounding objects in a real-time environment.

Understanding intention is important for developing artificial intelligence agents, such as robots, that can provide various services to humans. Although considerable progress has been made in developing artificial agents over the past years, they are still far from implementing some of the basic elements of cognition, such as emotion or intention recognition. These abilities belong naturally to humans and are part of the mental capacity that makes human-human interaction unique.

In particular, the ability to understand other people's intentions, i.e., "empathy," has been argued to be fundamental to human-human communication. Theory of Mind (Premack & Woodruff, 1978) holds that humans have an innate ability to understand the intentions of others and to empathize with them. This ability, which enables humans to respond in consistent and useful ways, also extends to other human domains such as language understanding, learning, and emotion recognition.

Therefore, to develop artificial agents that behave and respond similarly to intelligent humans, it is important to implement such mental abilities, in particular the ability to understand the intentions of others.

However, before intention can be recognized, several questions must be resolved: how humans acquire this ability; what the physiological and biological basis of empathy, or of the ability to understand others' intentions, is; what can be learned from this cognitive process; and whether intention recognition can be simulated in an artificial agent.

Several computational models have been proposed to implement the ability to recognize intention in artificial agents. Some are based on object affordances, while others are based on action prediction. However, although these models handle explicit, gesture-like actions, they are limited in that they do not take into account the objects related to those actions.

KR 10-1605078 B1
KR 10-1592977 B1

Yu, Z., & Lee, M., Human motion based intent recognition using a deep dynamic neural model, Robotics and Autonomous Systems, 2015
Kim, S., Kavuri, S., & Lee, M., Intention recognition and object recommendation system using deep auto-encoder based affordance model, In The 1st International Conference on Human-Agent Interaction, 2013
Yu, Z., & Lee, M., Real-time human action classification using a dynamic neural model, Neural Networks, 69, 29-43, 2015

SUMMARY OF THE INVENTION

Accordingly, the present invention has been made in view of the above problems, and an object of the present invention is to provide an intention understanding apparatus based on perception-action connected learning.

Another object of the present invention is to provide an intention understanding method based on perception-action connected learning.

Yet another object of the present invention is to provide a recording medium on which a computer program for performing the intention understanding method based on perception-action connected learning is recorded.

To achieve the above object of the present invention, an intention understanding apparatus based on perception-action connected learning according to an embodiment includes: an input unit that detects, for every frame, joint information of the observed user behavior and object information of the user's surrounding environment; a preprocessing unit that preprocesses the joint information and the object information received from the input unit so that they can be processed by an artificial neural network; a behavior recognition processing unit that classifies the user's behavior based on the joint information and the object information output from the preprocessing unit; an object relation information processing unit that outputs a group of object candidates related to the user's behavior, using the object information output from the preprocessing unit and the behavior information output from the behavior recognition processing unit; and an intention output unit that outputs a user intention recognition result through an artificial neural network that takes as input the behavior information output from the behavior recognition processing unit and the object candidate group output from the object relation information processing unit.
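The data flow among the five units can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: every function and class name is hypothetical, the upstream units are passed in as plain callables, and the intention output unit is assumed to be a single softmax layer over the concatenated behavior and object-candidate vectors (the text only says it is an artificial neural network).

```python
from typing import Callable, Tuple

import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


class IntentionOutputUnit:
    """Hypothetical intention output unit: one softmax layer over
    [behavior information ; object candidate scores]."""

    def __init__(self, n_behaviors: int, n_objects: int, n_intentions: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_behaviors + n_objects, n_intentions))
        self.b = np.zeros(n_intentions)

    def __call__(self, behavior: np.ndarray, candidates: np.ndarray) -> np.ndarray:
        x = np.concatenate([behavior, candidates])
        return softmax(x @ self.W + self.b)


def run_pipeline(
    frame,
    preprocess: Callable,          # frame -> (joint features, object features)
    recognize_behavior: Callable,  # (joints, objects) -> behavior vector
    relate_objects: Callable,      # (objects, behavior) -> object candidate scores
    output_unit: IntentionOutputUnit,
) -> Tuple[int, np.ndarray]:
    """Wire the units in the order described for a single frame."""
    joints, objects = preprocess(frame)
    behavior = recognize_behavior(joints, objects)
    candidates = relate_objects(objects, behavior)
    probs = output_unit(behavior, candidates)
    return int(np.argmax(probs)), probs
```

In practice each callable would be replaced by the corresponding trained unit; keeping them as parameters makes the ordering of the units, rather than any particular model, the point of the sketch.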

In an embodiment of the present invention, the input unit may collect, in real time, two-dimensional or three-dimensional position information of the user's major joints based on information from at least one of a visible-light image sensor and an active infrared pattern projection sensor.

In an embodiment of the present invention, when there are multiple users in the sensing area, the input unit may perform at least one of face recognition and tracking to avoid interference between the behavior modeling information of the individual users.
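One way to keep several users' data separate is to track each skeleton from frame to frame. The sketch below covers only the tracking option, using greedy nearest-neighbour matching of skeleton centroids between frames; the matching rule, the `max_dist` threshold, and the function name are assumptions rather than details given in the text, and face recognition could replace or supplement it.

```python
import numpy as np


def assign_skeletons(tracks, skeletons, max_dist=0.5):
    """Greedy nearest-neighbour association of skeletons to user IDs.

    tracks    : dict user_id -> (3,) root position from the previous frame
    skeletons : list of (J, 3) joint arrays detected in the current frame
    max_dist  : largest root displacement (sensor units) still treated as
                the same user between frames (hypothetical threshold)
    Returns a dict user_id -> (J, 3) skeleton; unmatched skeletons get new IDs.
    """
    roots = [s.mean(axis=0) for s in skeletons]  # joint centroid as the root
    # All (distance, user id, skeleton index) pairs, closest first.
    pairs = sorted(
        (float(np.linalg.norm(roots[i] - pos)), uid, i)
        for uid, pos in tracks.items()
        for i in range(len(skeletons))
    )
    assigned, used_ids, used_skel = {}, set(), set()
    for dist, uid, i in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if uid in used_ids or i in used_skel:
            continue
        assigned[uid] = skeletons[i]
        used_ids.add(uid)
        used_skel.add(i)
    next_id = max(tracks, default=-1) + 1
    for i, skel in enumerate(skeletons):
        if i not in used_skel:  # a user who just entered the sensing area
            assigned[next_id] = skel
            next_id += 1
    return assigned
```

Because each user's joints stay bound to a stable ID, the downstream behavior model can keep one recurrent state per user instead of mixing frames from different people.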

In an embodiment of the present invention, the preprocessing unit may include: a joint information preprocessing unit that normalizes and encodes the joint information received from the input unit and passes it to the behavior recognition processing unit; and an object information preprocessing unit that extracts the object information from the image information received from the input unit and passes the label of the object held in the user's hand to the object relation information processing unit.

In an embodiment of the present invention, the joint information preprocessing unit may include: a normalization unit that normalizes the major joint information, acquired in absolute coordinates within the sensing area of the input unit, into a per-user relative coordinate representation; and an encoding unit that expresses the normalized relative coordinates in a neural-network-friendly form.
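The normalization step can be illustrated with the twelve skeletal points enumerated elsewhere in this description. The text does not specify the reference joint or the scale, so the sketch assumes a common choice: translate so that the spine-mid joint is the origin, then divide by the spine-mid to spine-shoulder distance. The Kinect-style joint names and the function name are assumptions.

```python
import numpy as np

# The twelve skeletal points listed in the text (naming is an assumption).
JOINTS = ["SpineMid", "Neck", "Head", "ShoulderLeft", "ElbowLeft", "WristLeft",
          "ShoulderRight", "ElbowRight", "WristRight", "HipLeft", "HipRight",
          "SpineShoulder"]
IDX = {name: i for i, name in enumerate(JOINTS)}


def normalize_skeleton(joints_abs: np.ndarray) -> np.ndarray:
    """Convert (12, D) absolute sensor coordinates into a user-relative vector.

    Assumed scheme: SpineMid becomes the origin, and all offsets are divided
    by the torso length (SpineMid to SpineShoulder), so users of different
    sizes and at different distances from the sensor give comparable values.
    """
    origin = joints_abs[IDX["SpineMid"]]
    rel = joints_abs - origin
    scale = np.linalg.norm(rel[IDX["SpineShoulder"]])
    if scale < 1e-8:
        raise ValueError("degenerate skeleton: zero torso length")
    return (rel / scale).ravel()  # flat (12 * D,) feature vector
```

The result is invariant to where the user stands and to uniform scaling, which is the property a per-user relative representation is meant to provide.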

In an embodiment of the present invention, the encoding unit may use a self-organizing map (SOM).
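A self-organizing map can turn a continuous pose vector into a sparse code. The following is a minimal Kohonen SOM written for illustration; the grid size, the learning-rate and neighbourhood schedules, and the choice of a one-hot best-matching-unit code as the "neural-network-friendly" output are assumptions, not details from the text.

```python
import numpy as np


class SOM:
    """Minimal rectangular Kohonen self-organizing map used as an encoder."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weights = rng.random((rows * cols, dim))
        # (row, col) position of each unit on the map grid
        self.grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    def bmu(self, x: np.ndarray) -> int:
        """Index of the best-matching unit, i.e. the closest weight vector."""
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def quantization_error(self, data: np.ndarray) -> float:
        """Mean distance between each sample and its best-matching unit."""
        return float(np.mean([np.linalg.norm(x - self.weights[self.bmu(x)]) for x in data]))

    def train(self, data: np.ndarray, epochs: int = 20, lr0: float = 0.5, seed: int = 1) -> None:
        rng = np.random.default_rng(seed)
        sigma0 = self.grid.max() / 2 + 1.0  # initial neighbourhood radius
        total, step = epochs * len(data), 0
        for _ in range(epochs):
            for x in data[rng.permutation(len(data))]:
                frac = step / total
                lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
                sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood
                d2 = np.sum((self.grid - self.grid[self.bmu(x)]) ** 2, axis=1)
                h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                self.weights += lr * h[:, None] * (x - self.weights)
                step += 1

    def encode(self, x: np.ndarray) -> np.ndarray:
        """One-hot code of the best-matching unit."""
        code = np.zeros(len(self.weights))
        code[self.bmu(x)] = 1.0
        return code
```

Because neighbouring map units learn similar weight vectors, similar poses tend to activate nearby units, which gives the downstream recurrent network a topology-preserving input.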

In an embodiment of the present invention, the behavior recognition processing unit may model the joint information and the object information output from the preprocessing unit with a recurrent neural network, and may designate recognition-only nodes so that behavior recognition can be performed.

In an embodiment of the present invention, in the real-time processing phase and the test phase, the behavior recognition processing unit may compare the outputs of the learned recognition-only nodes for a given input with the user behavior vectors and extract the closest behavior.
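The "closest behavior" step amounts to a nearest-prototype lookup. The sketch below assumes that each behavior is represented by the mean recognition-node output of its training sequences and that closeness is Euclidean distance; neither choice is stated in the text, and the function names are hypothetical.

```python
import numpy as np


def build_prototypes(outputs: np.ndarray, labels):
    """Average the recognition-node outputs of each behavior class."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    protos = np.stack([outputs[labels == c].mean(axis=0) for c in classes])
    return protos, classes


def closest_behavior(output: np.ndarray, protos: np.ndarray, classes):
    """Return the class whose prototype is nearest in Euclidean distance."""
    dists = np.linalg.norm(protos - output, axis=1)
    return classes[int(np.argmin(dists))]
```

For example, with prototypes built from training outputs, a run-time output of `[0.7, 0.3]` is assigned to whichever behavior vector lies closest to it.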

In an embodiment of the present invention, the object relation information processing unit may model object relations with an autoencoder network using the object information output from the preprocessing unit and the behavior information output from the behavior recognition processing unit, and may output an object autoencoding result.
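The object relation model can be illustrated with a small autoencoder that reconstructs an [objects ; behavior] vector. In the sketch below, object units are randomly masked during training (a denoising variant), so that at run time a partial input, such as the object in the user's hand plus the recognized behavior, yields reconstructed scores for the other objects; these scores play the role of the object candidate group. The one-hidden-layer architecture, the masking scheme, the toy vocabulary, and all names are assumptions for illustration.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


class ObjectRelationAE:
    """One-hidden-layer denoising autoencoder over [object units ; behavior units]."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def forward(self, x: np.ndarray):
        h = sigmoid(x @ self.W1 + self.b1)
        return h, sigmoid(h @ self.W2 + self.b2)

    def train(self, X: np.ndarray, n_obj: int, epochs: int = 3000,
              lr: float = 1.0, drop: float = 0.5, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        for _ in range(epochs):
            # Randomly hide object units; behavior units are always kept.
            mask = np.ones_like(X)
            mask[:, :n_obj] = rng.random((len(X), n_obj)) > drop
            Xc = X * mask
            h, y = self.forward(Xc)
            # Sigmoid output + binary cross-entropy: gradient w.r.t. pre-activation is y - X.
            d2 = (y - X) / len(X)
            d1 = (d2 @ self.W2.T) * h * (1.0 - h)
            self.W2 -= lr * h.T @ d2
            self.b2 -= lr * d2.sum(axis=0)
            self.W1 -= lr * Xc.T @ d1
            self.b1 -= lr * d1.sum(axis=0)

    def candidate_scores(self, x: np.ndarray, n_obj: int) -> np.ndarray:
        """Reconstructed object units, read as object-candidate scores."""
        return self.forward(x)[1][:n_obj]
```

With a toy vocabulary of objects `[cup, kettle, book, pen]` and behaviors `[pour, write]`, training on the two co-occurrence patterns and then feeding "cup in hand + pour" should rank the kettle above the book and pen; ranking or thresholding the scores gives the candidate group.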

In an embodiment of the present invention, the joint information of the user behavior may use at least one of the following skeletal points: spine mid, neck, head, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, right hip, and spine shoulder.

To achieve another object of the present invention, an intention understanding method based on perception-action connected learning according to an embodiment includes: detecting, for every frame, joint information of the observed user behavior and object information of the user's surrounding environment; a preprocessing step of preprocessing the joint information and the object information so that they can be processed by an artificial neural network; a behavior recognition processing step of classifying the user's behavior based on the joint information and the object information output in the preprocessing step; an object relation information processing step of outputting a group of object candidates related to the user's behavior, using the object information output in the preprocessing step and the behavior information output in the behavior recognition processing step; and an intention output step of outputting a user intention recognition result through an artificial neural network that takes as input the behavior information output in the behavior recognition processing step and the object candidate group output in the object relation information processing step.

In an embodiment of the present invention, the detecting step may collect, in real time, two-dimensional or three-dimensional position information of the user's major joints based on information from at least one of a visible-light image sensor and an active infrared pattern projection sensor.

In an embodiment of the present invention, when there are multiple users in the sensing area, the detecting step may perform at least one of face recognition and tracking to avoid interference between the behavior modeling information of the individual users.

In an embodiment of the present invention, the preprocessing step may include: normalizing the major joint information, acquired in absolute coordinates within the sensing area, into a per-user relative coordinate representation; and an encoding step of expressing the normalized relative coordinates in a neural-network-friendly form.

In an embodiment of the present invention, the encoding step may use a self-organizing map (SOM).

In an embodiment of the present invention, the preprocessing step may extract the object information from the received image information and transmit the label of the object held in the user's hand.

In an embodiment of the present invention, the behavior recognition processing step may model the joint information and the object information output in the preprocessing step with a recurrent neural network, and may designate recognition-only nodes so that behavior recognition can be performed.

In an embodiment of the present invention, in the real-time processing phase and the test phase, the behavior recognition processing step may compare the outputs of the learned recognition-only nodes for a given input with the user behavior vectors and extract the closest behavior.

In an embodiment of the present invention, the object relation information processing step may model object relations with an autoencoder network using the object information output in the preprocessing step and the behavior information output in the behavior recognition processing step, and may output an object autoencoding result.

To achieve yet another object of the present invention, a computer-readable storage medium according to an embodiment has recorded thereon a computer program for performing the intention understanding method based on perception-action connected learning.

According to the intention understanding method based on perception-action connected learning described above, the limitations of existing behavior-based intention recognition devices are overcome by combining behavior recognition with an object recognition model and proposing a suitable learning method, which provides a reliable user intention recognition apparatus with higher flexibility and accuracy.

In addition, the proposed technique combines an inter-object relation model, which has static properties, with a user behavior recognition model, which has dynamic properties, and exploits their mutual information, thereby significantly improving performance. Furthermore, the proposed structure and learning method improve both user behavior recognition performance and inter-object relation model performance.

FIG. 1 is a block diagram of an intention understanding apparatus based on perception-action connected learning according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the OA-SMTRNN model for intention recognition proposed in the present invention.
FIG. 3 is a simplified flowchart of the execution of the proposed model.
FIG. 4 is a diagram illustrating a biological model for behavior modeling and intention inference.
FIG. 5 illustrates a behavior module for intention inference and object prediction based on behavior understanding.
FIG. 6 is a simplified diagram of an autoencoder.
FIG. 7 is a diagram illustrating the learning process of a deep autoencoder.
FIG. 8 is a diagram illustrating the structure of the prediction module of the OA-SMTRNN of FIG. 2.
FIG. 9 is a flowchart of an intention understanding method based on perception-action connected learning according to an embodiment of the present invention.

The following detailed description of the invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention, although different, are not necessarily mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present invention, if appropriately described, is limited only by the appended claims, together with the full scope of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a block diagram of an intention understanding apparatus based on perception-action connected learning according to an embodiment of the present invention.

To overcome the limitations of existing behavior-based intention recognition devices, the intention understanding apparatus 10 based on perception-action connected learning according to the present invention (hereinafter, the apparatus) combines behavior recognition with an object recognition model and proposes a suitable learning method, thereby providing a reliable user intention recognition apparatus with high flexibility and accuracy.

First, implementing the ability to recognize intention involves two cognitive processes: perception of the plausibility of objects in the environment, and prediction of human behavior. These two processes provide implicit and explicit information about intention and interact at various levels. The present invention assumes that implementing the connection between perception and action can be the key to implementing the ability to recognize human intention in an artificial agent.

Meanwhile, various dynamic models have been developed both for extracting intention signals from dynamic human behavior and for handling the behavior classification problem. The HMM is a well-known model for analyzing and classifying human behavior (Gehrig, Kuehne, Woerner, & Schultz, 2009). Another family is recurrent neural network (RNN)-based models (Hüsken & Stagge, 2003), such as the Multiple Timescale Recurrent Neural Network (MTRNN) (Yamashita & Tani, 2008). The MTRNN has been extended to a supervised MTRNN (SMTRNN) for behavior classification (Yu & Lee, 2015).
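The defining feature of an MTRNN is that each unit is a leaky integrator with its own time constant τ, so fast units follow frame-level motion while slow units hold longer-range context. A minimal forward pass is sketched below, assuming an update of the form u_t = (1 - 1/τ) u_{t-1} + (1/τ)(W_rec tanh(u_{t-1}) + W_in x_t) and feeding external input to the fast units only. The tanh activation, the τ values, the weight scales, and the class name are illustrative assumptions; the supervised (SMTRNN) recognition outputs and training are omitted.

```python
import numpy as np


class MTRNNLayer:
    """Leaky-integrator recurrent layer with fast and slow context units."""

    def __init__(self, n_in: int, n_fast: int, n_slow: int,
                 tau_fast: float = 2.0, tau_slow: float = 50.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        n = n_fast + n_slow
        # One time constant per unit: small tau = fast dynamics, large tau = slow.
        self.tau = np.concatenate([np.full(n_fast, tau_fast), np.full(n_slow, tau_slow)])
        self.W_rec = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
        self.W_in = rng.normal(scale=0.5, size=(n, n_in))
        self.W_in[n_fast:] = 0.0  # external input reaches fast units only
        self.n_fast = n_fast

    def step(self, u: np.ndarray, x: np.ndarray) -> np.ndarray:
        """One update: u <- (1 - 1/tau) * u + (1/tau) * (W_rec tanh(u) + W_in x)."""
        net = self.W_rec @ np.tanh(u) + self.W_in @ x
        return (1.0 - 1.0 / self.tau) * u + net / self.tau

    def run(self, xs) -> np.ndarray:
        """Run over an input sequence from a zero state; returns (T, n) internal states."""
        u = np.zeros(len(self.tau))
        states = []
        for x in xs:
            u = self.step(u, x)
            states.append(u.copy())
        return np.array(states)
```

Setting τ = 1 reduces a unit to an ordinary RNN unit, while a large τ makes it change only slightly per frame, which is what lets the slow units summarize a whole action sequence.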

The system proposed in the present invention integrates these two independent processes, based on the perception-action connection, into an Object Augmented SMTRNN (OA-SMTRNN) for intention classification. Determining intention based solely on the plausibility of a recognized object or of a predicted action sequence is error-prone, because either cue alone can indicate several possible intentions.

Therefore, to determine a specific intention accurately, the interaction between perception and action must be captured. The present invention assumes that perception and action together form a loop that improves the accuracy of determining the user's intention, and predicts the user's intention from the user's behavior and the information on objects related to it.
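Why two ambiguous cues can together be decisive is easy to see with made-up numbers. The sketch below is only an illustration of that argument, not the OA-SMTRNN mechanism (which learns the combination with a neural network): it fuses a behavior cue and an object cue by a normalized product, which assumes conditionally independent cues and a uniform prior. The intentions and probabilities are hypothetical.

```python
def fuse(p_behavior: dict, p_object: dict) -> dict:
    """Normalized product of two intention distributions (naive-Bayes style,
    assuming conditionally independent cues and a uniform prior)."""
    joint = {k: p_behavior[k] * p_object[k] for k in p_behavior}
    z = sum(joint.values())
    if z == 0:
        raise ValueError("the two cues share no supported intention")
    return {k: v / z for k, v in joint.items()}


# Hypothetical numbers: each cue alone is split between two intentions.
behavior_cue = {"drink": 0.05, "make_tea": 0.475, "water_plant": 0.475}  # pouring motion
object_cue = {"drink": 0.475, "make_tea": 0.475, "water_plant": 0.05}    # cup in hand
posterior = fuse(behavior_cue, object_cue)
```

Here the pouring motion alone cannot separate making tea from watering a plant, and the cup alone cannot separate making tea from drinking, but together they put about 0.83 of the probability on making tea.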

User behavior refers to the movement of the user's body over time, so modeling it requires handling dynamic signals. In addition, the objects used during a user's behavior may be used alone or together with other candidate objects, so user intention recognition also requires relation modeling that can derive those candidates.

The present invention implements perception-action connected learning for understanding human intention in a newly proposed computational model, the Object Augmented Supervised Multiple Timescale Recurrent Neural Network (hereinafter, OA-SMTRNN). The proposed model is an extension of previous work (Kim, Kavuri, & Lee, 2013; Yu & Lee, 2015).

The apparatus 10 according to the present invention recognizes the user's current behavior based on the user's major joint information and surrounding environment information, and recognizes the user's intention by detecting related objects and candidate groups.

Referring to FIG. 1, the apparatus 10 according to the present invention includes an input unit 100, a preprocessing unit 300, and a processing unit 500. More specifically, the preprocessing unit 300 includes a joint information preprocessing unit 310 and an object information preprocessing unit 330, and the processing unit 500 includes a behavior recognition processing unit 510, an object relation information processing unit 530, and an intention output unit 550.

Software (an application) for intention understanding based on perception-action connected learning may be installed and executed on the apparatus 10 of the present invention, and the input unit 100, the preprocessing unit 300, and the processing unit 500 may be controlled by that software running on the apparatus 10.

The apparatus 10 may be a separate terminal or a module of a terminal. The input unit 100, the preprocessing unit 300, and the processing unit 500 may be formed as an integrated module or as one or more modules. Conversely, each component may also be a separate module.

The apparatus 10 may be mobile or stationary. The apparatus 10 may take the form of a server or an engine, and may be referred to by other terms such as device, apparatus, terminal, user equipment (UE), mobile station (MS), wireless device, or handheld device.

The apparatus 10 may execute or develop various software based on an operating system (OS). The operating system is a system program that allows software to use the hardware of the device, and may include mobile operating systems such as Android OS, iOS, Windows Mobile OS, Bada OS, Symbian OS, and BlackBerry OS, as well as computer operating systems such as the Windows family, Linux family, Unix family, Mac OS, AIX, and HP-UX.

The input unit 100 detects, for every frame, the joint information of the observed user behavior and the object information of the user's surroundings, and passes them to the preprocessing unit 300. The input unit 100 may collect, in real time, two-dimensional or three-dimensional position information of the user's major joints based on information from at least one of a visible-light image sensor and an active infrared pattern projection sensor.

When there are multiple users in the sensing area, the input unit 100 may perform at least one of face recognition and tracking to avoid interference between the behavior modeling information of the individual users.

The joint information of the user behavior may use skeletal points such as spine-mid, neck, head, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, right hip, and spine-shoulder (Spine Shoulder).

In addition, to identify the objects around the user, the input unit 100 detects and collects information on those objects in real time based on visible-light image sensor information.

In another embodiment, the input unit 100 may consist of a separate input unit for acquiring the positions of the user's major joints and a separate input unit for acquiring information on the objects around the user.

The preprocessing unit 300 includes a joint information preprocessing unit 310 and an object information preprocessing unit 330. The joint information preprocessing unit 310 normalizes and encodes the joint information received from the input unit 100 and transfers it to the behavior recognition processing unit 510. The object information preprocessing unit 330 extracts the object information from the image information received from the input unit 100 and transfers the label of the object the user has picked up by hand to the object relationship information processing unit 530.

The joint information preprocessing unit 310 may include a normalization unit and an encoding unit. The normalization unit normalizes the major joint information, obtained in absolute coordinates within the sensing area of the input unit 100, into a per-user relative-coordinate representation. The encoding unit expresses the normalized relative coordinates in a form suited to neural network processing and may use, for example, a self-organizing map (SOM).
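The normalization step can be sketched as follows. This is a minimal illustration under assumptions that the patent does not state: the joint names, the choice of the spine-mid joint as the origin, and the use of shoulder width as the scale factor are all hypothetical.

```python
import math

# Hypothetical sketch: the patent does not specify the reference joint or the scale.
def normalize_joints(joints, origin="spine_mid",
                     left="shoulder_left", right="shoulder_right"):
    """Map absolute sensor coordinates to a per-user relative frame.

    Every joint is translated so that the `origin` joint sits at (0, 0, 0)
    and divided by the shoulder width, which removes the user's position in
    the sensing area and (approximately) differences in body size.
    """
    ox, oy, oz = joints[origin]
    scale = math.dist(joints[left], joints[right])
    if scale == 0:
        raise ValueError("shoulder joints coincide; cannot normalize")
    return {name: ((x - ox) / scale, (y - oy) / scale, (z - oz) / scale)
            for name, (x, y, z) in joints.items()}


frame = {
    "spine_mid": (1.0, 2.0, 3.0),
    "shoulder_left": (0.5, 2.5, 3.0),
    "shoulder_right": (1.5, 2.5, 3.0),
    "head": (1.0, 4.0, 3.0),
}
print(normalize_joints(frame)["head"])  # (0.0, 2.0, 0.0)
```

Because only differences between joints are kept, the same posture performed anywhere in the sensing area yields the same normalized vector.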

The processing unit 500 includes: a behavior recognition processing unit 510 that classifies the user's behavior information based on the joint information and the object information preprocessed and output by the preprocessing unit 300; an object relationship information processing unit 530 that outputs a group of object candidates related to the user's behavior, using the object information preprocessed and output by the preprocessing unit 300 and the behavior information output by the behavior recognition processing unit 510; and an intention output unit 550 that outputs a user intention recognition result through an artificial neural network whose inputs are the behavior information output by the behavior recognition processing unit 510 and the object candidate group output by the object relationship information processing unit 530.

The behavior recognition processing unit 510 continuously recognizes behavior through a recurrent neural network based on the preprocessed information. Specifically, it models the joint information and the object information output by the preprocessing unit 300 with a recurrent neural network and designates dedicated recognition nodes so that behavior can be recognized. In the real-time processing stage and the test stage, the behavior recognition processing unit 510 compares the output of the dedicated recognition nodes for a given input with the user behavior vectors learned during training and extracts the closest behavior.
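The "closest behavior" step can be sketched as a nearest-neighbour lookup. This is a sketch under stated assumptions: the patent does not specify the distance measure, so Euclidean distance is assumed here, and the behavior labels and stored vectors are hypothetical.

```python
import math

# Hypothetical sketch: Euclidean distance and the label names are assumptions.
def closest_behavior(output, prototypes):
    """Return the label of the stored behavior vector nearest to `output`."""
    if not prototypes:
        raise ValueError("no stored behavior vectors")
    return min(prototypes, key=lambda label: math.dist(output, prototypes[label]))


stored = {"drink": [1.0, 0.0, 0.0], "read": [0.0, 1.0, 0.0], "call": [0.0, 0.0, 1.0]}
print(closest_behavior([0.7, 0.2, 0.1], stored))  # drink
```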

The object relationship information processing unit 530 continuously derives related candidate objects through an object relationship model based on the preprocessed information. Specifically, it models object relationships by passing the object information output by the preprocessing unit 300 and the behavior information output by the behavior recognition processing unit 510 through an autoencoder network, and outputs the resulting object autoencoding.

The intention output unit 550 combines the results of the behavior recognition processing unit 510 and the object relationship information processing unit 530 and outputs user intention information. In other words, it is an artificial neural network or recurrent neural network that takes the outputs of the behavior recognition processing unit 510 and the object relationship information processing unit 530 as input, integrates the behavior and object relationship information, and outputs only the user intention recognition result.

In the present invention, the processing unit 500 is implemented as an Object Augmented-SMTRNN (hereinafter, OA-SMTRNN) model, which is described in detail below.

FIG. 2 is a diagram illustrating the OA-SMTRNN model proposed in the present invention for intention recognition.

The right side of FIG. 2 is modeled by a deep autoencoder (Hinton & Salakhutdinov, 2006), which is used to analyze the relationship information of objects. The code layer of the deep autoencoder condenses the information in the visible layer and represents latent human intention information associated with the objects selected by the user. The coded information is used as an additional input to the slow context layer of the SMTRNN through the weights W_pa.

The SMTRNN shown on the left side of FIG. 2 is used to analyze the skeletal trajectories of human motion associated with a particular intention. Furthermore, in the present invention the SMTRNN not only classifies the human intention but also attempts to predict which objects will be selected in connection with the current behavior.

Accordingly, two groups of classification nodes are defined: one for human intention classification and one for object prediction. Using the additional information obtained from the code layer of the deep autoencoder, the SMTRNN can efficiently distinguish intentions that involve similar motions but different objects.

In addition, the objects predicted by the SMTRNN can be reused as an additional visible input to the hidden layer of the deep autoencoder through the W_ap connection. Let C be the number of neurons in the code layer of the deep autoencoder and H the number of hidden neurons in its first hidden layer. The object prediction task of the SMTRNN covers O object types, and its intention classification task covers I intentions.

Then the size of W_ap is O × H and the size of W_pa is (S - I - O) × C, where S is the number of slow context nodes. W_pa is designed to prevent a short circuit, so there is no direct connection between the code layer and the classification outputs (I and O).
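The size bookkeeping above can be made concrete with a short sketch. The layer sizes in the example are hypothetical. The ordering of the slow context nodes (ordinary nodes first, then the I intention nodes, then the O object nodes) is also an assumption made only for illustration. The second function shows how W_pa leaves the classification nodes untouched.

```python
def cross_module_shapes(S, I, O, C, H):
    """Shapes (rows, cols) of W_ap and W_pa as defined in the text."""
    n_plain = S - I - O  # slow context nodes that are not classification nodes
    if n_plain <= 0:
        raise ValueError("need more slow context nodes than classification nodes")
    return (O, H), (n_plain, C)


def code_to_slow_context(code, w_pa, S, I, O):
    """Drive from the code layer to all S slow context nodes.

    Assumption: ordinary slow context nodes come first, followed by the I
    intention nodes and the O object nodes. The classification nodes
    receive 0, i.e. there is no direct code -> label path.
    """
    n_plain = S - I - O
    if len(w_pa) != n_plain or any(len(row) != len(code) for row in w_pa):
        raise ValueError("w_pa must have shape (S - I - O) x C")
    plain = [sum(w * c for w, c in zip(row, code)) for row in w_pa]
    return plain + [0.0] * (I + O)


print(cross_module_shapes(S=30, I=5, O=8, C=10, H=50))  # ((8, 50), (17, 10))
print(code_to_slow_context([1.0, 2.0], [[1.0, 1.0]], S=4, I=2, O=1))  # [3.0, 0.0, 0.0, 0.0]
```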

In FIG. 2, the arrows indicate the flow of information, and the two arrows between the two modules show the paths for perception-action connected learning.

FIG. 3 is a flowchart of a simplified execution of the model proposed in the present invention.

FIG. 3 describes the execution flow of the proposed model at time index t. The left and right portions of the OA-SMTRNN in FIG. 2 are called the action module and the perception module, respectively. When a person with an intention acts at time t, the action module (the left module of FIG. 2) attempts to understand the meaning of the human action and predicts the object labels associated with the human intention.

Next, the perception module, a deep autoencoder that receives the predicted object labels, generates latent information in its code layer at time t. The code layer is connected to the slow context units, where it helps not only to identify the human intention but also to predict the next object selection at time t+1. A more detailed procedure of the computational model for perception-action connected learning is described below.
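The per-time-step flow of FIG. 3 can be sketched as a loop in which the code produced by the perception module at time t only reaches the action module at time t+1. The two module functions are hypothetical stand-ins supplied by the caller. The zero code vector before the first step is also an assumption.

```python
def run_perception_action_loop(frames, action_step, perception_step, code_size):
    """Sketch of the FIG. 3 execution flow.

    action_step(skeleton, code) -> (intention, object_scores)
    perception_step(object_scores) -> code
    """
    code = [0.0] * code_size  # assumption: no perception feedback before t = 0
    history = []
    for t, skeleton in enumerate(frames):
        intention, objects = action_step(skeleton, code)  # action module at time t
        code = perception_step(objects)                   # perception module at time t, used at t+1
        history.append((t, intention, objects))
    return history


# Toy stand-ins, for illustration only.
def toy_action(skeleton, code):
    return ("drink" if sum(code) > 0 else "unknown"), [1.0, 0.0]


def toy_perception(objects):
    return [sum(objects)]


print(run_perception_action_loop([[0.1], [0.2]], toy_action, toy_perception, 1))
# [(0, 'unknown', [1.0, 0.0]), (1, 'drink', [1.0, 0.0])]
```

The toy output shows the one-step delay: the object evidence gathered at t = 0 changes the intention estimate only at t = 1.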

The action module of the proposed method extends action understanding related to a specific human intention (Yu & Lee, 2015) to object prediction. The action module of the OA-SMTRNN in FIG. 2 uses three layers to model the IPL (inferior parietal lobule) and the PMv (ventral premotor cortex) and is regarded as a bottom-up model. The input signal of the input-output layer is human skeleton data, and the slow context layer has an additional input path connected to the code layer of the deep autoencoder, which carries object affordance information.

Predicting the action has the effect of constraining the corresponding possible objects. In the present invention, a special group of neurons in the slow context layer is defined as classification nodes for intention inference based on rational information about actions and objects. In addition, another group of output nodes is newly defined in the slow context layer for object label prediction. The model predicts the "type of object" with which the human is interacting or needs to interact, according to the intention underlying the current action.

FIG. 4 is a diagram illustrating a biological model for action modeling and intention inference.

The input-output layer is modeled by a self-organizing map (SOM) that receives and outputs action sequences. The SOM algorithm preserves the topological properties of the input space.
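A self-organizing map of the kind described above can be sketched as follows. This is a generic 1-D SOM, not the patent's configuration. The map size, learning rate `lr`, and neighbourhood width `sigma` are hypothetical. The soft normalized encoding in `som_encode` is an assumption modeled on common MTRNN practice rather than something stated in the text.

```python
import math

def som_update(weights, x, lr=0.1, sigma=1.0):
    """One online update of a 1-D self-organizing map; returns the BMU index."""
    bmu = min(range(len(weights)), key=lambda k: math.dist(weights[k], x))
    for k, w in enumerate(weights):
        # The Gaussian neighbourhood on the map grid pulls nearby nodes toward
        # the same input, which is what preserves the input topology.
        h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
        weights[k] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


def som_encode(weights, x, sigma=0.1):
    """Soft, normalized activation of every map node for input x."""
    d2 = [math.dist(w, x) ** 2 for w in weights]
    m = min(d2)  # subtract the minimum for numerical stability
    scores = [math.exp(-(d - m) / sigma) for d in d2]
    total = sum(scores)
    return [s / total for s in scores]


nodes = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(som_update(nodes, [1.1, 0.9]))  # 1
code = som_encode(nodes, [2.0, 2.0])
print(max(range(len(code)), key=code.__getitem__))  # 2
```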

The context layers, the main components of the MTRNN, are modeled using a continuous-time recurrent neural network (CTRNN). A CTRNN is a special type of RNN and a dynamical-system model of biological neural networks. In a CTRNN, the output of each neuron is computed from the current input sample and the past history of the neuron's state. The CTRNN is therefore well suited to predicting continuous sensorimotor sequences.
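The "current input plus past state" behavior of a CTRNN neuron is usually implemented as a discrete leaky integrator. The sketch below uses the update form common in the MTRNN literature (Yamashita & Tani, 2008) with a sigmoid output. Treat it as an illustration, not as the exact formulation used in the patent.

```python
import math

def ctrnn_step(u, y_prev, weights, tau):
    """One leaky-integrator update of a CTRNN layer.

    u[i] <- (1 - 1/tau[i]) * u[i] + (1/tau[i]) * sum_j weights[i][j] * y_prev[j]
    The first term carries the neuron's past state; the second adds new input.
    """
    new_u = [
        (1 - 1 / tau[i]) * u[i]
        + (1 / tau[i]) * sum(w * y for w, y in zip(weights[i], y_prev))
        for i in range(len(u))
    ]
    new_y = [1 / (1 + math.exp(-v)) for v in new_u]  # sigmoid activation
    return new_u, new_y


u, y = ctrnn_step([0.0], [1.0], [[2.0]], [2.0])
print(u)  # [1.0]
```

With tau = 1 the unit forgets its past completely and behaves like an ordinary feed-forward neuron. Larger tau values give it longer memory.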

The backpropagation through time (BPTT) algorithm is used for learning. The error function is defined with the Kullback-Leibler divergence as in Equation 1 below.

[Equation 1]

$$E = \sum_{t}\sum_{i \in O} y^{*}_{i,t}\,\log\frac{y^{*}_{i,t}}{y_{i,t}}$$

Here, O is the set of nodes in the input-output layer, $y^{*}_{i,t}$ is the desired output value of the i-th neuron at time step t, and $y_{i,t}$ is the value of the i-th neuron predicted with the current weights and initial states. In previous MTRNN studies involving robots, two SOM layers, one for visual input and one for motor proprioception, are common (Yamashita & Tani, 2008). Because the present invention does not use an actual robot, the proprioceptive and visual inputs (skeletal coordinates in the experiments) are combined into a single SOM layer.
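Equation 1 can be evaluated directly as shown below. This is a minimal sketch. The `eps` floor on the prediction is an added numerical safeguard and is not part of the patent's definition.

```python
import math

def kl_error(targets, outputs, eps=1e-12):
    """Equation 1: E = sum_t sum_{i in O} y*_{i,t} * log(y*_{i,t} / y_{i,t}).

    targets[t][i] is y*_{i,t} and outputs[t][i] is y_{i,t}. Terms with
    y* = 0 contribute nothing (0 * log 0 is taken as 0); eps guards log(0).
    """
    total = 0.0
    for y_star_t, y_t in zip(targets, outputs):
        for ys, y in zip(y_star_t, y_t):
            if ys > 0:
                total += ys * math.log(ys / max(y, eps))
    return total


print(kl_error([[0.5, 0.5]], [[0.5, 0.5]]))  # 0.0
print(round(kl_error([[1.0, 0.0]], [[0.5, 0.5]]), 4))  # 0.6931
```

The error is zero only when the predicted SOM activation matches the target at every time step.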

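The weight update rule is given by Equation 2.

[Equation 2]

$$w_{ij}^{(n+1)} = w_{ij}^{(n)} - \alpha\,\frac{\partial E}{\partial w_{ij}}$$

Here, n is the iteration step and α is the learning rate, set to 0.0005 in the experiments. The partial derivative $\partial E / \partial w_{ij}$ is given by Equation 3.

[Equation 3]

$$\frac{\partial E}{\partial w_{ij}} = \sum_{t}\frac{1}{\tau_i}\,\frac{\partial E}{\partial u_{i,t}}\,y_{j,t-1}$$

Here, $u_{i,t}$ is the internal state of neuron i at time step t, $\tau_i$ is its time constant, and $y_{j,t-1}$ is the output of neuron j at the previous time step.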
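Equations 2 and 3 together amount to accumulating a gradient over time and taking one descent step. The sketch below assumes that the error signals dE/du have already been obtained by BPTT and are passed in as plain lists. It illustrates only the accumulation and the update.

```python
def weight_gradient(dE_du, y_prev, tau):
    """Equation 3: dE/dw_ij = sum_t (1/tau_i) * dE/du_{i,t} * y_{j,t-1}.

    dE_du[t][i] holds dE/du_{i,t}; y_prev[t][j] holds y_{j,t-1}.
    """
    n_post, n_pre = len(dE_du[0]), len(y_prev[0])
    grad = [[0.0] * n_pre for _ in range(n_post)]
    for d_t, y_t in zip(dE_du, y_prev):
        for i in range(n_post):
            for j in range(n_pre):
                grad[i][j] += d_t[i] * y_t[j] / tau[i]
    return grad


def update_weights(w, grad, alpha=0.0005):
    """Equation 2: w_ij(n+1) = w_ij(n) - alpha * dE/dw_ij (alpha as in the experiments)."""
    return [[wij - alpha * gij for wij, gij in zip(w_row, g_row)]
            for w_row, g_row in zip(w, grad)]


g = weight_gradient([[1.0], [2.0]], [[1.0, 0.5], [1.0, 0.5]], tau=[2.0])
print(g)  # [[1.5, 0.75]]
```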

FIG. 5 shows the action module for intention inference and object prediction based on action understanding.

FIG. 5 shows the detailed structure of the action module of the OA-SMTRNN model. The concepts of the input-output layer and of the fast and slow context layers are inherited from the MTRNN model. When visual information is obtained at time step t, the SOM nodes are used for feature extraction.

The input-output layer and the slow context layer are not connected to each other. The fast context layer acts as the bridge between the input-output layer and the slow context layer, and the nodes within the fast and slow context layers are fully connected. The final output of the SOM nodes is used to compute the prediction error with the BPTT algorithm.

Following the brain model defined in Yu & Lee (2015), on which the present invention builds, a special type of slow context node called a classification node is defined. Classification nodes are part of the slow context layer and have the same time constant τ. The difference between classification nodes and the other slow context nodes is that intention and object labels are used to train the classification nodes among the slow context units.

The classification nodes must back-propagate to the other nodes not only the action prediction error but also the intention inference and object prediction errors. All nodes, including the classification nodes, operate synchronously, and when a test sequence is given, the classification node corresponding to its label is activated. Taking the classification error into account, the partial derivative is defined as in Equation 4 below.

[Equation 4]

$$\frac{\partial E}{\partial u_{i,t}} = \sum_{k \in N}\frac{\partial E}{\partial u_{k,t+1}}\left[\delta_{ik}\left(1-\frac{1}{\tau_i}\right)+\frac{1}{\tau_k}\,w_{ki}\,f'(u_{i,t})\right] + \begin{cases} y_{i,t}-y^{*}_{i,t}, & i \in O \cup C_{o} \cup C_{i} \\ 0, & \text{otherwise}\end{cases}$$

Here, f'(x) is the derivative of the sigmoid function, $y_{i,t}-y^{*}_{i,t}$ is the difference between the neuron output and the ideal value, $u_{i,t}$ is the state of the i-th neuron at time step t, $\tau_i$ is a constant that controls the update rate of the neuron state, N is the set of all neurons, and $\delta_{ik}$ is the Kronecker delta ($\delta_{ik}=1$ if i = k, and 0 otherwise). O denotes the set of input-output nodes, and $C_{o}$ and $C_{i}$ denote the classification nodes used for object prediction and intention classification, respectively, in the slow context layer.

The action prediction signal is generated at the input-output layer, while classification is carried out in the slow context layer. The present invention, however, modifies this equation for object prediction: the final output is computed with a sigmoid function instead of the softmax activation function used for classification (Yu & Lee, 2015). This allows several objects to be associated with a single intention, whereas the softmax function supports only one category.
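The reason for preferring a sigmoid over a softmax for object prediction can be seen in a small example. The logits and the 0.5 decision threshold below are hypothetical. When two objects are both relevant, the softmax splits its probability mass between them, so neither clears the threshold. Independent sigmoids can mark both as active.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sigmoid(z):
    return [1 / (1 + math.exp(-v)) for v in z]


def active(probs, threshold=0.5):
    """Indices whose probability exceeds the (assumed) decision threshold."""
    return [k for k, p in enumerate(probs) if p > threshold]


# Hypothetical logits: objects 0 and 1 are both relevant to the intention.
logits = [3.0, 3.0, -3.0]
print(active(sigmoid(logits)))  # [0, 1]
print(active(softmax(logits)))  # []
```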

As mentioned above, the user's intention can also be inferred from affordances. In the OA-SMTRNN proposed in the present invention, affordances are modeled with a deep autoencoder, one of the deep learning architectures, because deep structured networks are known for their ability to capture high-order relationships and features that cannot be observed directly (Kim, et al., 2013).

An autoencoder model typically consists of two parts, an encoder and a decoder. The encoder converts the original input signal into a relatively short code, which can be viewed as a compression. The decoder is the counterpart of the encoder and reconstructs the original information from the encoded signal. In previous studies using this model, the input of the autoencoder is a vector representing affordance information: the affordance information is encoded and the associated objects are decoded.

FIG. 6 is a simplified diagram of an autoencoder.

In general, an autoencoder may seem pointless, because it compresses information only to restore it again. However, when an autoencoder is built for a specific purpose, it can extract useful information from a signal and remove noise. The present invention requires an autoencoder that analyzes the affordances of objects related to human intentions and extracts meaningful latent information from the observed objects to support intention inference.

Because of its deep structure, a deep autoencoder is difficult to train. The model of the present invention uses the method proposed by Hinton and Salakhutdinov (Hinton & Salakhutdinov, 2006): the encoder network is initialized layer by layer with restricted Boltzmann machines (RBMs), and the decoder network is built simply by reversing the encoder, after which the network is fine-tuned.
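Layer-wise RBM pretraining in the Hinton & Salakhutdinov style is commonly done with one-step contrastive divergence (CD-1). The sketch below shows a single CD-1 update for one binary RBM layer only. Stacking several layers, unrolling them into the decoder, and fine-tuning are not shown. The learning rate and the data are hypothetical.

```python
import math
import random


def _sigmoid(x):
    return 1 / (1 + math.exp(-x))


def cd1_update(W, b_vis, b_hid, v0, lr=0.1, rng=random):
    """One CD-1 step for a binary RBM; W[i][j] links visible i to hidden j.

    Updates W, b_vis and b_hid in place and returns the squared
    reconstruction error of v0, which should shrink as pretraining proceeds.
    """
    n_vis, n_hid = len(b_vis), len(b_hid)
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = [_sigmoid(b_hid[j] + sum(v0[i] * W[i][j] for i in range(n_vis)))
           for j in range(n_hid)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: reconstruct the visible layer, then re-infer the hidden one.
    pv1 = [_sigmoid(b_vis[i] + sum(h0[j] * W[i][j] for j in range(n_hid)))
           for i in range(n_vis)]
    ph1 = [_sigmoid(b_hid[j] + sum(pv1[i] * W[i][j] for i in range(n_vis)))
           for j in range(n_hid)]
    # Approximate gradient: data statistics minus reconstruction statistics.
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b_vis[i] += lr * (v0[i] - pv1[i])
    for j in range(n_hid):
        b_hid[j] += lr * (ph0[j] - ph1[j])
    return sum((v0[i] - pv1[i]) ** 2 for i in range(n_vis))


W = [[0.0], [0.0]]
err = cd1_update(W, [0.0, 0.0], [0.0], [1.0, 0.0], rng=random.Random(0))
print(err)  # 0.5
```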

FIG. 7 is a diagram showing the training process of the deep autoencoder.

In the fine-tuning procedure of the autoencoder network, error backpropagation, a powerful learning method for artificial neural networks, is used.

Once the autoencoder network has been built, the output vector generated in the code layer of this module contains enough latent object-affordance information for reconstruction. When a person intends to do something, he or she usually uses the corresponding objects, so the latent code vector can be used to integrate information about which objects co-occur. The code, which carries object-affordance information, therefore suppresses irrelevant intentions and strengthens relevant ones.

At the top layer, the objects associated with the affordance code are reconstructed. Even when information about an object is missing from the original vector because of noise, it can be recovered by decoding the affordance code. This property lets the deep autoencoder predict the future state of objects even though the model itself is static: objects are selected sequentially, and information not yet observed at the current time can be predicted through affordance and decoding.

FIG. 8 is a diagram showing the structure of the prediction module of the OA-SMTRNN of FIG. 2.

The OA-SMTRNN model can reconstruct related objects from visual information, but noise sometimes prevents the visible nodes from receiving the correct information. In that case, the object labels predicted by the action module of the OA-SMTRNN of FIG. 2 can help make the model robust to noise.

The additional nodes that predict object labels are connected to the hidden layer as a dynamic bias, and the weights between these prediction nodes and the hidden layer are learned. At the same time, the other weights are adjusted to accommodate the additional nodes so that the appropriate information is reconstructed. The detailed perception-action cycle is described below.

The structure of the proposed model is shown in FIG. 2. The action module on the left of FIG. 2 analyzes skeletal sequences for intention inference and is modeled on the SMTRNN. The perception module on the right of FIG. 2 is modeled with a deep autoencoder that analyzes and extracts meaningful affordance information. Object labels are predicted by the action module of the OA-SMTRNN and then fed into the deep autoencoder. As mentioned above, a new group of neurons is defined in the slow context layer to predict the object labels related to the action.

Using the prediction result from the action module of the OA-SMTRNN, the autoencoder can be improved as in Equation 5.

[Equation 5]

$$h_j = f\Big(\sum_i w_{ij}\,v_i + \sum_k w^{ap}_{kj}\,y^{a}_{k} + b_j\Big)$$

Here, $h_j$ is the output of a hidden neuron, f is its activation function, $w_{ij}$ is the connection between the visible layer and the current hidden layer, $w^{ap}_{kj}$ is the connection (W_ap) from the action module to the perception module, $y^{a}_{k}$ is the object prediction output of the action module, and b is the bias.
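Equation 5 can be sketched as a hidden layer that receives the predicted object labels as an extra, learnable bias. The sigmoid used for f and the index layout (W as visible × hidden, W_ap as O × H) are assumptions chosen to match the sizes stated earlier.

```python
import math

def hidden_layer(v, y_a, W, W_ap, b):
    """Equation 5 with a sigmoid activation (assumption).

    W[i][j]    : visible node i -> hidden node j
    W_ap[k][j] : object-prediction node k of the action module -> hidden node j (O x H)
    """
    out = []
    for j in range(len(b)):
        net = (b[j]
               + sum(v[i] * W[i][j] for i in range(len(v)))
               + sum(y_a[k] * W_ap[k][j] for k in range(len(y_a))))
        out.append(1 / (1 + math.exp(-net)))
    return out


print(hidden_layer([1.0], [1.0], [[0.5]], [[-0.5]], [0.0]))  # [0.5]
```

When the action module predicts no object (y_a all zero), the layer reduces to an ordinary autoencoder hidden layer.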

As shown in Equation 6, the weights W_ap can still be learned with the error backpropagation rule.

[Equation 6]

Here, the update of W_ap is determined by the presynaptic value of the neuron in the autoencoder and the postsynaptic value of the object prediction output based on human action understanding.

The predicted object labels can then be regarded as a context-dependent bias for the perception module implemented by the deep autoencoder. The new input vector for the perception module is then given by Equation 7.

[Equation 7]

$$v_{\text{new}} = \{v_{\text{original}},\; y_{a}\}$$

Here, {,} denotes vector concatenation, $v_{\text{original}}$ is the original visible nodes of the deep autoencoder, and $y_{a}$ is the object prediction output from the action module of the OA-SMTRNN.
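Concatenating the predicted labels onto the visible vector, as in Equation 7, is equivalent to keeping a separate W_ap term as in Equation 5, provided the rows of W_ap are stacked under the visible-to-hidden weights. The small sketch below checks this numerically. All values are hypothetical.

```python
import math

def concat(v_original, y_a):
    """Equation 7: the new visible vector is {v_original, y_a}."""
    return list(v_original) + list(y_a)


def preactivation(v, W, b):
    """Hidden pre-activations b_j + sum_i v_i * W[i][j]."""
    return [b[j] + sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(b))]


v, y_a = [1.0, 2.0], [0.5]
W = [[1.0, 0.0], [0.0, 1.0]]  # visible -> hidden
W_ap = [[2.0, -2.0]]          # object prediction -> hidden (O x H)
b = [0.1, 0.2]

stacked = preactivation(concat(v, y_a), W + W_ap, b)  # W_ap rows appended below W
split = [x + y for x, y in zip(preactivation(v, W, b),
                               preactivation(y_a, W_ap, [0.0] * len(b)))]
print(all(math.isclose(s, t) for s, t in zip(stacked, split)))  # True
```

This is why the extra input can be described either as additional visible nodes or as a context-dependent bias: the two views give the same hidden pre-activations.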

While the nodes for intention inference and object prediction are located in the slow context layer, the action prediction signal is generated in the input-output layer, which has a small time constant, because the signal must change as fast as the original action sequence. Action prediction also plays an essential role in learning: as already mentioned, mirror neurons are important for understanding the actions of others and for acquiring new skills through imitation.

The object labels predicted by the action module of the OA-SMTRNN are placed as additional visible nodes in the bottom layer of the deep autoencoder. As a result, the number of weights from the visible layer to the first hidden layer increases.

Therefore, when a new sequence is obtained, the action module of the OA-SMTRNN estimates the possible objects based on the current and past action sequences. This additional information is assumed to help the perception module exclude irrelevant objects and reconstruct object labels when object detection is confused or erroneous.

The improvement from action to perception has been described above; the improvement from perception to action is described below.

The primary goal of the MTRNN is to predict dynamic signals. The MTRNN therefore includes multiple CTRNN layers, each with a different time constant. Information in the CTRNN layer with a small time constant (the fast context) changes rapidly, whereas the slow context layer arranges the order of the primitive action information stored in the fast context layer.
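The effect of the time constant can be seen by driving a single leaky unit with a constant input. The values tau = 2 (fast) and tau = 70 (slow) are illustrative assumptions, not the patent's settings. After five steps the fast unit has almost reached its target, while the slow unit has barely moved.

```python
def leaky_response(tau, steps, drive=1.0, u0=0.0):
    """State trace of one CTRNN unit under a constant input.

    u_t = (1 - 1/tau) * u_{t-1} + (1/tau) * drive
    """
    u = u0
    trace = []
    for _ in range(steps):
        u = (1 - 1 / tau) * u + drive / tau
        trace.append(u)
    return trace


fast = leaky_response(tau=2.0, steps=5)
slow = leaky_response(tau=70.0, steps=5)
print(fast[-1])  # 0.96875
print(slow[-1] < 0.1)  # True
```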

The affordance model is trained with the method introduced for the previous model. The decoder part is needed to train the deep autoencoder, but only the output of the code layer of the perception module (the deep autoencoder) is used as an additional input to the slow context layer of the action module. The code layer and the slow context layer are fully connected, and the weights between them are updated with Equation 3, using the backpropagation rule with the Kullback-Leibler divergence error of Equation 1.

The information stored in the code layer is related to intentions (Kim, et al., 2013). These connection weights are trained together with the other weights of the OA-SMTRNN. In FIG. 2, the flow of information from the affordance model to the action model is indicated by the arrow from the code layer to the slow context layer; this information enriches the features needed to recognize human intentions accurately.

The object-affordance information of the perception module is already abstract and does not change rapidly, so the output of the code layer of the perception module is connected directly to the slow context layer. Originally, the MTRNN and SMTRNN predicted action sequences based on the dynamics of latent states over time. By integrating this information, however, the intention classification of the OA-SMTRNN can consider skeletal dynamics and object rationality simultaneously within a unified framework.

Because the object-affordance perception module is connected to the slow context layer, its information also affects the input-output layer that predicts the action sequence. The perception information therefore helps the module suppress irrelevant actions. If several intentions share similar dynamic characteristics, the lower-layer action prediction module produces similar outputs for them, so their features are barely distinguishable. The encoded features of the affordance perception module, however, add feature dimensions related to object affordance, which makes the data easy to separate.

The present invention expects the information stored in the code layer to help the OA-SMTRNN predict action sequences accurately and to improve intention inference performance.

By combining an inter-object relationship model with static properties and a user action recognition model with dynamic properties, the present invention can provide a reliable user intention recognition device that is both flexible and accurate.

FIG. 9 is a flowchart of an intention understanding method based on action-perception connected learning according to an embodiment of the present invention.

The intention understanding method according to this embodiment may be performed in substantially the same configuration as the device 10 of FIG. 1. Components identical to those of the device 10 of FIG. 1 are therefore given the same reference numerals, and repeated descriptions are omitted.

The intention understanding method according to this embodiment may also be executed by software (an application) for performing intention understanding based on action-perception connected learning.

Referring to FIG. 9, the intention understanding method according to this embodiment first detects joint information of the user's behavior observed in every frame and object information of the user's surrounding environment (step S10).

In this step, two- or three-dimensional position information of the user's major joints can be collected in real time based on information from at least one of a visible-light image sensor and an active infrared pattern projection sensor. When several users are present in the sensor's sensing area, at least one of face recognition and tracking can be performed so that the behavior modeling information of different users does not interfere.
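One way to keep the joint streams of several users apart is frame-to-frame tracking. The sketch below is a hypothetical greedy nearest-neighbour tracker: it matches each detected skeleton to the previous track whose reference joint (for example the mid-spine) is closest, within an assumed gating distance. It stands in for the tracking option only and does not implement face recognition. The class name, the threshold, and the greedy matching rule are all assumptions.

```python
import math
from itertools import count

class SkeletonTracker:
    """Hypothetical greedy nearest-neighbour tracker that keeps one ID per user,
    so that each user's joint stream can feed a separate behaviour model."""

    def __init__(self, max_dist=0.5):
        self.max_dist = max_dist  # gating threshold in sensor units (illustrative)
        self.tracks = {}          # track id -> last reference-joint position
        self._ids = count()

    def update(self, detections):
        """detections: reference-joint positions (e.g. mid-spine) seen this frame.
        Returns track IDs aligned with `detections`; unmatched tracks are dropped."""
        pairs = sorted(
            (math.dist(pos, det), tid, di)
            for tid, pos in self.tracks.items()
            for di, det in enumerate(detections)
        )
        assigned = [None] * len(detections)
        used = set()
        for d, tid, di in pairs:          # closest pairs first
            if d > self.max_dist:
                break
            if tid in used or assigned[di] is not None:
                continue
            assigned[di] = tid
            used.add(tid)
        for di, tid in enumerate(assigned):
            if tid is None:               # new user entering the sensing area
                assigned[di] = next(self._ids)
        self.tracks = {tid: tuple(detections[di]) for di, tid in enumerate(assigned)}
        return assigned
```

Greedy matching is chosen here for brevity; an optimal assignment (for example the Hungarian method) would be a natural alternative when many users are close together.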

Skeletal points such as the mid-spine, neck, head, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, right hip, and spine shoulder may be used as the joint information of the user's behavior.
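Before preprocessing, each frame's joints have to be arranged in a fixed order so that the network always sees the same input layout. A minimal sketch, assuming joint identifiers that follow the Kinect v2 naming convention (the patent does not specify identifiers):

```python
# Hypothetical identifiers for the 12 skeletal points listed above
# (Kinect v2-style names; the actual identifiers are an assumption).
JOINTS = (
    "SpineMid", "Neck", "Head",
    "ShoulderLeft", "ElbowLeft", "WristLeft",
    "ShoulderRight", "ElbowRight", "WristRight",
    "HipLeft", "HipRight", "SpineShoulder",
)

def frame_to_vector(frame):
    """Flatten one frame {joint_name: (x, y, z)} into a fixed-order 36-D list.
    Raises KeyError if any listed joint is missing from the frame."""
    return [coord for name in JOINTS for coord in frame[name]]
```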

In addition, to obtain information about the objects around the user, object information around the user is detected and collected in real time based on information from the visible-light image sensor.

Once the joint information of the user's behavior and the object information of the user's surrounding environment have been detected, they are preprocessed so that they can be processed by an artificial neural network (step S30).

In the preprocessing step (step S30), the joint information and the object information may be preprocessed separately. In this case, each process may include a normalization step and an encoding step.

Preprocessing the joint information may include normalizing the major-joint information, which is acquired in absolute coordinates within the sensor's sensing area, into a per-user relative coordinate representation, and then encoding the normalized relative coordinates in a form suited to a neural network. For example, the encoding step may use a self-organizing map (SOM).
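The two preprocessing operations can be sketched as follows. The normalization scheme (origin at the mid-spine, scaled by shoulder width) and the one-dimensional SOM that encodes a pose as a one-hot best-matching-unit vector are assumptions chosen for illustration; the patent states only that absolute coordinates become per-user relative coordinates and that a SOM may be used for encoding. All names are hypothetical.

```python
import math
import random

def normalize(frame, root="SpineMid", left="ShoulderLeft", right="ShoulderRight"):
    """Absolute sensor coordinates -> user-relative coordinates (assumed scheme):
    translate so `root` is the origin, then divide by shoulder width so users of
    different size and distance from the sensor map to comparable values."""
    ox, oy, oz = frame[root]
    scale = math.dist(frame[left], frame[right]) or 1.0  # guard against zero width
    return {name: ((x - ox) / scale, (y - oy) / scale, (z - oz) / scale)
            for name, (x, y, z) in frame.items()}

class SOM:
    """Minimal 1-D self-organizing map; a pose vector is encoded as a one-hot
    vector marking its best-matching unit (BMU)."""

    def __init__(self, n_units, dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_units)]

    def bmu(self, v):
        return min(range(len(self.w)), key=lambda i: math.dist(self.w[i], v))

    def train(self, data, epochs=50, lr0=0.5, radius0=None):
        n = len(self.w)
        radius0 = radius0 or n / 2
        for e in range(epochs):
            frac = e / epochs
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking Gaussian neighbourhood
            for v in data:
                b = self.bmu(v)
                for i, wi in enumerate(self.w):
                    h = math.exp(-((i - b) ** 2) / (2.0 * radius ** 2))
                    self.w[i] = [w + lr * h * (x - w) for w, x in zip(wi, v)]

    def encode(self, v):
        code = [0.0] * len(self.w)
        code[self.bmu(v)] = 1.0
        return code
```

In practice the SOM would be trained on flattened, normalized joint vectors; the two-dimensional inputs used in the check are only to keep the example small.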

Preprocessing the object information of the user's surrounding environment may include extracting the object information from the received image information and transmitting the label of the object that the user has picked up by hand.
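The patent does not specify how the held object is identified. As a hypothetical sketch, one could combine the object detector's bounding boxes with the 2-D image position of the wrist joint and report the label of the box nearest the hand, provided it lies within an assumed pixel gap. `Detection`, `held_object_label`, and `max_gap` are illustrative names.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in image pixels

def held_object_label(detections, hand_xy, max_gap=20.0):
    """Return the label of the object the hand is touching, or None.
    Assumed rule: choose the box closest to the hand position (distance 0 when the
    hand is inside the box) and accept it only within `max_gap` pixels."""
    def gap(box):
        x0, y0, x1, y1 = box
        dx = max(x0 - hand_xy[0], 0.0, hand_xy[0] - x1)
        dy = max(y0 - hand_xy[1], 0.0, hand_xy[1] - y1)
        return math.hypot(dx, dy)

    best = min(detections, key=lambda d: gap(d.box), default=None)
    if best is None or gap(best.box) > max_gap:
        return None
    return best.label
```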

Next, a behavior recognition processing step classifies the user's behavior information based on the joint information and the object information preprocessed and output in the preprocessing step (step S50).

In the behavior recognition processing step (step S50), the joint information and the object information output from the preprocessing step are modeled with a recurrent neural network, and recognition-only nodes are designated so that behavior can be recognized. In the real-time processing and test phases, the output of the recognition-only nodes for a given input is compared with the learned user behavior vectors, and the closest behavior is extracted.
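A sketch of the final comparison follows. It assumes that the recognition-only nodes are the last `n_recog` units of the recurrent network's output vector and that behaviors are represented by stored prototype vectors compared by Euclidean distance. Both points, and all names and values, are assumptions.

```python
import math

def split_output(y, n_recog):
    """Split the RNN output into (behaviour-prediction units, recognition-only units),
    assuming the recognition-only nodes are the last `n_recog` outputs (n_recog > 0)."""
    return y[:-n_recog], y[-n_recog:]

def closest_behavior(recognition_output, prototypes):
    """Return the label whose stored behaviour vector is nearest (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(recognition_output, prototypes[label]))

# Hypothetical stored behaviour vectors, one per behaviour class.
PROTOTYPES = {
    "reach": [1.0, 0.0, 0.0],
    "grasp": [0.0, 1.0, 0.0],
    "pour": [0.0, 0.0, 1.0],
}
```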

Next, an object relation information processing step continuously outputs an object candidate group related to the user's behavior, using the object information preprocessed in the preprocessing step and the behavior information output in the behavior recognition processing step (step S70).

Specifically, object relations are modeled from the preprocessed object information and the behavior information through an autoencoder network, and the object autoencoding result is output.
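Object-relation modeling with an autoencoder can be sketched as a generic denoising autoencoder over a concatenated [action one-hot | object multi-hot] vector. During training, one used object is hidden from the input while the full vector remains the target, so the network learns to reconstruct a missing object. The reconstruction scores of objects not yet held then form the candidate group. The action and object vocabularies, layer size, learning rate, and epoch count are hypothetical. This is plain backpropagation, not the specific weight-readjustment scheme recited in the claims.

```python
import math
import random

ACTIONS = ["drink", "write"]                 # hypothetical behaviour classes
OBJECTS = ["cup", "kettle", "pen", "paper"]  # hypothetical object labels

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def encode(action, objects):
    return ([1.0 if a == action else 0.0 for a in ACTIONS]
            + [1.0 if o in objects else 0.0 for o in OBJECTS])

class ObjectRelationAE:
    """Denoising autoencoder over [action one-hot | object multi-hot] vectors."""

    def __init__(self, n_hidden=4, seed=0):
        self.rng = random.Random(seed)
        n, r = len(ACTIONS) + len(OBJECTS), self.rng
        self.W1 = [[r.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[r.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n)]
        self.b2 = [0.0] * n

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        y = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(self.W2, self.b2)]
        return h, y

    def train(self, episodes, epochs=2000, lr=0.5):
        n_act = len(ACTIONS)
        for _ in range(epochs):
            for action, objs in episodes:
                target = encode(action, objs)
                x = list(target)
                # Corruption: hide one used object so the net learns to fill it in.
                used = [i for i in range(n_act, len(x)) if x[i] == 1.0]
                x[self.rng.choice(used)] = 0.0
                h, y = self.forward(x)
                d_out = [yi - ti for yi, ti in zip(y, target)]  # sigmoid + cross-entropy gradient
                d_hid = [hj * (1.0 - hj) * sum(d_out[k] * self.W2[k][j] for k in range(len(y)))
                         for j, hj in enumerate(h)]             # computed before W2 changes
                for k, dk in enumerate(d_out):
                    for j, hj in enumerate(h):
                        self.W2[k][j] -= lr * dk * hj
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.W1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj

    def candidates(self, action, held):
        """Rank objects other than `held` by reconstruction score (object candidate group)."""
        _, y = self.forward(encode(action, [held]))
        n_act = len(ACTIONS)
        scores = {o: y[n_act + i] for i, o in enumerate(OBJECTS) if o != held}
        return sorted(scores, key=scores.get, reverse=True)
```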

Finally, an intention output step outputs the user intention recognition result through an artificial neural network whose inputs are the behavior information output in the behavior recognition processing step and the object candidate group output in the object relation information processing step (step S90).

That is, an artificial neural network or recurrent neural network that takes the outputs of the behavior recognition processing step and the object relation information processing step as inputs integrates the behavior and object relation information and outputs only the user intention recognition result.
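The fusion stage can be sketched minimally as a single softmax layer, the simplest artificial neural network consistent with the description, applied to the concatenation of a behavior vector and the object candidate scores. The intention labels, the class name, and the training samples are hypothetical. The samples are constructed so that the behavior vector alone is ambiguous and only the object candidates decide the intention.

```python
import math

INTENTS = ["make_tea", "take_notes"]  # hypothetical intention labels

def softmax(z):
    m = max(z)                         # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class IntentionHead:
    """Single-layer softmax classifier over [behaviour vector | object candidate scores]."""

    def __init__(self, n_in, n_out):
        self.W = [[0.0] * n_in for _ in range(n_out)]
        self.b = [0.0] * n_out

    def predict_proba(self, behavior, candidates):
        x = list(behavior) + list(candidates)
        return softmax([sum(w * xi for w, xi in zip(row, x)) + b
                        for row, b in zip(self.W, self.b)])

    def intent_of(self, behavior, candidates):
        p = self.predict_proba(behavior, candidates)
        return INTENTS[max(range(len(p)), key=p.__getitem__)]

    def fit(self, samples, epochs=200, lr=0.5):
        """samples: (behavior, candidates, label_index) triples; plain SGD."""
        for _ in range(epochs):
            for behavior, candidates, label in samples:
                x = list(behavior) + list(candidates)
                p = self.predict_proba(behavior, candidates)
                for k in range(len(self.W)):
                    g = p[k] - (1.0 if k == label else 0.0)  # cross-entropy gradient
                    for i, xi in enumerate(x):
                        self.W[k][i] -= lr * g * xi
                    self.b[k] -= lr * g

# Same "grasp" behaviour [0, 1], different object candidates (kettle vs. paper score).
SAMPLES = [([0, 1], [0.9, 0.1], 0), ([0, 1], [0.1, 0.9], 1),
           ([1, 0], [0.8, 0.2], 0), ([1, 0], [0.2, 0.8], 1)]
```

A recurrent layer could replace the softmax layer when the intention must be inferred over a sequence of frames, which the description also allows.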

As a result, a reliable, flexible, and accurate user intention result can be output, and the invention can serve as a core enabling technology in any field that needs to recognize or predict human intention.

The intention understanding method based on action-perception connected learning described above may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described with reference to embodiments, those skilled in the art will understand that various modifications and changes can be made without departing from the spirit and scope of the present invention as set forth in the following claims.

The present invention can be used as a core enabling technology in any field that needs to recognize or predict human intention. For example, as a data analysis technology essential to many new growth industries such as smart homes and the Internet of Things (IoT), it can help secure competitiveness across various industries. Accordingly, the present invention is a core source technology applicable to fields such as IT, human-computer interface (HCI), and signal processing, and can be usefully applied to smart homes, IoT, healthcare, security, and the like.

10: intention understanding device based on action-perception connected learning
100: input unit
300: preprocessing unit
310: joint information preprocessing unit
330: object information preprocessing unit
500: processing unit
510: behavior recognition processing unit
530: object relation information processing unit
550: intention output unit

Claims

An intention understanding apparatus based on action-perception connected learning, comprising:
an input unit for detecting joint information of user behavior observed in every frame and object information of the user's surrounding environment;
a preprocessing unit comprising a joint information preprocessing unit that normalizes and encodes the joint information received from the input unit, and an object information preprocessing unit that detects the label of specific object information selected by the user in the current frame among a plurality of object information items constituting the user's surrounding environment, the preprocessing unit preprocessing the information so that it can be processed by an artificial neural network;
a behavior recognition processing unit for classifying the user's behavior information in the current frame based on the joint information and the specific object information output from the preprocessing unit;
an object relation information processing unit for modeling object relations through an autoencoder network, using the specific object information selected by the user in the current frame, output from the preprocessing unit, and the user's behavior information in the current frame, output from the behavior recognition processing unit, thereby estimating, among the plurality of object information items, an object predicted to be selected by the user in the next frame and outputting the estimated object as an object candidate group; and
an intention output unit for outputting a user intention recognition result through an artificial neural network that receives as inputs the behavior information output from the behavior recognition processing unit and the object candidate group output from the object relation information processing unit,
wherein the object relation information processing unit
learns the weights between the prediction node and the hidden layer using the object candidate group, and reconstructs a missing object by readjusting the remaining weights, excluding the weights between the prediction node and the hidden layer, according to the learning result.

2. The apparatus according to claim 1,
wherein the input unit collects two- or three-dimensional position information of the user's major joints in real time based on information from at least one of a visible-light image sensor and an active infrared pattern projection sensor.

3. The apparatus according to claim 2,
wherein, when a plurality of users are present in the sensing area, the input unit performs at least one of face recognition and tracking to avoid interference between the behavior modeling information of the individual users.

delete

The apparatus according to claim 1, wherein the joint information preprocessing unit comprises:
a normalization unit for normalizing major joint information based on absolute coordinates obtained in the sensing region of the input unit into a relative coordinate representation for each user; and
an encoding unit for expressing the normalized relative coordinate representation as neural network input using a self-organizing map (SOM).

delete

The apparatus according to claim 1,
wherein the behavior recognition processing unit models the joint information and the specific object information output from the preprocessing unit through a recurrent neural network and designates recognition-only nodes so that behavior recognition can be performed.

8. The apparatus according to claim 7,
wherein, in a real-time processing phase and a test phase, the behavior recognition processing unit compares the output of the recognition-only nodes for a given input with user behavior vectors and extracts the closest behavior.

delete

The apparatus according to claim 1,
wherein at least one skeletal point among the mid-spine, neck, head, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, right hip, and spine shoulder is used as the joint information of the user's behavior.

An intention understanding method based on action-perception connected learning, performed by an intention understanding apparatus based on action-perception connected learning, the method comprising:
a step in which the apparatus detects joint information of user behavior observed in every frame and object information of the user's surrounding environment;
a preprocessing step in which the apparatus normalizes and encodes the joint information and detects the label of specific object information selected by the user in the current frame among a plurality of object information items constituting the user's surrounding environment, thereby preprocessing the information so that it can be processed by an artificial neural network;
a behavior recognition processing step in which the apparatus classifies the user's behavior information in the current frame based on the joint information and the specific object information output in the preprocessing step;
an object relation information processing step in which the apparatus uses the behavior information output in the behavior recognition processing step and the specific object information output in the preprocessing step to estimate, among the plurality of object information items, an object predicted to be selected by the user in the next frame, and outputs the estimated object as an object candidate group; and
an intention output step in which the apparatus outputs a user intention recognition result through an artificial neural network that receives as inputs the behavior information output in the behavior recognition processing step and the object candidate group output in the object relation information processing step,
wherein the object relation information processing step further comprises learning the weights between the prediction node and the hidden layer using the object candidate group, and reconstructing a missing object by readjusting the remaining weights, excluding the weights between the prediction node and the hidden layer, according to the learning result.

12. The method of claim 11, wherein the detecting step comprises:
collecting two- or three-dimensional position information of the user's major joints in real time based on information from at least one of a visible-light image sensor, an active infrared image sensor, and an active infrared pattern projection sensor.

13. The method of claim 12, wherein the detecting step comprises:
performing at least one of face recognition and tracking to avoid interference between the behavior modeling information of the individual users when a plurality of users are present in the sensing area.

The method according to claim 11, wherein the preprocessing step comprises:
a normalization step of normalizing major joint information based on absolute coordinates obtained in the sensing region into a relative coordinate representation for each user; and
an encoding step of expressing the normalized relative coordinate representation as neural network input using a self-organizing map (SOM).

delete

The method according to claim 11,
wherein the behavior recognition processing step models the joint information and the object information output from the preprocessing step through a recurrent neural network and designates recognition-only nodes so that behavior recognition can be performed.

18. The method according to claim 17,
wherein, in a real-time processing phase and a test phase, the behavior recognition processing step compares the output of the recognition-only nodes for a given input with user behavior vectors and extracts the closest behavior.

delete

A computer-readable recording medium on which a computer program for performing the intention understanding method based on action-perception connected learning according to claim 11 is recorded.