KR20210101570A

KR20210101570A - Method and Apparatus of Avatar Placement in Remote Space from Telepresence

Info

Publication number: KR20210101570A
Application number: KR1020200015686A
Authority: KR
Inventors: 이성희; 윤영호; 양동석; 정충호; 김재현
Original assignee: 한국과학기술원
Priority date: 2020-02-10
Filing date: 2020-02-10
Publication date: 2021-08-19
Also published as: KR102332074B1

Abstract

Disclosed are a method and an apparatus for placing an avatar in a remote space for telepresence. A method of placing an avatar in a remote space according to an embodiment of the present invention includes the steps of: collecting user preference data, which is placement data preferred by people, through a user preference survey; defining a similarity between a person and an avatar at corresponding points in two spaces by training a deep neural network on the collected user preference data; and placing the user's avatar in the remote space based on the defined similarity between the person and the avatar at the corresponding points in the two spaces.

Description

{Method and Apparatus of Avatar Placement in Remote Space from Telepresence}

The present invention relates to a technology for placing an avatar in a remote space for telepresence. More particularly, it relates to a method and apparatus that can place an avatar at a corresponding point in a remote space that preserves the context and meaning of a person located in a real space, in order to provide telepresence in which people living in different places feel as if they are living together.

Rapidly advancing technology is gradually realizing the 3D telepresence experience, in which a user interacts with a remote user through that user's virtual avatar, an image of the remote user overlaid on the local physical space. This experience markedly improves copresence. For two-way communication, the local user is likewise represented as an avatar that the remote user can see.

One scenario for 3D telepresence is that a user wears a head-mounted display (HMD) and sees a virtual avatar that moves exactly like the remote user. This can be achieved by capturing the remote user's appearance and reconstructing it directly as a virtual avatar. The scenario works well when the two remote spaces have the same configuration. Otherwise, differences between the spaces can invalidate the avatar's motion: the avatar may sit in mid-air or penetrate furniture. Because human living spaces vary, when the two spaces differ, the motion a user performs in the remote environment must be adapted to the configuration of the local environment, so that the local user can correctly recognize what the remote user is doing through the avatar and interact with it.

This is a kind of motion retargeting problem from computer animation research. Conventional motion retargeting mainly deals with changes in the character or in the environment geometry, whereas retargeting in telepresence applications must consider many elements of human motion in real life, including the user's motion, attention, and relationships with surrounding people and objects. At a higher level, other important aspects must also be considered, such as the naturalness of the avatar animation, guaranteed availability and accessibility of interaction resources, and user immersion. Because some of these latent attributes are difficult to recognize and to trade off against one another, retargeting motion so as to best preserve its meaning is a challenging task.

Recent technological advances facilitate two-way immersive telepresence that merges two remote spaces into one. One prior technique proposed an HMD-based telepresence system that superimposes an image of the remote user on the local space, and another transmitted 3D data of the remote user and objects to the local space. These studies focus on capturing the remote user and reconstructing that user in a designated local space, but they do not consider the problems that arise from differences between the two spaces.

This problem was partly addressed by one prior technique, which developed a system that projects the remote user's image at a specific local position for which the correspondence is specified manually, for example sofa to sofa. Another prior technique proposed a method that establishes correspondences between objects in the two spaces to select the appropriate furniture object near which the avatar should be located, and adjusted the body pose to fit the different shape of the object. Yet another prior technique proposed rigidly aligning the two free spaces so that they overlap as much as possible, so that a remote person standing in free space is transferred to free space. However, the rigid alignment approach is valid only when the two free spaces are similar.

Problems caused by spatial differences also arise frequently in VR. To let a user walk through a large virtual scene within a smaller physical space, researchers have developed techniques that subtly rotate the virtual space to induce the user to change walking direction, or that optimally warp the virtual space to fit the physical space.

The problem of retargeting human movement shares its goal with the motion retargeting problem, in that existing human motion is modified to fit a different situation. Early methods mostly addressed modifying a given motion for a new character with different body dimensions or for a different environment. For interaction motion between two people, or between a person and an object, the retargeted motion is meaningful only if the semantics of the interaction are preserved. To this end, many approaches define spatial relationships between the interacting entities.

Retargeting a person's placement to a different environment in an everyday scenario must likewise preserve the meaning of that placement. Compared with previous motion retargeting research, this problem must consider far more factors related to human motion, as described above.

To solve the problem of character placement and behavior in a scene, it is necessary to represent how humans use space and behave in relation to surrounding people and the environment, a topic of growing interest in computer graphics research.

One prior technique estimated both the usable 3D geometry and the human poses in a scene from a single indoor image by fitting a set of poses to the 3D geometry. Another observed real humans interacting with objects in order to learn to predict the likelihood of actions in a 3D scene. Yet another developed a human-centric interaction representation that links human pose attributes to the geometry and arrangement of objects, and that can generate both human poses and furniture arrangements corresponding to a desired motion input.

At the level of object interaction, researchers have developed various methods for representing and estimating object affordance or functionality. One prior technique proposed a geometric description of the functionality of a 3D object, derived from object-to-object interactions in the context of a given scene. Another developed a method that predicts a human pose for a 3D shape by identifying contactable local regions of the shape and retrieving feasible human poses.

Interpersonal distance is another important factor in human placement. Proxemics is the study of human behavior in the use of space according to egocentric distance, and many studies report that proxemics also holds between people and virtual avatars. Beyond interpersonal distance, one prior technique, inspired by theories of human territoriality, emphasized the importance of agent orientation in modeling group dynamics in social interaction.

Planning and generating motion within a scene while considering multiple factors is a complex task that has not been studied extensively in computer graphics. A notable recent study of this kind synthesized full-body motion, including locomotion, body positioning, action execution, and gaze behavior, for general demonstration tasks in furnished indoor environments. To produce plausible motion, that work considered various conditions such as visibility constraints, locomotion accessibility, and action feasibility among obstacles.

In addition, Korean Patent No. 10-1659849 and US Patent No. US9843772 B2, prior art filed by the applicant of the present invention, provide a telepresence method using an avatar. That method generates avatar motion corresponding to the user's motion by reflecting the spatial information of each space, the user's position in each space, and the user's gaze information. The technology divides the space into patches and determines the avatar's position in units of patches; it uses only the person-avatar distance and angle and the class and geometric information of the patches as factors for measuring spatial similarity; and it adapts motion to the environments of the different spaces.

However, the prior art does not consider interaction with people or changes in the importance of key features depending on the situation.

Embodiments of the present invention provide a method and apparatus for placing an avatar in a remote space that can place the avatar at a corresponding point in the remote space that preserves the context and meaning of a person located in a real space, in order to provide telepresence in which people living in different places feel as if they are living together.

A method of placing an avatar in a remote space according to an embodiment of the present invention includes: collecting user preference data, which is placement data preferred by people, through a user preference survey; defining a similarity between a person and an avatar at corresponding points in two spaces by training a deep neural network on the collected user preference data; and placing the user's avatar in the remote space based on the defined similarity between the person and the avatar at the corresponding points in the two spaces.

The user preference data may include at least one of: the relative distance and gaze direction between the person and the avatar; height values around the person's location; the types of furniture around the person's location and the sum of their distances; the types of furniture visible in the person's frontal field of view and the sum of their distances; and whether the person is seated.
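As a rough illustration only, the five feature elements listed above could be held in a small record and flattened into a numeric vector for a learning model. The class name, field names, units, and furniture categories below are assumptions made for this sketch and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical furniture categories; the specification does not enumerate them.
FURNITURE_TYPES = ("chair", "sofa", "table", "tv")


@dataclass
class PlacementFeatures:
    """One placement described by the five feature elements (illustrative names)."""
    relative_distance: float             # person-avatar distance (assumed metres)
    relative_gaze_angle: float           # gaze direction toward the other party (assumed radians)
    surrounding_heights: List[float]     # height values sampled around the location
    nearby_furniture: Dict[str, float]   # furniture type -> summed distance, around the location
    visible_furniture: Dict[str, float]  # furniture type -> summed distance, in the frontal view
    is_seated: bool

    def to_vector(self, types: Sequence[str] = FURNITURE_TYPES) -> List[float]:
        """Flatten to a fixed-order numeric vector; absent furniture types map to 0.0."""
        vec = [self.relative_distance, self.relative_gaze_angle]
        vec.extend(self.surrounding_heights)
        vec.extend(self.nearby_furniture.get(t, 0.0) for t in types)
        vec.extend(self.visible_furniture.get(t, 0.0) for t in types)
        vec.append(1.0 if self.is_seated else 0.0)
        return vec
```

A fixed ordering of furniture types keeps vectors from different rooms comparable, which is the only design point this sketch is meant to show.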

The step of defining the similarity may perform feature modeling that includes interaction features with other people or objects, pose-accommodation features for the person's pose, and spatial features characterizing the functional properties of the surrounding space. Based on this feature modeling, the step may learn the similarity between placement features of the two spaces, thereby defining the similarity between the person and the avatar at corresponding points in the two spaces.

The neural network may include a neural network that learns the nonlinear characteristics of the difference or distance between the two spaces.

The placing step may model the avatar's movement as a finite state machine with three states (static placement, quasi-stationary movement, and locomotion) and place the user's avatar in the remote space using the modeled finite state machine.

The placing step may switch the avatar's state to the quasi-stationary movement state when placement of the avatar is finished, to the locomotion state when the user starts walking, and to the static placement state when the user stops.
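The three states and the three triggers described above can be sketched as a small transition table. This is a minimal sketch, not the claimed implementation: the enum and event names are hypothetical, and the text does not say from which states each trigger applies, so the source states in the table are assumptions.

```python
from enum import Enum, auto


class AvatarState(Enum):
    STATIC_PLACEMENT = auto()
    QUASI_STATIONARY = auto()   # the intermediate movement state
    LOCOMOTION = auto()


class Event(Enum):
    PLACEMENT_FINISHED = auto()
    USER_STARTS_WALKING = auto()
    USER_STOPS = auto()


# Assumed source states; the text only names the trigger and the target state.
TRANSITIONS = {
    (AvatarState.STATIC_PLACEMENT, Event.PLACEMENT_FINISHED): AvatarState.QUASI_STATIONARY,
    (AvatarState.STATIC_PLACEMENT, Event.USER_STARTS_WALKING): AvatarState.LOCOMOTION,
    (AvatarState.QUASI_STATIONARY, Event.USER_STARTS_WALKING): AvatarState.LOCOMOTION,
    (AvatarState.LOCOMOTION, Event.USER_STOPS): AvatarState.STATIC_PLACEMENT,
}


def next_state(state: AvatarState, event: Event) -> AvatarState:
    """Return the next state; unlisted (state, event) pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Starting from `AvatarState.STATIC_PLACEMENT`, feeding `PLACEMENT_FINISHED`, `USER_STARTS_WALKING`, and `USER_STOPS` in order walks once around the cycle and returns to static placement.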

An apparatus for placing an avatar in a remote space according to an embodiment of the present invention includes: a collection unit that collects user preference data, which is placement data preferred by people, through a user preference survey; a definition unit that defines a similarity between a person and an avatar at corresponding points in two spaces by training a deep neural network on the collected user preference data; and a placement unit that places the user's avatar in the remote space based on the defined similarity between the person and the avatar at the corresponding points in the two spaces.

The definition unit may perform feature modeling that includes interaction features with other people or objects, pose-accommodation features for the person's pose, and spatial features characterizing the functional properties of the surrounding space. Based on this feature modeling, the definition unit may learn the similarity between placement features of the two spaces, thereby defining the similarity between the person and the avatar at corresponding points in the two spaces.

The placement unit may model the avatar's movement as a finite state machine with three states (static placement, quasi-stationary movement, and locomotion) and place the user's avatar in the remote space using the modeled finite state machine.

The placement unit may switch the avatar's state to the quasi-stationary movement state when placement of the avatar is finished, to the locomotion state when the user starts walking, and to the static placement state when the user stops.

According to embodiments of the present invention, an avatar can be placed at a corresponding point in a remote space that preserves the context and meaning of a person located in a real space, in order to provide telepresence in which people living in different places feel as if they are living together.

According to embodiments of the present invention, immersive communication between users in remote spaces becomes possible, which can reduce the time, cost, and energy spent traveling to meet in person and can therefore yield economic and environmental benefits.

According to embodiments of the present invention, placing the avatar in the real space enables the user to coexist and interact, within the user's own real space, with a user in a remote space.

The present invention can be used in telepresence applications that give a sense of coexistence to people living apart from each other, and can also be applied to systems that overcome physical distance for communication in various settings such as remote meetings, education, and social events.

FIG. 1 shows screenshots of the system of the present invention.
FIG. 2 shows an example of a target scenario for 3D telepresence.
FIG. 3 shows an example of the user survey screen.
FIG. 4 shows several representative samples from the user preference survey.
FIG. 5 shows an example histogram of the average nearest-neighbor distance of user placements.
FIG. 6 shows examples of features that express placement semantics.
FIG. 7 shows an example illustrating the process of generating the different features in a scene.
FIG. 8 shows an example network structure of the placement similarity predictor.
FIG. 9 shows the finite state machine for avatar movement.
FIG. 10 shows an example of the procedure for creating two local regions.

Advantages and features of the present invention, and methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be embodied in various different forms. These embodiments are provided only so that the disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains, and the invention is defined only by the scope of the claims.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meaning commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive sense unless explicitly and specifically defined.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and repeated descriptions of the same components are omitted.

The gist of the embodiments of the present invention is to place an avatar at a corresponding point in a remote space that preserves the context and meaning of a person located in a real space, in order to provide telepresence in which people living in different places feel as if they are living together.

Here, to find the optimal corresponding point, which may differ from person to person depending on the situation and the differences between the spaces, the present invention first conducts a user preference survey on the avatar placement that corresponds to a person in various situations. It may use five key feature elements: the relative distance and gaze direction between the person and the avatar; height values around the person's location; the types of furniture around the person's location and the sum of their distances; the types of furniture visible in the person's frontal field of view and the sum of their distances; and whether the person is seated. The user preference data, expressed through these key feature elements, is learned by a deep neural network, for example Triplet MatchNet, to define the similarity between two corresponding points. A sampling-based optimization over the space then finds the optimal corresponding point with the highest similarity, and the avatar is placed there.
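The sampling-based search mentioned above can be sketched as follows: draw candidate placements in the remote room, score each against the person's features, and keep the best. This is a minimal sketch under stated assumptions. The rectangular room, the `(x, y, heading)` placement tuple, the `featurize` callback, and `toy_similarity` (a stand-in for the learned network) are all hypothetical and not taken from the specification.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Placement = Tuple[float, float, float]  # (x, y, heading in radians); assumed representation


def sample_placements(width: float, depth: float, n: int,
                      rng: random.Random) -> List[Placement]:
    """Uniformly sample n candidate placements in an assumed rectangular room."""
    return [(rng.uniform(0.0, width), rng.uniform(0.0, depth),
             rng.uniform(-math.pi, math.pi)) for _ in range(n)]


def best_placement(candidates: Sequence[Placement],
                   source_features: Sequence[float],
                   featurize: Callable[[Placement], Sequence[float]],
                   similarity: Callable[[Sequence[float], Sequence[float]], float]) -> Placement:
    """Return the candidate whose features are most similar to the source features."""
    return max(candidates, key=lambda p: similarity(source_features, featurize(p)))


def toy_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for the learned similarity: negative Euclidean distance."""
    return -math.dist(a, b)
```

In a real system `featurize` would compute the feature elements for a candidate placement in the remote room, and `similarity` would be the trained network rather than a distance.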

That is, the present invention can provide a technique for placing an avatar at a location in another person's remote space where the various interactions that may occur there are possible, and it can place the avatar at an arbitrary position on a continuous surface without dividing the space into patches. As factors for measuring spatial similarity, it can quantitatively consider elements such as the distribution of objects around the person (avatar), information on the objects within the person's (avatar's) field of view, and the person's (avatar's) standing or sitting posture.

Among the many problems to be explored for avatar motion generation, the present invention addresses the important sub-problem of determining the placement of an avatar in a room-scale indoor environment. The telepresence scenario of the present invention involves two remote rooms with different layouts and furniture arrangements, as shown in FIG. 2. By teleporting person X in space A as avatar X' in space B, person Y in space B, wearing an HMD, can see avatar X'. Likewise, X can see Y's avatar Y' in space A. This lets each user feel that the remote user is visiting the user's own place and can interact as if physically in the same space. The present invention further assumes that the two people go about their daily lives in the telepresence environment: they sometimes interact with each other and may also act independently. To simplify the problem, the present invention assumes that there is only one person in each space. Now suppose X in space A moves from one point to another; the key question is where in space B X' should be placed to best represent X.

This involves understanding the semantics of X at pX in space A, where pX denotes a placement comprising the position and orientation of X. To do so, several different attributes must be considered, for example interaction, pose, and space.

Interaction: X may be at pX in order to interact with a person or object near pX, in which case X' should be placed where the same interaction is possible.

Pose: X may take a specific pose afforded by pX, for example sitting on a chair, and X' should ideally be placed where the same pose can be accommodated.

Space: an indoor space is divided into various functional areas, such as dining and study areas, and X' may need to be placed in the same functional area as X.

The present invention uses a set of features that quantify these attributes.

With these features defined, the present invention provides a method for finding the appropriate corresponding placement of X'. This can be formulated as the optimization problem defined in Equation 1 below.

[Equation 1]

Here, x(pX'|pY, B) denotes the feature vector representing the semantics of pX' given pY and space B, and Sim(·,·) denotes a similarity function that measures how close two features are from the viewpoint of telepresence. Avatar placement therefore depends on modeling an appropriate similarity function.
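The formula for Equation 1 is not reproduced in this text. One form consistent with the definitions above, offered as a reconstruction and not as the published equation, is shown below. The term x(pX|pY', A), the feature vector of the person's placement in space A given the placement of avatar Y', is an assumed counterpart of x(pX'|pY, B).

```latex
p_{X'}^{*} \;=\; \underset{p_{X'}}{\arg\max}\;
  \mathrm{Sim}\!\left( \mathbf{x}(p_{X} \mid p_{Y'}, A),\;
                       \mathbf{x}(p_{X'} \mid p_{Y}, B) \right)
```

Read this way, the placement of X' is the one whose features in space B are most similar to the features of X in space A.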

If the two remote spaces have the same configuration, that is, the same room shape and furniture arrangement and the same placement of Y and Y', all features will point to a similar position for the placement of X'. Otherwise, each feature may favor a different position. Learning the similarity measure therefore involves estimating the relative importance of the features, and that relative importance depends on the situation. For example, when two people are communicating, interaction-related features may dominate the others, whereas pose- and space-related features matter more when the users act independently. The similarity measure is therefore a nonlinear function of the features.

Because the similarity measure lies in the realm of human perception, the present invention learns the measure from a data set obtained through a user survey on how users would position their avatars in various situations. In particular, a ranking approach may be adopted to solve the optimization problem, and a Triplet MatchNet structure may be used to train a neural network that estimates the similarity of two places located in different spaces. The trained neural network outperforms a baseline linear model by a large margin in terms of test accuracy.

The learned similarity function can be used to perform static placement of the avatar. Because a practical telepresence application must also handle user motion, the present invention may additionally use various methods for generating the corresponding movement of the avatar as the user moves through the indoor space.

To examine the effectiveness of the method according to an embodiment of the present invention in a telepresence scenario, prototype VR-based and AR-based telepresence systems, in which two users located in different spaces communicate through each other's avatars, may be built and tested in various ways. FIG. 1 shows screenshots of the system of the present invention: FIG. 1A shows the virtual reality telepresence space, and FIG. 1B shows the augmented reality telepresence space. As shown in the top and perspective views of FIG. 1A, the female avatar Y' on the left and the male avatar X' on the right move so as to represent the movements of users Y and X, respectively, in the two remote spaces. As shown in the top and perspective views of FIG. 1B, user X and user Y can each see the other's avatar, Y' and X' respectively, from an egocentric view of the augmented reality telepresence space.

Contributions

The present invention is the first work to determine the placement and movement of a virtual avatar in a general indoor space so as to preserve various aspects of the meaning of the user's placement and movement in another remote space. To this end, the present invention identifies a set of features to be considered for user placement and provides a neural network-based similarity predictor trained on a training set obtained from a user preference survey. The present invention may also implement prototype VR-based and AR-based telepresence systems to confirm the effectiveness of the method.

Placement Similarity Learning

The central element in finding the optimal placement of the avatar is a similarity function that estimates, from the telepresence point of view, how similar the placement of the avatar is to that of the person. To this end, a neural network may be trained that outputs the dissimilarity between two feature vectors representing the meaning of placements. Because similarity depends on human perception, the present invention first conducts a user preference survey to collect the placements that people prefer for various spatial and human placement configurations, and then identifies a set of telepresence-related features by examining the collected placement data. The neural network may be trained to learn the ranking of placements using the Triplet MatchNet structure.

Data Acquisition by User Preference Survey

A user survey may be conducted to collect the preferred avatar placement corresponding to a user placement in various scenarios. To this end, 24 pairs of house models are first prepared and a total of 864 questions are generated, for example as follows. Four indoor 3D models available through Google SketchUp may be selected, each having a moderate number of furniture items and sufficiently large functional areas such as spaces for cooking, studying, and resting. The number of models may then be doubled by changing the furniture arrangement and the interior (for example, the props). Of the 28 possible pairs among the resulting 8 spaces, the 4 pairs that match a model with its own variant are excluded, leaving 24 space pairs. To learn which avatar placements users prefer across different indoor environments, 36 questions may be generated for each space pair by placing person X, person Y, and avatar Y' at different positions and orientations. The questions for each space pair are created so as to cover a variety of distances and directions between the person and the avatar, interactions with objects (for example, watching TV or looking out of a window), poses (for example, sitting or standing), and functional areas (for example, kitchen, living room, free space), resulting in a total of 24 × 36 = 864 questions.
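The counts in the paragraph above can be checked with a few lines of Python. This is only an arithmetic sketch: the model names and the "original"/"variant" labels are hypothetical placeholders, not identifiers from the survey.

```python
from itertools import combinations

# Hypothetical labels for the four base models and their furniture variants.
base_models = ["M1", "M2", "M3", "M4"]
spaces = [(m, v) for m in base_models for v in ("original", "variant")]

# All unordered pairs of the 8 spaces.
all_pairs = list(combinations(spaces, 2))

# Drop the pairs that match a model with its own variant.
space_pairs = [(a, b) for a, b in all_pairs if a[0] != b[0]]

questions_per_pair = 36
total_questions = len(space_pairs) * questions_per_pair
```

With these definitions, `len(all_pairs)` is 28, `len(space_pairs)` is 24, and `total_questions` is 864, matching the figures in the description.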

FIG. 3 shows an example of the user survey screen. On a 2D monitor, the participant sees top views of space A (bottom left) and space B (bottom right) with the persons and avatars placed in them, together with the views of person X (top left) and avatar X' (top right). The participant selects the placement of avatar X' that corresponds to the placement of person X. In the bottom-right view, the placements of avatar X' chosen by 10 participants are overlaid, and the yellow and purple gaze lines indicate X and Y, respectively. For each question, participants are asked to place avatar X' at the location they prefer, and are given a 2D floor plan of each space and the egocentric views of the person and the avatar, as shown in FIG. 3. Participants can explore the space by moving and rotating the avatar with the mouse. During the survey, participants are encouraged to inspect the space thoroughly under the instruction "Place the avatar at the location that best represents the person's placement in the remote space." Participants are asked to place the avatar based only on what they see in the scene, and no additional information is given, such as exactly what X and Y are looking at or doing. This information is withheld so that the resulting avatar placement algorithm does not depend on such high-level contextual information.

A total of 10 responses may be obtained for each question, giving 864 × 10 = 8,640 user responses in total. About 200 people take part in the survey, each answering about 40 questions on average.

FIG. 4 shows several representative samples from the user preference survey. As shown in FIG. 4, when the two people appear to be looking at each other or at the same object, participants place the avatar where the same behavior is possible, and their answers are consequently similar. On the other hand, answers may vary when the person and the avatar are placed apart, or when the configurations of the two spaces differ markedly because of a different spatial layout or the absence of a particular category of furniture. In other words, when person X and avatar Y' appear to be interacting, users place avatar X' at similar locations, and when person X and avatar Y' act independently, the user placements of avatar X' show greater variance.

To understand the clustering and dispersion pattern of the user data, a histogram of the average nearest-neighbor distance of the user placements in each question may be computed, as shown in FIG. 5. The nearest neighbor of each user placement is the closest of the remaining nine user placements for the same question, where distance is defined as a weighted sum of the differences in position and angle. The average distance between a user placement and its nearest neighbor is used for the histogram analysis. The histogram shows the pattern of a normal distribution that is not truncated, which indicates that the clustering tendency of the survey data varies continuously.
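The per-question statistic described above can be sketched as follows. The description does not give the weights of the weighted sum, so `w_pos` and `w_ang` are assumed values chosen only for illustration, and a placement is assumed to be a tuple (x, y, heading in radians).

```python
import math

def placement_distance(a, b, w_pos=1.0, w_ang=0.5):
    """Weighted sum of position difference (m) and heading difference (rad).

    The weights are assumptions; the description does not state them.
    """
    (ax, ay, ath), (bx, by, bth) = a, b
    d_pos = math.hypot(ax - bx, ay - by)
    # Wrap the heading difference into [-pi, pi) before taking its magnitude.
    d_ang = abs((ath - bth + math.pi) % (2 * math.pi) - math.pi)
    return w_pos * d_pos + w_ang * d_ang

def mean_nearest_neighbor_distance(placements):
    """Average, over all placements of one question, of the distance to the closest other placement."""
    nearest = []
    for i, p in enumerate(placements):
        others = placements[:i] + placements[i + 1:]
        nearest.append(min(placement_distance(p, q) for q in others))
    return sum(nearest) / len(nearest)
```

Calling `mean_nearest_neighbor_distance` once per question and binning the results would give a histogram of the kind shown in FIG. 5.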

Feature Modeling

To determine the placement of avatar X' that corresponds to the placement of person X, it is necessary to define placement features that represent the meaning of the placement and to preserve those features when placing the avatar. After examining the results of the user preference survey, low-level features of three kinds, namely interaction features, pose-accommodation features, and spatial features, may be identified to represent the meaning of a person's placement in an indoor environment, as shown in FIG. 6.

Regarding the interaction features, a person often interacts with other people or objects, and the spatial relationship between the person and the entity that the person attends to is an important feature to be preserved in telepresence. According to Hall's proxemics, the distance between two people reflects the intimacy of their relationship. The direction in which a person's head is facing also matters, because the visibility of the other person or object strongly affects the interaction.

The present invention may model two categories of interaction features. The first is the interpersonal relationship between the avatar and the counterpart at the other party, x_ip = {d_rel, θ_1, θ_2}, where d_rel is the distance between the two and θ_1 and θ_2 are the angular differences between each party's frontal direction and the direction to the other party's position. The angular differences are always non-negative.
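As a minimal sketch of the interpersonal feature, the function below computes d_rel, θ_1, and θ_2 for two parties on a 2D floor plan. The function name, the (x, y) position tuples, and the yaw convention (radians, counterclockwise from the +x axis) are assumptions made for illustration.

```python
import math

def interpersonal_feature(pos_a, yaw_a, pos_b, yaw_b):
    """Return (d_rel, theta_1, theta_2) for parties a and b.

    theta_1 is the unsigned angle between a's frontal direction and the
    direction from a to b; theta_2 is the same quantity seen from b.
    """
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    d_rel = math.hypot(dx, dy)

    def off_axis(yaw, vx, vy):
        bearing = math.atan2(vy, vx)
        # Wrap into [-pi, pi) and drop the sign, so the result is non-negative.
        return abs((bearing - yaw + math.pi) % (2 * math.pi) - math.pi)

    return d_rel, off_axis(yaw_a, dx, dy), off_axis(yaw_b, -dx, -dy)
```

Two parties facing each other give θ_1 = θ_2 = 0, while a party looking 90 degrees away from the other gives an angle of π/2 regardless of the turning direction.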

The second category is the visual attention feature. The present invention assumes that a person can attend to objects of up to 12 categories (for example, sofa, chair, table, TV, air conditioner, refrigerator, sink, lamp, piano, cabinet, shelf, window). Rather than identifying the single attended object, which is difficult in real situations, it is assumed that every object within a narrow field of view (for example, 40 degrees) and within a certain distance is a candidate object of visual attention, and that objects closer to the person's position and to the center of gaze receive higher visual attention values. The i-th element of the visual attention feature x_va may therefore be defined as in Equation 2 below.

[Equation 2]

Here, d_i,j and θ_i,j denote the distance and angle of the j-th object of the i-th category (i = 1, ..., 12), and the two bounds in the equation denote the maximum distance (for example, 4 m) and the maximum angle (for example, 40 degrees), respectively.

Unlike objects, the counterpart or avatar is always taken into account in avatar placement through the interpersonal feature x_ip, even when the other party is not visible. The rationale for this choice is the assumption that people constantly keep the other person's position in mind even when the two are not directly interacting. Once trained, the optimizer of the present invention will weight this feature appropriately depending on the situation.

Regarding the pose-accommodation features, a person's pose, such as sitting or standing, is an important factor characterizing human behavior, so the avatar needs to be placed at a location that accommodates the remote user's pose. Because the typical items that accommodate a person's pose (for example, floor, chair, bed) have different height fields, pose accommodation is represented by the height field around the location. Specifically, a circular region of radius 0.5 m around the person is divided into a central part, that is, a circle of radius 0.25 m, and 16 surrounding sectors in the outer ring. The pose-accommodation feature x_pa is then given by the average height of each of these subregions. In addition, a binary feature x_ss is added to indicate whether the person or avatar is sitting or standing. For example, x_ss may be set to sitting when the person or avatar is located on furniture of a category that can be sat on, such as a sofa or chair, and to standing otherwise.

The pose-accommodation feature not only characterizes the poses possible at a given location but can also capture the spatial relationship between the person (or avatar) and nearby furniture, for example, standing beside a table versus standing in front of a table.
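The height-field feature can be sketched as below. The radii and the 1 + 16 subdivision follow the description; everything else is an assumption for illustration: `height_at(x, y)` is a hypothetical callback returning the surface height at a floor position, sectors are aligned with the person's yaw, and each mean is a plain average of point samples rather than an exact area integral.

```python
import math

def pose_accommodation_feature(height_at, cx, cy, yaw=0.0,
                               r_inner=0.25, r_outer=0.5,
                               n_sectors=16, n_r=4, n_a=8):
    """Return 17 mean heights: the central disc, then the 16 ring sectors."""
    def mean_height(r_lo, r_hi, a_lo, a_hi):
        # Average point samples taken at the midpoints of a small polar grid.
        total = 0.0
        for ir in range(n_r):
            r = r_lo + (r_hi - r_lo) * (ir + 0.5) / n_r
            for ia in range(n_a):
                a = a_lo + (a_hi - a_lo) * (ia + 0.5) / n_a
                total += height_at(cx + r * math.cos(a), cy + r * math.sin(a))
        return total / (n_r * n_a)

    feature = [mean_height(0.0, r_inner, yaw, yaw + 2 * math.pi)]
    step = 2 * math.pi / n_sectors
    for k in range(n_sectors):
        feature.append(mean_height(r_inner, r_outer,
                                   yaw + k * step, yaw + (k + 1) * step))
    return feature
```

For a person standing at the edge of a raised surface, the sectors over the surface report its height and the opposite sectors report floor height, which is how the feature can distinguish standing beside a table from standing in front of it.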

Regarding the spatial features, their purpose in the present invention is to characterize the functional character of the surrounding space, for which the distribution of furniture is an important factor. To model the spatial features, the present invention may take a rather simple approach that summarizes the distances to the furniture items of each category within a certain range. The i-th element of the spatial feature x_sp may therefore be defined as in Equation 3 below.

[Equation 3]

Here, the bound in the equation denotes the maximum distance of the spatial feature range (for example, 3 m), and d_i,j denotes the distance of the j-th object of the i-th category (i = 1, ..., 12).

The feature vector of a placement is, in total, the 45-dimensional vector x = [x_ip, x_va, x_pa, x_ss, x_sp].
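The 45 dimensions can be accounted for from the definitions above: 3 for x_ip, 12 object categories for x_va, 1 central disc plus 16 sectors for x_pa, 1 for x_ss, and 12 categories for x_sp. The sketch below records that layout and splits a flat vector back into its sub-features; the dictionary and function names are illustrative, and the ordering simply follows the order in which the vector is written in the description.

```python
# Sub-feature sizes inferred from the description, in the stated order.
SUBFEATURE_DIMS = {"x_ip": 3, "x_va": 12, "x_pa": 17, "x_ss": 1, "x_sp": 12}

def subfeature_slices(dims=SUBFEATURE_DIMS):
    """Map each sub-feature name to its slice in the flat vector."""
    slices, start = {}, 0
    for name, size in dims.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices, start

def split_features(x):
    """Split a flat 45-dimensional feature vector into named sub-features."""
    slices, total = subfeature_slices()
    if len(x) != total:
        raise ValueError(f"expected {total} values, got {len(x)}")
    return {name: list(x[s]) for name, s in slices.items()}
```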

Similarity Learning

As described above, the present invention obtains placement samples from the user preference survey and defines a feature vector that characterizes a placement in an indoor scene. The placement feature of person X is denoted x^0, and the placement feature of avatar X' obtained from the user preference survey is denoted x^+. For each sample x_i^0, the responses x_i,j^+ (i = 1, ..., 864; j = 1, ..., 10) are obtained.

A user-selected placement indicates that it is better than other placements, not that it is the absolute correct answer. That is, the user sample x^+ might not have been selected had a better placement been available. The problem therefore needs to be solved in terms of ranking among the data: it can be solved by training a similarity (distance) function d(·,·) such that d(x^0, x^+) < d(x^0, x^-), where x^- is the feature of a placement that the user did not select.

A basic approach is to find a similarity function in bilinear form, d(x^0, x) = (x^0 - x)^T W (x^0 - x). Many techniques have been developed for similarity learning or distance metric learning in bilinear form, and some of them support learning relative similarity, as the present problem requires. This approach can be effective if the difference between the data provides enough information for ranking. Unfortunately, this is not the case for the present problem, because the importance of each sub-feature depends on the configuration of x^0. To reflect the nonlinear nature of data similarity in the present problem, a deep neural network that learns the similarity between two placement features may be trained.
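The limitation of the bilinear form can be seen directly in a small sketch. The function below evaluates d(x^0, x) = (x^0 - x)^T W (x^0 - x) for plain Python lists; the matrix W here is an arbitrary example, not a learned metric.

```python
def bilinear_distance(x0, x, W):
    """Evaluate (x0 - x)^T W (x0 - x) for vectors given as lists."""
    diff = [a - b for a, b in zip(x0, x)]
    w_diff = [sum(w * d for w, d in zip(row, diff)) for row in W]
    return sum(d * wd for d, wd in zip(diff, w_diff))

# Example weight matrix (illustrative values only).
W = [[2.0, 0.0],
     [0.0, 0.5]]

same_difference_a = bilinear_distance([1.0, 2.0], [0.0, 0.0], W)
same_difference_b = bilinear_distance([11.0, 12.0], [10.0, 10.0], W)
```

Both calls return the same value because the result depends only on the difference x^0 - x. A fixed W therefore weights each sub-feature identically whatever the configuration of x^0, which is the reason given above for moving to a nonlinear model.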

Regarding the data set, similar pairs (x^0, x^+) are obtained from the user preference survey, but less similar pairs (x^0, x^-) are also needed for training. To generate dissimilar features x^-, random placements are generated in the scene and the corresponding features are computed, as shown in FIG. 7. The left part of FIG. 7 shows the placement of person X, and the right part shows the user-selected similar placements of avatar X' (brown) and the generated less similar placements of avatar X'.

Random placements that are too close to one of the user-selected placements in terms of distance, angle, and pose are discarded. Random placement is performed over the entire indoor space for 100 samples, and 10 additional samples are placed near the user-selected placements to create challenging dissimilar data. Each x^0 therefore has 10 positive samples x^+ and 110 negative samples x^-, which yields a total of 1,100 triplets (x^0, x^+, x^-) per x^0 for training.
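The sampling scheme above can be sketched as rejection sampling followed by a Cartesian product. This is a simplified illustration under several assumptions: 2D positions stand in for full placement features, rejection uses position distance only (the description also considers angle and pose), and the room size, the 0.3 m threshold, and the sampling radius for the nearby negatives are hypothetical values.

```python
import math
import random
from itertools import product

def sample_negatives(selected, n, draw, min_dist):
    """Rejection-sample n positions from draw() that stay at least
    min_dist away from every user-selected position."""
    negatives = []
    while len(negatives) < n:
        p = draw()
        if all(math.dist(p, s) >= min_dist for s in selected):
            negatives.append(p)
    return negatives

def build_triplets(x0, positives, negatives):
    """Pair every positive with every negative for one reference x0."""
    return [(x0, xp, xn) for xp, xn in product(positives, negatives)]

rng = random.Random(0)
x0 = (2.0, 1.5)                                          # hypothetical reference
positives = [(1.0 + 0.05 * k, 1.0) for k in range(10)]   # 10 user selections

def draw_in_room():
    # Uniform over a hypothetical 6 m x 4 m room.
    return (rng.uniform(0.0, 6.0), rng.uniform(0.0, 4.0))

def draw_near_selected():
    # A point in a ring around a randomly chosen user selection.
    sx, sy = rng.choice(positives)
    r, a = rng.uniform(0.3, 0.9), rng.uniform(0.0, 2 * math.pi)
    return (sx + r * math.cos(a), sy + r * math.sin(a))

negatives = (sample_negatives(positives, 100, draw_in_room, 0.3)
             + sample_negatives(positives, 10, draw_near_selected, 0.3))
triplets = build_triplets(x0, positives, negatives)
```

Here `len(triplets)` is 10 × 110 = 1,100, matching the count stated in the description.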

The present invention may provide a neural network that learns the nonlinear nature of the dissimilarity (or distance) between two places. The neural network may be trained on the basis of the Triplet match network (Triplet MatchNet) framework.

Triplet MatchNet learns the dissimilarity between input features in a supervised manner, as shown in FIG. 8A. In the training phase, the model receives a data set of input triplets {x}: (x^0, x^+, x^-), and Triplet MatchNet learns the dissimilarity between input features so as to predict that x^+ is more similar to x^0 than x^- is. The resulting dissimilarities can be ranked. A typical Triplet MatchNet model consists of a FeatureNet for feature abstraction and a MetricNet for dissimilarity estimation. Triplet MatchNet is optimized to push the distance d^+ between a similar pair (x^0, x^+) toward a low value (for example, 0) and the distance d^- between a dissimilar pair (x^0, x^-) toward a high value (for example, 1). At the same time, it uses a comparison between the two distances so that d^+ is smaller than d^-. In the present invention, these two optimization goals are achieved with the loss functions φ({x}) and ψ({x}), which may be expressed as in Equation 4 and Equation 5 below.

[Equation 4]

[Equation 5]

Here, the total loss function may be defined as φ({x}) + ψ({x}).

The present invention slightly extends the general Triplet MatchNet structure to give the neural network an additional hint for better estimating the dissimilarity in the present problem. In addition to the triplet input, the model of the present invention explicitly computes the differences between the features, |x^0 - x^±|, and uses them as additional inputs. FIG. 8B shows the overall structure of the network model; the explicit distance input is passed through a separate FeatureNet called the Distance FeatureNet.

The neural network model of the present invention may be composed of two basic modules, a black box module (BBM) and a sub-feature processing module (SFPM), as shown in FIG. 8C. The SFPM receives one of the input features χ ∈ {x^0, x^+, x^-, |x^0 - x^+|, |x^0 - x^-|} and splits it into the sub-features x_ip, x_pa, x_va, x_ss, and x_sp. The high-dimensional sub-features x_pa, x_va, and x_sp, which have local correlations, are each processed by a separate BBM. The outputs of these BBMs are then concatenated with the low-dimensional sub-features x_ip and x_ss to form the final output of the SFPM. Two SFPMs are used as the FeatureNet and the Distance FeatureNet in the lower part of the model, and a simple BBM is used as the MetricNet in the upper part.

The model of the present invention has five kinds of BBM, whose layer sizes may be as shown in Table 1 below. The present invention may also compare the SFPM against a FeatureNet consisting of a plain BBM (BBM FeatureNet), in which each layer of the BBM keeps the same number of cells as the corresponding layer of the SFPM.

In Table 1, x denotes the input cells, h1 and h2 denote the cells of the subsequent layers, y denotes the output cells, and N is 2 for the general Triplet MatchNet and 3 for the model of the present invention.

Avatar Placement and Movement

Based on the learned similarity function between two placements, a method may be provided for placing or moving the avatar so that it best represents the placement or movement of the remote user.

The present invention assumes that a user either moves quasi-stationarily within a small area or walks toward another destination. To represent these behaviors, the movement of the avatar is modeled as a finite state machine with three states: static placement, quasi-stationary movement, and locomotion. FIG. 9 shows the finite state machine for avatar movement. In the static placement state, the avatar moves to the location that best matches the placement of the remote user. When the placement is finished, the state changes to quasi-stationary movement. The quasi-stationary movement state corresponds to the situation in which the user moves slowly within a local area; in this case, the avatar's position is determined as a compromise between reproducing the displacement of the user's position in a reference frame and preventing the avatar from penetrating the environment. When the user starts walking, the avatar state changes to locomotion, for which the present invention provides several methods for preserving various aspects of the meaning of the user's movement. When the user stops, the avatar state changes to static placement to update the placement. The avatar placement and movement policy in each state is described below.
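The three-state machine described above can be written down as a small transition table. The state and event names below are illustrative, and the rule that an unlisted state and event pair leaves the state unchanged is an assumption, since the description names only the three transitions.

```python
from enum import Enum, auto

class AvatarState(Enum):
    STATIC_PLACEMENT = auto()
    QUASI_STATIONARY = auto()
    LOCOMOTION = auto()

class Event(Enum):
    PLACEMENT_DONE = auto()
    USER_STARTS_WALKING = auto()
    USER_STOPS = auto()

# The three transitions named in the description.
TRANSITIONS = {
    (AvatarState.STATIC_PLACEMENT, Event.PLACEMENT_DONE): AvatarState.QUASI_STATIONARY,
    (AvatarState.QUASI_STATIONARY, Event.USER_STARTS_WALKING): AvatarState.LOCOMOTION,
    (AvatarState.LOCOMOTION, Event.USER_STOPS): AvatarState.STATIC_PLACEMENT,
}

def step(state, event):
    """Return the next state; unlisted pairs keep the current state (assumption)."""
    return TRANSITIONS.get((state, event), state)
```

Feeding the events placement done, user starts walking, and user stops in sequence, starting from static placement, walks the machine once around the cycle and back to static placement.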

Regarding static placement, the present invention uses a sampling-based optimization scheme to find the optimal placement of avatar X' given the placement of person X in the other space. For brevity, the dissimilarity between the placements of avatar X' and person X, d(x(pX'|pY, B), x(pX|pY', A)), is written D(pX', pX), and the optimization problem of the present invention can then be expressed as in Equation 6 below.

[Equation 6]

The present invention solves the optimization in two steps. First, each space is sampled as a 2D grid map with a grid size of 0.25 m, and 24 orientation samples, one every 15 degrees, are taken at each grid cell. The dissimilarity between the feature vector of person X and the feature vector of avatar X' is evaluated for every sample placement to find the best sample, that is, the one with the lowest dissimilarity on the grid. Next, starting from the best sample placement, particle swarm optimization (PSO) is used to find the optimal placement of avatar X'. For the PSO parameters, an inertia w = 0.73, coefficients c_1 = c_2 = 1.49, 10 particles, and a maximum of 10 epochs may be used.
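The two-step search can be sketched as an exhaustive grid pass followed by a basic PSO refinement. The grid size, the 24 orientations, and the PSO parameters come from the description; the rest is assumed for illustration: a placement is a tuple (x, y, heading), `D` is any callable returning the dissimilarity of a candidate placement (standing in for the trained network), and the initial spread of the particles around the grid optimum is a hypothetical choice.

```python
import math
import random

def grid_search(D, x_range, y_range, cell=0.25, n_dirs=24):
    """Evaluate D on a position grid with n_dirs headings per cell."""
    nx = int(round((x_range[1] - x_range[0]) / cell)) + 1
    ny = int(round((y_range[1] - y_range[0]) / cell)) + 1
    best, best_cost = None, math.inf
    for ix in range(nx):
        for iy in range(ny):
            for k in range(n_dirs):
                p = (x_range[0] + ix * cell, y_range[0] + iy * cell,
                     2 * math.pi * k / n_dirs)
                cost = D(p)
                if cost < best_cost:
                    best, best_cost = p, cost
    return best, best_cost

def pso_refine(D, start, spread=(0.25, 0.25, math.radians(15)),
               w=0.73, c1=1.49, c2=1.49, n_particles=10, n_epochs=10, seed=0):
    """Refine a placement with standard global-best PSO around `start`."""
    rng = random.Random(seed)
    # The first particle sits on the grid optimum, so the result is never worse.
    pos = [list(start)] + [[s + rng.uniform(-d, d) for s, d in zip(start, spread)]
                           for _ in range(n_particles - 1)]
    vel = [[0.0, 0.0, 0.0] for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_cost = [D(tuple(p)) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_epochs):
        for i in range(n_particles):
            for k in range(3):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            cost = D(tuple(pos[i]))
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return tuple(gbest), gbest_cost

def toy_D(p, target=(1.1, 0.6, 0.3)):
    """Toy dissimilarity: squared offset from a hypothetical target placement."""
    d_ang = (p[2] - target[2] + math.pi) % (2 * math.pi) - math.pi
    return (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2 + d_ang ** 2

grid_best, grid_cost = grid_search(toy_D, (0.0, 3.0), (0.0, 2.0))
refined, refined_cost = pso_refine(toy_D, grid_best)
```

The grid pass guards against poor local minima across the room, and the PSO pass recovers the resolution lost to the 0.25 m and 15 degree discretization.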

다음으로, 아바타 X'는 최적의 배치로 빠르게 옮겨진다. 사람 Y가 이러한 전송을 인지하도록 돕기 위해, 아바타 X'의 배치는 이전 배치에서 새로운 배치로 보간된다. 또한, 이러한 빠른 전송이 사용자가 취한 원래 동작이 아님을 나타내기 위해, 보간 중에 더해진 투명성으로 아바타를 시각화한다. 정적 배치가 완료되면 아바타의 상태가 준정상적 이동으로 전환된다.Next, Avatar X' is quickly moved to the optimal placement. To help Person Y perceive this transmission, the placement of avatar X' is interpolated from the old placement to the new placement. It also visualizes the avatar with added transparency during interpolation to indicate that this fast transfer is not the original action taken by the user. When static placement is complete, the avatar's state transitions to quasi-normal movement.

Quasi-normal movement is described next. Even when talking to another person or performing a task in a stationary posture, people tend to shift their position gradually rather than stand perfectly still. The present invention calls this behavior quasi-normal movement. An effective way to express the meaning of the user's motion during quasi-normal movement is to move the avatar so that the user's displacement relative to a local reference coordinate frame is preserved. To this end, immediately after static placement, local areas around the user and around the avatar are identified, and a correspondence between the two areas is defined. When the user makes a quasi-normal movement within the local area, the user's position is mapped in real time to the corresponding position of the avatar.

FIG. 10 illustrates an example of the procedure for creating the two local areas; in FIG. 10, yellow denotes the user's local area and blue denotes the corresponding area of the avatar. First, the local area that the user can freely reach within a certain distance is obtained by casting rays from the user's position. The user's local area mesh is then rigidly transformed into the avatar's reference coordinate frame. Next, collision checks are performed between the vertices of the local area mesh and the obstacles in the avatar's space, and each colliding vertex is moved toward the center to a collision-free position, which defines the avatar's local area mesh. The correspondence between points of the two local areas is determined using the barycentric coordinates of the triangles in the local area meshes. In the method of the present invention, the user's orientation with respect to the reference frame is applied directly to the avatar's orientation. The radius of the local area may be set to 0.7 m for a standing posture and close to 0 for a sitting posture.
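The point correspondence between the two local areas can be illustrated for a single pair of triangles. The sketch assumes that the triangle coordinates mentioned above are barycentric coordinates and that corresponding triangles share vertex order; `barycentric` and `map_point` are hypothetical helper names, and locating the triangle that contains the user is left out.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of 2D point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    u = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    v = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return u, v, 1.0 - u - v

def map_point(p, user_tri, avatar_tri):
    """Map p from a triangle of the user's mesh to the matching avatar triangle."""
    u, v, w = barycentric(p, *user_tri)
    # Re-express the same weights on the avatar triangle's vertices.
    return tuple(u * a + v * b + w * c for a, b, c in zip(*avatar_tri))

# One triangle of the user's local area and its (possibly shrunken) counterpart.
user_tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
avatar_tri = ((2.0, 2.0), (2.5, 2.0), (2.0, 3.0))
avatar_pos = map_point((0.25, 0.25), user_tri, avatar_tri)
```

Because the weights are reused unchanged, a user standing on a mesh vertex maps exactly to the corresponding avatar vertex, even when that vertex was pulled toward the center to avoid an obstacle.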

Movement is described next. While the user is walking, the avatar is in the movement state, in which it moves so as to convey the meaning of the user's walking. If the user's destination is known in advance, one simple but effective solution is to move the avatar to the destination computed by Equation 6 for the user's destination. However, accurately predicting the user's destination is very difficult, so the present invention provides three alternatives.

The first method, Walk-In-Place (WIP), makes the avatar walk in place at the location where its state switched from quasi-normal movement to movement. This is a baseline method that merely indicates that the avatar is in the movement state.

The WIP method cannot convey any information about the path the user takes to the destination. The present invention therefore provides a second method, Continuous Path Update (CPU), which makes the avatar walk to positions corresponding to the intermediate positions of the walking user so as to reflect the meaning of the user's walking path. To this end, the features of an intermediate position on the path are obtained with respect to sample points located ahead of that position. Given sample positions {pX1,...,pXs} for the current placement pX of the walking user and sample positions {pX'1,...,pX's} for a candidate placement pX' of the avatar, the difference between the two placements can be defined as in Equation 7 below.

[Equation 7]

The avatar's placement is then determined by solving the optimization problem of Equation 8 below.

[Equation 8]

Here, pX'_prev denotes the avatar's placement at the previous time step. The weight w(pX',pX'_prev) is set in proportion to the differences in position and orientation between pX' and pX'_prev, so that the optimal placement is chosen closer to the previous placement and the avatar's path remains continuous.

A sampling-based optimization scheme is used for real-time performance. Specifically, the present invention samples placements randomly over the entire space and also around pX'_prev at 15-degree intervals. The path update of Equation 8 may be performed at a fixed interval, for example every 100 ms.
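One update step of such a sampling scheme might look like the sketch below. Several things here are assumptions rather than statements of the source: the candidates around pX'_prev are taken on a small ring in 24 directions, the continuity weight is an affine function of the position and heading differences, and the weight is multiplied with the path difference. The path difference itself is passed in as a function, since the exact forms of Equations 7 and 8 are not reproduced in this text. All names are hypothetical.

```python
import math
import random

def candidate_placements(prev, bounds, step=0.1, n_dirs=24, n_random=50, rng=None):
    """Candidates: a ring around the previous placement plus random global samples."""
    rng = rng or random.Random(0)
    (xmin, xmax), (ymin, ymax) = bounds
    local = []
    for k in range(n_dirs):  # one direction every 15 degrees
        th = 2 * math.pi * k / n_dirs
        x = prev[0] + step * math.cos(th)
        y = prev[1] + step * math.sin(th)
        if xmin <= x <= xmax and ymin <= y <= ymax:
            local.append((x, y, th))
    globl = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax),
              rng.uniform(0.0, 2 * math.pi)) for _ in range(n_random)]
    return local + globl

def continuity_weight(p, prev, k_pos=1.0, k_ang=0.2):
    """Grows with the position and heading difference from the previous placement."""
    ang = abs((p[2] - prev[2] + math.pi) % (2 * math.pi) - math.pi)
    return 1.0 + k_pos * math.hypot(p[0] - prev[0], p[1] - prev[1]) + k_ang * ang

def update_placement(prev, path_difference, bounds, rng=None):
    """Pick the candidate with the lowest weighted difference (assumed product form)."""
    return min(candidate_placements(prev, bounds, rng=rng),
               key=lambda p: continuity_weight(p, prev) * path_difference(p))
```

Restricting `candidate_placements` to the ring only (`n_random=0`) gives a behavior in the spirit of a purely local search, in which the avatar can turn and step but cannot jump across the room.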

The last method, redirection, is similar to the second method, except that it samples only around pX'_prev at 15-degree intervals, so that the avatar merely turns and never jumps across the space. This method ensures natural movement of the avatar rather than conveying intermediate path information.

In all of the strategies described above, the user's pose is applied directly to the avatar, so the changed position and orientation of the avatar's root produce foot skating. A method for removing foot-skating artifacts would improve the quality of the avatar's motion. The quasi-normal movement and redirection strategies of the present invention are collision-free, whereas the teleportation induced by the static placement and continuous path update strategies may cause the avatar to pass through walls or furniture in the room.
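The three avatar states discussed in this description (static placement, quasi-normal movement, and movement) and the transitions between them, as recited in the claims, can be summarized as a small finite state machine. The event names and the `step` function are hypothetical; how each event is detected (for example, deciding that the user has started walking) is outside this sketch.

```python
from enum import Enum, auto

class AvatarState(Enum):
    STATIC_PLACEMENT = auto()
    QUASI_NORMAL = auto()
    MOVEMENT = auto()

# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS = {
    (AvatarState.STATIC_PLACEMENT, "placement_done"): AvatarState.QUASI_NORMAL,
    (AvatarState.QUASI_NORMAL, "user_starts_walking"): AvatarState.MOVEMENT,
    (AvatarState.MOVEMENT, "user_stops"): AvatarState.STATIC_PLACEMENT,
}

def step(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = AvatarState.STATIC_PLACEMENT
for event in ("placement_done", "user_starts_walking", "user_stops"):
    state = step(state, event)
```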

The placement algorithm must run fast enough to place the avatar immediately in a real-time application. To this end, per-space features such as x_pa, x_sp, and the 2D obstacle mesh are precomputed for each space. In the online stage, multithreading (for example, 16 threads) may be used to run the static placement and movement procedures. Each static placement takes 0.4 seconds (0.3 seconds for the grid-level optimization and 0.1 seconds for PSO), and path optimization may take about 100 ms.

As described above, the method according to an embodiment of the present invention can place an avatar at a corresponding point in a remote space that preserves the context and meaning of a person located in a real space, thereby providing telepresence in which people living in different places feel as if they were living together.

In addition, because the method according to an embodiment of the present invention enables immersive communication between users in remote spaces, it can reduce the time, cost, and energy spent traveling to meet in person, and is therefore economically and environmentally beneficial.

In addition, by placing an avatar in a real space, the method according to an embodiment of the present invention enables coexistence and interaction with a user in a remote space within the user's own real space.

The method of the present invention may be implemented as an apparatus or a system, and each functional step may be implemented as a functional component. For example, an apparatus according to an embodiment of the present invention includes a collection unit that collects user preference data, which is placement data preferred by people, through a user preference survey; a definition unit that trains a deep neural network on the collected user preference data to define the similarity between a person and an avatar at corresponding points of two spaces; and a placement unit that places the user's avatar in a remote space based on the defined similarity between the person and the avatar at corresponding points of the two spaces.

Here, the user preference data may include at least one of: the relative distance and gaze direction between the person and the avatar, height values around the person's location, the sum of furniture types and distances around the person's location, the sum of furniture types and distances visible in the person's frontal field of view, and whether the person is seated.

Of course, the apparatus according to embodiments of the present invention may include everything described with reference to FIGS. 1 to 10, as will be apparent to those skilled in the art.

The system or apparatus described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the systems, apparatuses, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. A method of placing an avatar in a remote space, the method comprising:
collecting user preference data, which is placement data preferred by people, through a user preference survey;
defining a similarity between a person and an avatar at corresponding points of two spaces by training a deep neural network on the collected user preference data; and
placing the user's avatar in the remote space based on the defined similarity between the person and the avatar at corresponding points of the two spaces.

2. The method of claim 1, wherein the user preference data includes at least one of: the relative distance and gaze direction between the person and the avatar, height values around the person's location, the sum of furniture types and distances around the person's location, the sum of furniture types and distances visible in the person's frontal field of view, and whether the person is seated.

3. The method of claim 1, wherein the defining of the similarity comprises performing feature modeling that includes interaction features with other people or objects, pose-accommodation features for a person's pose, and spatial features specifying functional characteristics of the surrounding space, and learning, based on the feature modeling, a similarity from the difference between the placement features of the two spaces, thereby defining the similarity between the person and the avatar at corresponding points of the two spaces.

4. The method of claim 1, wherein the neural network includes a neural network that learns a nonlinear characteristic of the difference or distance between the two spaces.

5. The method of claim 1, wherein the placing comprises modeling the movement of the avatar as a finite state machine consisting of three states, namely static placement, quasi-normal movement, and movement, and placing the user's avatar in the remote space using the modeled finite state machine.

6. The method of claim 5, wherein the placing comprises switching the avatar's state to the quasi-normal movement state when static placement of the avatar is finished, switching the avatar's state to the movement state when the user starts walking, and switching the avatar's state to the static placement state when the user stops.

7. An apparatus for placing an avatar in a remote space, the apparatus comprising:
a collection unit that collects user preference data, which is placement data preferred by people, through a user preference survey;
a definition unit that trains a deep neural network on the collected user preference data to define a similarity between a person and an avatar at corresponding points of two spaces; and
a placement unit that places the user's avatar in the remote space based on the defined similarity between the person and the avatar at corresponding points of the two spaces.

8. The apparatus of claim 7, wherein the user preference data includes at least one of: the relative distance and gaze direction between the person and the avatar, height values around the person's location, the sum of furniture types and distances around the person's location, the sum of furniture types and distances visible in the person's frontal field of view, and whether the person is seated.

9. The apparatus of claim 7, wherein the definition unit performs feature modeling that includes interaction features with other people or objects, pose-accommodation features for a person's pose, and spatial features specifying functional characteristics of the surrounding space, and learns, based on the feature modeling, a similarity from the difference between the placement features of the two spaces, thereby defining the similarity between the person and the avatar at corresponding points of the two spaces.

10. The apparatus of claim 7, wherein the neural network includes a neural network that learns a nonlinear characteristic of the difference or distance between the two spaces.

11. The apparatus of claim 7, wherein the placement unit models the movement of the avatar as a finite state machine consisting of three states, namely static placement, quasi-normal movement, and movement, and places the user's avatar in the remote space using the modeled finite state machine.

12. The apparatus of claim 11, wherein the placement unit switches the avatar's state to the quasi-normal movement state when static placement of the avatar is finished, switches the avatar's state to the movement state when the user starts walking, and switches the avatar's state to the static placement state when the user stops.