KR20220122349A

KR20220122349A - Cloud-based metaverse content collaboration system

Info

Publication number: KR20220122349A
Application number: KR1020210026868A
Authority: KR
Inventors: 이준; 조은상; 이성찬
Original assignee: 주식회사 야타브엔터
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2022-09-02
Also published as: KR102531789B1

Abstract

Disclosed is a cloud-based metaverse content collaboration system. The system includes a behavior recognition algorithm that, when recognizing a user's motion, analyzes the structural features of the human body and accordingly recognizes and remembers motion actions.

Description

Cloud-based metaverse content collaboration system

The present invention relates to a cloud-based metaverse content collaboration system.

A metaverse (a blend of "meta", meaning virtual or transcendent, and "universe", meaning world) is a three-dimensional virtual world. More broadly, the term denotes a life-style or game-style virtual world in which the real and the unreal coexist across politics, the economy, society, and culture. In a metaverse, actions in the real world are reflected in the virtual world and results in the virtual world carry back into the real world, so the boundary between the two breaks down. Virtual reality (VR), the core technology behind the metaverse, has been used since the 1970s to let users enter and immerse themselves in a virtual world to perform specific tasks. From the 1990s it branched into augmented reality (AR), which brings virtual elements into the real world. As technologies such as metaverse content matured, these evolved into mixed reality (MR) environments that act as either VR or AR depending on the situation, and into extended reality (XR) environments that add AI technologies and services.

Recently, many metaverse solutions have been released that combine these VR and AR technologies with existing online games and social networks. They fall broadly into two groups. The first is online games that have grown into metaverses, such as Fortnite [5], Roblox, and Minecraft. The second is social, exchange-centered platforms with gamified content, such as VRChat and Naver's ZEPETO. Fortnite, for example, added a special mode to its network-based FPS game in which users gather to dance and talk, and building on it released a mode for enjoying virtual concerts. More than 10 million people took part in the Travis Scott concert held in that metaverse.

Metaverse solutions consist broadly of two parts. One captures the actions of users in the real world and transfers them to avatars in the virtual world. The other provides a system through which users in the virtual world can collaborate over a network. Extended reality, the main technical foundation of the metaverse, augments the real world with virtual objects to give users new experiences. It has been widely used not only in games but also in industry and in simulation-based training, and many games and simulators have been released for VR and AR.

Among VR titles, “Beat Saber” is often cited as a representative example. It gave many people both the fun of VR gaming and a VR experience that had previously been hard to access because of cost or technical limitations. FIG. 1 shows gameplay of “Beat Saber,” one of the most popular VR games. In 2019, “Black Box VR,” the first VR gym combining exercise and games, was released. It showcased the benefit of gamification by using play to relieve the tedium of anaerobic exercise, which requires repeating the same movements.

However, such VR content tends to be tied to specific VR devices and does not measure all of the user's movements. It recognizes only limited motion: with an HMD on, it tracks the 3D position of the head and, through the controllers, of the hands. As a result, it cannot give users full freedom of movement.

Research on recognizing user motion for existing VR devices has mostly relied on the motion-tracking technology used in film production. In this approach, infrared-reflective markers are attached to the user's joints. As the user moves within a space covered by several infrared cameras, the marker positions are matched to the person's major joints so that a virtual character reproduces the motion in real time. The method has been widely used because it is accurate and fast. However, it has poor usability, since the user must always wear specific markers on the body. The equipment is also expensive, so it is hard to popularize outside specialized fields such as the film industry.

Among methods that track body movement with devices attached to the user, HTC Vive's Full Body Tracking has been widely used in metaverse systems. It lets general users perform motion capture and motion recognition at a relatively low price. With this setup, the user wears an HMD and trackers on the body in a designated space. Each tracker is bound to a specific joint of the avatar in the virtual world, which synchronizes real-world movement with the virtual world. The technology works with metaverse solutions such as VRChat. However, because trackers are attached to only some parts of the body, accuracy drops considerably. The equipment is also still somewhat costly and inconvenient to use.

In AR, a representative game is “Pokemon Go.” As the user moves around outdoors with a mobile phone, the game tracks the user's location by GPS. When a monster appears or a battle occurs, the phone visualizes the monster in AR and the user plays by touching the screen. Another game, “Roboraid,” offered combat against aliens in an AR environment.

Such AR games are likewise constrained by their devices. In smartphone AR games, the user interacts with augmented virtual objects by touching the screen. Because the user must hold the phone in one hand and touch with the other, the range of possible inputs is limited. These interaction constraints keep the device from offering a realistic game experience that more closely resembles the real world. To address this, some systems provide special handheld touch controllers. However, continuously gripping a heavy controller increases user fatigue. This controller-based control scheme is also used mainly in VR games whose play area is fixed.

Studies have also proposed tracking the user's body motion in mixed reality with sensors such as IMUs (inertial measurement units). This approach also has poor usability, because the user must wear several complex devices, and the sensors are expensive.

To solve these problems, some methods track the user's motion with a camera instead of attached sensors. A representative example uses a depth camera to extract motion from images of the user's body and apply it to VR. However, recognition degrades when the user's actions become even slightly complex or involve interaction with objects. Depth cameras also work poorly outdoors, where infrared tracking is difficult, and the approach does not carry over to ordinary cameras.

To address this, researchers have developed deep-learning-based methods for recognizing user motion. “OpenPose” recognizes 2D human pose and tracks multiple people smoothly. However, because the pose is recognized in 2D, some joint estimates lose accuracy when converted back to 3D, and overlaps occur when users interact. “DeepLabCut” studied the recognition of moving people, but its recognition rate is also somewhat low for occluded or overlapping parts.

Moreover, these methods measure only the user's joints frame by frame in real time. Recognizing meaningful actions, that is, specific gestures, therefore requires a method that tracks and recognizes the user's motion over time. Meanwhile, XNect attempted to apply captured motion to 3D characters. It showed the potential for connecting to the metaverse, but characters overlapped one another and motions were occluded by objects.

In addition, apart from VRChat, which supports Vive trackers, most existing metaverse content lets users control characters only with a keyboard or mouse. Users therefore cannot easily enjoy metaverse content in which their own body movements are reflected in the virtual world and the boundary between the real and the virtual blurs.

The proposed metaverse synchronization collaboration system, based on deep-learning motion recognition, addresses these problems in three steps. 1) An Attentional GRU-RNN method classifies movements into upper-body and lower-body motion according to the structure of the human body and recognizes motion over time. 2) When the user grabs an object or performs complex actions with other users, a Complex Gesture-GAN network classifies the interacting object and separates its interference from the user's motion. 3) Finally, a cloud-based motion recognition and metaverse application system recognizes, over the network, the motion captured at the client and returns the result.

The cloud-based metaverse content collaboration system according to the present invention includes a behavior recognition algorithm that, when recognizing a user's motion, analyzes the structural characteristics of the human body and accordingly recognizes and remembers motion actions.

In addition, the motion displacements of the person's spine, upper body, and lower body may be treated as attention inputs and managed through an attention mechanism to predict the corresponding output.

In addition, the system may include a Complex Gesture GAN algorithm that identifies the user's meaningful gestures based on the data learned in the previous step, tracks physical objects, and then distinguishes the gestures from those objects.

The main effects of the present invention are as follows. First, it solves the problem that existing metaverse systems either cannot recognize the user's motion or can do so only at high cost. The proposed system uses a camera costing a few tens of thousands of won to recognize the user's motion and map it to a metaverse avatar. It needs no motion-capture equipment costing from several million to tens of millions of won, which reduces the cost of the system.

Second, existing motion recognition systems had poor usability because motion-capture sensors, trackers, and similar devices had to be attached to the user's body. The proposed system can recognize motion with only a webcam, which greatly improves usability.

Third, existing camera-based motion recognition extracts the user's skeleton and recognizes gestures in separate steps. The present invention extracts both simultaneously, which helps users share meaningful actions with other users through metaverse content.

Finally, the proposed system uses the GRU-RNN and Complex Gesture GAN algorithms to solve accuracy problems such as occlusion of the body. Occlusion can arise from the user's own body movements and also when the user holds and moves objects such as dumbbells during home training. The system can therefore be applied more effectively when users create more complex metaverse content.

FIG. 1 shows gameplay of “Beat Saber,” one of the popular virtual reality games.
FIG. 2 is a diagram illustrating a cloud-based metaverse content collaboration system according to the present invention.
FIG. 3 is a diagram for explaining a user motion tracking system to which a GRU-RNN is applied.
FIG. 4 is a diagram for explaining a process of estimating and combining attention over motions.
FIG. 5 is a diagram for explaining a recognition method to which the Complex Gesture GAN is applied.
FIGS. 6 and 7 are diagrams illustrating scenes in which motion information extracted by the proposed system is applied to metaverse content.
FIG. 8 is a diagram for explaining metaverse fitness content.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the technical spirit of the present invention is not limited to the embodiments described herein and may be embodied in other forms. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the spirit of the present invention to those skilled in the art.

In this specification, when a component is referred to as being on another component, it may be formed directly on the other component, or a third component may be interposed between them. In the drawings, the thicknesses of films and regions are exaggerated for an effective description of the technical content.

In the various embodiments of this specification, terms such as first, second, and third are used to describe various components, but these components should not be limited by these terms. The terms are used only to distinguish one component from another. Accordingly, a component referred to as a first component in one embodiment may be referred to as a second component in another embodiment. Each embodiment described and illustrated herein also includes its complementary embodiment. In this specification, "and/or" means at least one of the listed components.

In this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "comprise" or "have" specify the presence of the stated features, numbers, steps, components, or combinations thereof. They should not be construed as excluding the presence or addition of one or more other features, numbers, steps, components, or combinations thereof. In this specification, "connection" includes both indirect and direct connection of a plurality of components.

In the following description of the present invention, a detailed description of a related well-known function or configuration will be omitted if it could unnecessarily obscure the gist of the present invention.

FIG. 2 is a diagram illustrating a cloud-based metaverse content collaboration system according to the present invention.

Referring to FIG. 2, the cloud-based metaverse content collaboration system according to the present invention consists of two environments: a server and a client. The server has three parts:
- a deep-learning-based user motion recognition and metaverse avatar conversion system;
- a part that processes interactions among users recognized in the metaverse, organized by user area and by task;
- a part that adaptively converts interactive content into AR or VR according to each client's request.

The client has two parts: a module that recognizes the user's motion with an ordinary RGB camera and sends it to the server, and a viewer that presents the metaverse in VR or AR depending on the user's environment.
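The client/server split above can be sketched as a message format plus an adaptive AR/VR conversion step. This is a minimal illustration under stated assumptions, not the system's actual protocol. The source does not say whether the client uploads raw video or extracted keypoints. This sketch assumes per-frame 2D keypoints serialized as JSON. The names `PoseFrame`, `ViewerMode`, and `adapt_scene`, and the rule that an AR viewer drops the virtual environment, are all hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum


class ViewerMode(Enum):
    AR = "ar"
    VR = "vr"


@dataclass
class PoseFrame:
    """One frame uploaded by a client (hypothetical message format)."""
    user_id: str
    timestamp_ms: int
    keypoints: dict          # joint name -> (x, y) in image pixels
    viewer: ViewerMode       # which viewer the client is running

    def to_json(self) -> str:
        return json.dumps({
            "user_id": self.user_id,
            "timestamp_ms": self.timestamp_ms,
            "keypoints": {k: list(v) for k, v in self.keypoints.items()},
            "viewer": self.viewer.value,
        })

    @staticmethod
    def from_json(text: str) -> "PoseFrame":
        d = json.loads(text)
        return PoseFrame(
            user_id=d["user_id"],
            timestamp_ms=d["timestamp_ms"],
            keypoints={k: tuple(v) for k, v in d["keypoints"].items()},
            viewer=ViewerMode(d["viewer"]),
        )


def adapt_scene(scene: dict, viewer: ViewerMode) -> dict:
    """Hypothetical adaptive conversion: a VR viewer gets the full scene,
    while an AR viewer gets only avatars and interactive objects, because
    the real camera feed serves as its background."""
    if viewer is ViewerMode.AR:
        return {k: v for k, v in scene.items() if k != "environment"}
    return dict(scene)
```

For example, `adapt_scene({"environment": "gym", "avatars": ["u1"]}, ViewerMode.AR)` keeps only the avatars. Sending keypoints rather than frames would keep uploads small, but that trade-off is an assumption and is not stated in the source.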

The deep-learning-based user motion recognition and metaverse avatar conversion system performs recognition in two stages. For this purpose, a behavior recognition algorithm was developed that, when recognizing a user's motion, analyzes the structural characteristics of the human body and accordingly recognizes and remembers actions. Structurally, the head points upward from the spine, the two arms belong to the upper body, and the two legs belong to the lower body below the spine. To recognize human motion, the system must observe how the body changes over a period of time. It therefore needs to keep remembering motion information from earlier time steps. Because motion changes quickly, it must also learn quickly and effectively. A GRU (Gated Recurrent Unit) RNN (Recurrent Neural Network) is used for this purpose.
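A GRU keeps a hidden state that is updated at every frame. It can therefore carry earlier motion information forward, which is the property the paragraph relies on. The following is a minimal NumPy sketch of a standard GRU cell run over a sequence of pose feature vectors. The input size (for example, 17 joints × 2 coordinates = 34) and the hidden size are assumptions. The update equation follows one common convention, h' = (1 - z) * h + z * n. Some libraries swap the roles of z and 1 - z, which is an equivalent parameterization.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Standard GRU cell: update gate z, reset gate r, candidate state n."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)

        def init(*shape):
            return rng.uniform(-scale, scale, size=shape)

        self.W = {g: init(hidden_size, input_size) for g in "zrn"}   # input weights
        self.U = {g: init(hidden_size, hidden_size) for g in "zrn"}  # recurrent weights
        self.b = {g: np.zeros(hidden_size) for g in "zrn"}
        self.hidden_size = hidden_size

    def step(self, x, h):
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])  # how much to update
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])  # how much past to use
        n = np.tanh(self.W["n"] @ x + self.U["n"] @ (r * h) + self.b["n"])
        return (1.0 - z) * h + z * n  # blend old memory with the new candidate

    def run(self, xs):
        """xs: (T, input_size) pose features per frame -> (T, hidden_size) states."""
        h = np.zeros(self.hidden_size)
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return np.stack(states)
```

Each hidden state is a convex blend of the previous state and a `tanh` candidate, so its values stay within [-1, 1]. The weights here are untrained and random; in practice they would be learned from labeled motion sequences.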

FIG. 3 is a diagram for explaining a user motion tracking system to which a GRU-RNN is applied.

Referring to FIG. 3, skeleton information is recognized from the image of the detected person, and the motion is analyzed on the basis of anatomical structure. Just as the leg movements of small animals follow from the movement of the spine, the motion information of each body part is learned by its own GRU-RNN, so that dynamic movement can finally be recognized.

The displacements of the person's spine, upper body, and lower body are then treated as attention inputs and managed through an attention mechanism to predict the corresponding output.
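The body-part attention described above can be sketched in two steps. First, group joints into spine, upper-body, and lower-body sets and measure how much each group moves between frames. Then weight per-part summary vectors with a softmax over their scores. The source does not define the joint set or how displacements are computed. This sketch assumes the 17-keypoint COCO layout, and the `BODY_PARTS` grouping is a hypothetical choice. The `part_codes` input stands in for whatever per-part encoding (such as a per-part GRU state) the system actually uses.

```python
import numpy as np

# Hypothetical grouping over the 17-keypoint COCO layout (an assumption):
BODY_PARTS = {
    "spine": [0, 5, 6, 11, 12],   # nose, shoulders, hips (head and torso anchors)
    "upper": [7, 8, 9, 10],       # elbows, wrists
    "lower": [13, 14, 15, 16],    # knees, ankles
}


def part_displacements(seq):
    """seq: (T, 17, 2) keypoints -> (T-1, 3) mean joint displacement per part
    between consecutive frames, in the order spine, upper, lower."""
    step = np.linalg.norm(np.diff(seq, axis=0), axis=-1)  # (T-1, 17)
    return np.stack([step[:, idx].mean(axis=1) for idx in BODY_PARTS.values()], axis=1)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def fuse_parts(part_codes, query):
    """part_codes: (3, d), one summary vector per body part; query: (d,).
    Returns attention weights over the parts and their weighted sum."""
    weights = softmax(part_codes @ query)
    return weights, weights @ part_codes
```

If only the left wrist moves by one pixel per frame, the upper-body displacement is 0.25 (one of four upper-body joints), and the spine and lower-body displacements are 0. The attention weights then let the model focus on whichever part carries the motion.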

FIG. 4 is a diagram for explaining a process of estimating and combining attention over motions.

Referring to FIG. 4, to predict the outcome of a motion over time, the attention mechanism needs an additional quantity called the attention value. The attention value used to predict the pose at time step t is denoted a_t, and it is computed from attention scores. At the current time step t, each attention score measures how similar one encoder hidden state is to the decoder's current hidden state s_t, for the purpose of predicting the next pose. In dot-product attention, the score is obtained by transposing s_t and taking its dot product with each hidden state, so every attention score is a scalar. For example, the attention score between s_t and the i-th encoder hidden state h_i is
score(s_t, h_i) = s_t^{\top} h_i
Let e^t denote the collection of attention scores between s_t and all N encoder hidden states h_1, ..., h_N. Equation 1 gives e^t.

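[Equation 1]
e^t = \left[\, s_t^{\top} h_1,\ s_t^{\top} h_2,\ \ldots,\ s_t^{\top} h_N \,\right], where h_1, \ldots, h_N are the encoder hidden states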

Let the attention distribution α^t be the collection of attention weights at decoder time step t. It is obtained by applying the softmax function to e^t, as in Equation 2.

[Equation 2]
\alpha^t = \mathrm{softmax}(e^t), \qquad \alpha^t_i = \frac{\exp(e^t_i)}{\sum_{j=1}^{N} \exp(e^t_j)}

Finally, to obtain the output of attention, each encoder hidden state is multiplied by its attention weight and the results are summed. In short, a weighted sum is computed. Equation 3 gives the attention value a_t, the output of the attention function.

[Equation 3]
a_t = \sum_{i=1}^{N} \alpha^t_i \, h_i
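The three steps of dot-product attention in Equations 1 to 3 (scores, normalized weights, weighted sum) fit in a few lines of NumPy. This sketch assumes that α^t is the softmax of e^t, the usual choice for dot-product attention. It also assumes the encoder hidden states are stacked as the rows of a matrix `H`. The function name is illustrative.

```python
import numpy as np


def dot_product_attention(s_t, H):
    """s_t: decoder hidden state (d,); H: encoder hidden states (N, d), row i = h_i.
    Returns (e_t, alpha_t, a_t) following Equations 1-3."""
    e_t = H @ s_t                         # Eq. 1: score_i = s_t^T h_i
    alpha_t = np.exp(e_t - e_t.max())     # subtract max for numerical stability
    alpha_t /= alpha_t.sum()              # Eq. 2: softmax over the N scores
    a_t = alpha_t @ H                     # Eq. 3: weighted sum of hidden states
    return e_t, alpha_t, a_t
```

With `H = [[1, 0], [0, 1]]` and `s_t = [0, 0]`, both scores are 0, so the weights are [0.5, 0.5] and the attention value is the average of the two hidden states.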

Next, the recognized user's movements may be recognized as a gesture with a specific meaning in the metaverse, such as a dance or a sport. Alternatively, the user may interact with physical objects such as a baseball bat, a stick, or a chair while expressing metaverse content. In that case, the system must recognize those objects to prevent the recognition rate from dropping when they occlude the user's body. The proposed system therefore uses a Complex Gesture GAN algorithm. It identifies the user's meaningful gestures based on the data learned in the previous stage, tracks the physical objects, and then distinguishes the gestures from them. In general, when a user's gesture is complex or an object-based motion is occluded, the occluded part cannot be recognized properly. The proposed Complex Gesture GAN generates images with a generative adversarial network (GAN). It can therefore recognize the user's body motion over time as a gesture and can also infer the gesture precisely even when other obstacles hide the motion. FIG. 5 is a diagram for explaining a recognition method to which the Complex Gesture GAN is applied.
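The source does not give the architecture or losses of the Complex Gesture GAN. The following sketch therefore shows only a generic GAN-style objective for completing occluded joints, which is one plausible reading of "inferring the gesture even when the motion is hidden". In this sketch, a generator proposes joint coordinates, and the visible joints from the tracker are kept while occluded ones are taken from the generator. The discriminator is trained with binary cross-entropy. The generator loss adds an L1 term on visible joints. The function names, the visibility mask, and the weight `lam=10.0` are assumptions. The generator and discriminator networks themselves are omitted.

```python
import numpy as np


def complete_occluded(observed, generated, visible):
    """Merge tracked and generated joints.
    observed, generated: (T, J, 2) coordinates; visible: (T, J), 1 = seen, 0 = occluded."""
    m = visible[..., None].astype(float)
    return m * observed + (1.0 - m) * generated


def bce(p, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities p and 0/1 targets."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def discriminator_loss(d_real, d_fake):
    """D should score real, fully visible sequences as 1 and completed ones as 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))


def generator_loss(d_fake, observed, generated, visible, lam=10.0):
    """Non-saturating adversarial term plus an L1 term that keeps the generator
    consistent with the joints that were actually seen (lam is an assumed weight)."""
    adv = bce(d_fake, np.ones_like(d_fake))
    m = visible[..., None].astype(float)
    rec = float(np.sum(m * np.abs(generated - observed)) / max(np.sum(m) * observed.shape[-1], 1.0))
    return adv + lam * rec
```

The masked composition means the generator only ever replaces joints the tracker could not see. The adversarial term pushes those filled-in joints toward plausible human motion.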

FIGS. 6 and 7 are diagrams illustrating scenes in which motion information extracted by the proposed system is applied to metaverse content.

Referring to FIGS. 6 and 7, the system integrates these algorithms so that they can be applied to various metaverse clients. The result is a cloud-based real-time metaverse pose and motion recognition system that users can access through a cloud environment from smartphones, webcams, and AR devices such as HoloLens. As shown in the figures, the user walks while speaking into an input device. The cloud-based server receives the motion and voice data through a sensor manager and matches the user to the desired avatar. It then renders the avatar using the information obtained by the deep learning algorithms.
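Matching a recognized user to an avatar requires driving the avatar's bones from the estimated joints. The source does not describe this retargeting step. The sketch below shows one simple approach under stated assumptions. It assumes 3D joints in the 17-keypoint COCO order and computes a unit direction vector for each avatar bone, which an avatar rig could then convert into rotations. The bone names and the `BONE_MAP` table are hypothetical.

```python
import numpy as np

# Hypothetical mapping: avatar bone -> (parent joint, child joint) in COCO order.
BONE_MAP = {
    "LeftUpperArm": (5, 7), "LeftLowerArm": (7, 9),
    "RightUpperArm": (6, 8), "RightLowerArm": (8, 10),
    "LeftUpperLeg": (11, 13), "LeftLowerLeg": (13, 15),
    "RightUpperLeg": (12, 14), "RightLowerLeg": (14, 16),
}


def bone_directions(joints3d):
    """joints3d: (17, 3) estimated joint positions.
    Returns bone name -> unit vector from the parent joint to the child joint
    (a zero vector if the two joints coincide, e.g. when a joint was not found)."""
    out = {}
    for bone, (parent, child) in BONE_MAP.items():
        v = joints3d[child] - joints3d[parent]
        n = np.linalg.norm(v)
        out[bone] = v / n if n > 1e-8 else np.zeros(3)
    return out
```

Using directions rather than raw positions makes the mapping independent of the user's limb lengths, so the same motion can drive avatars with different proportions.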

FIG. 8 is a diagram for explaining metaverse fitness content.

Referring to FIG. 8, the proposed system can be applied to a variety of metaverse content. As shown in the figure, it can be used for metaverse fitness content, such as exercising with dumbbells.

Although the present invention has been described in detail using preferred embodiments, the scope of the present invention is not limited to specific embodiments and should be construed according to the appended claims. Those skilled in the art will understand that many modifications and variations are possible without departing from the scope of the present invention.

Claims

A cloud-based metaverse content collaboration system comprising a behavior recognition algorithm that, when recognizing a user's motion, analyzes the structural characteristics of a person and accordingly recognizes and remembers motion actions.

The system of claim 1,
wherein the cloud-based metaverse content collaboration system treats the movement displacements of a person's spine, upper body, and lower body as attention and manages them through an attention mechanism to predict the corresponding output.

The system of claim 1,
further comprising a Complex Gesture GAN algorithm that identifies the user's meaningful gestures based on data learned in a previous step and distinguishes them after tracking a physical object.