KR20210081082A

KR20210081082A - Server, method and user device for providing avatar contents based on motion data of object

Info

Publication number: KR20210081082A
Application number: KR1020190173287A
Authority: KR
Inventors: 이경준; 김민규; 박예진; 송인선; 이강태; 이원희
Original assignee: 주식회사 케이티
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2021-07-01

Abstract

The present invention provides a server for providing avatar content based on motion data of an object, the server being capable of extracting the motion data of the object from each frame of an image. The server includes: a receiving unit configured to receive, from a user terminal, an image of an object and metadata necessary for generating the avatar content; an image analysis unit configured to analyze the image based on the received metadata; a motion data extraction unit configured to extract motion data of the object from each frame of the image based on the analyzed image; a motion data analysis unit configured to analyze, based on the extracted motion data, whether at least a portion of the motion data is missing for each frame; a correction unit configured to supplement, for a frame in which at least a portion of the motion data is missing, at least a portion of the missing motion data based on motion data of another frame or a replacement animation; and a content generation unit configured to generate the avatar content by retargeting the extracted motion data to the avatar.

Description

Server, method, and user terminal providing avatar contents based on motion data of an object {SERVER, METHOD AND USER DEVICE FOR PROVIDING AVATAR CONTENTS BASED ON MOTION DATA OF OBJECT}

The present invention relates to a server, a method, and a user terminal for providing avatar content based on motion data of an object.

An avatar is a visual image that represents a user's alter ego in a virtual space, and has drawn attention as a virtual body that stands in for the user in Internet chat, shopping malls, online games, and the like.

Conventionally, two-dimensional avatars generated by a user selecting items according to his or her taste were used. Recently, three-dimensional avatars based on the user's face, generated from feature points extracted from the user's face, have come into use.

In relation to such avatar generation, Korean Patent Laid-Open Publication No. 2019-0101834, a prior art document, discloses an electronic device that displays an avatar whose motion follows the movement of feature points of a face, and an operating method thereof.

Recently, avatar content services have been provided in which an avatar moves based on the user's motion as well as the user's face. However, such avatar content has a problem: when the user's motion is misrecognized, or when an object other than the user is recognized, the user's motion is captured inaccurately, and the resulting avatar content does not sufficiently reflect the user's motion.

The present invention aims to provide a content providing server, a method, and a user terminal that analyze an image based on the image of an object captured by a user terminal and metadata required for avatar content, and that extract motion data of the object from each frame of the image based on the analyzed image.

The present invention also aims to provide a content providing server, a method, and a user terminal that analyze, based on the motion data of the object, whether at least a portion of the motion data is missing for each frame, and that supplement at least a portion of the missing motion data for such a frame based on motion data of another frame or a replacement animation.

The present invention also aims to provide a content providing server, a method, and a user terminal that supplement the missing motion data and generate avatar content by retargeting the extracted motion data to an avatar.

However, the technical problems to be achieved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for achieving the above-described technical problems, an embodiment of the present invention may provide a content providing server including: a receiving unit that receives, from a user terminal, an image of an object and metadata necessary for generating avatar content; an image analysis unit that analyzes the image based on the received metadata; a motion data extraction unit that extracts motion data of the object from each frame of the image based on the analyzed image; a motion data analysis unit that analyzes, based on the extracted motion data, whether at least a portion of the motion data is missing for each frame; a correction unit that supplements, for a frame in which at least a portion of the motion data is missing, at least a portion of the missing motion data based on motion data of another frame or a replacement animation; and a content generation unit that generates the avatar content by retargeting the extracted motion data to an avatar.

Another embodiment of the present invention may provide a user terminal including: a photographing unit that acquires an image of an object; a transmission unit that transmits the image of the object and metadata necessary for generating avatar content to a content providing server; a receiving unit that receives, from the content providing server, the avatar content generated based on the image and the metadata; and a display unit that displays the received avatar content. Here, the image is analyzed based on the metadata; motion data of the object is extracted from each frame of the image based on the analyzed image; whether at least a portion of the motion data is missing is analyzed for each frame based on the extracted motion data; for a frame in which at least a portion of the motion data is missing, at least a portion of the missing motion data is supplemented based on motion data of another frame or a replacement animation; and the avatar content is generated by retargeting the extracted motion data to an avatar.

Yet another embodiment of the present invention may provide a content providing method including: receiving, from a user terminal, an image of an object and metadata necessary for generating avatar content; analyzing the image based on the received metadata; extracting motion data of the object from each frame of the image based on the analyzed image; analyzing, based on the extracted motion data, whether at least a portion of the motion data is missing for each frame; supplementing, for a frame in which at least a portion of the motion data is missing, at least a portion of the missing motion data based on motion data of another frame or a replacement animation; and generating the avatar content by retargeting the extracted motion data to an avatar.

The above-described means for solving the problems are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description.

According to any one of the above-described means of the present invention, it is possible to provide a content providing server, a method, and a user terminal that analyze, based on metadata, an image capturing various motions of a user (for example, a dance or a gesture) to extract the user's motion data, and that generate avatar content by retargeting the extracted motion data to an avatar.

It is possible to provide a content providing server, a method, and a user terminal that, when motion data is missing because a joint of the user is misrecognized, the user's body leaves the shooting area, or an object other than the user is recognized, supplement the missing motion data using motion data of another frame or a replacement animation, depending on how much of the motion data is missing.

It is possible to provide a content providing server, a method, and a user terminal that can generate avatar content by extracting motion data and retargeting it to an avatar not only for images captured with a 3D depth camera but also for images captured with an ordinary 2D camera.

It is possible to provide a content providing server, a method, and a user terminal that provide an authoring tool to the user terminal so that the user can directly edit the avatar content into customized avatar content.

It is possible to provide a content providing server, a method, and a user terminal that, when the user selects a sound source to be included in the image, analyze the genre, tempo, and the like of the sound source and recommend an avatar suited to the sound source.

FIG. 1 is a block diagram of a content providing system according to an embodiment of the present invention.
FIG. 2 is a block diagram of a content providing server according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating an avatar according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram for explaining a process of extracting motion data and converting its format according to an embodiment of the present invention.
FIGS. 5A and 5B are exemplary diagrams for explaining a process of generating new motion data for a correction section according to an embodiment of the present invention.
FIGS. 6A and 6B are exemplary diagrams for explaining a process of applying a replacement animation to a correction section according to an embodiment of the present invention.
FIG. 7 is a flowchart of a method by which a content providing server provides avatar content based on motion data of an object according to an embodiment of the present invention.
FIG. 8 is a block diagram of a user terminal according to an embodiment of the present invention.
FIG. 9 is an exemplary diagram illustrating edited avatar content according to an embodiment of the present invention.
FIG. 10 is a flowchart of a method by which a user terminal provides avatar content based on motion data of an object according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily implement them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals are attached to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element interposed therebetween. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, other components may be further included rather than excluded, and it should be understood that the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof is not precluded in advance.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. In addition, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a content providing system according to an embodiment of the present invention. Referring to FIG. 1, the content providing system 1 may include a content providing server 110 and a user terminal 120. The content providing server 110 and the user terminal 120 are illustrated as examples of components that can be controlled by the content providing system 1.

The components of the content providing system 1 of FIG. 1 are generally connected through a network. For example, as shown in FIG. 1, the content providing server 110 may be connected to the user terminal 120 simultaneously or at time intervals.

A network refers to a connection structure that enables information exchange between nodes such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, wired and wireless television networks, and the like. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

The content providing server 110 may receive, from the user terminal 120, an image of an object and metadata necessary for generating avatar content. Here, the metadata may include information on the avatar to be retargeted and additional information necessary for generating the avatar content. The avatar may be either a human-type avatar or an animal-type avatar, and the additional information may include information on a sound source included in the image, information on the camera that captured the object, background information, and the like.
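By way of a non-limiting illustration, the metadata described above could be held in a small record such as the following Python sketch. The class and field names (`AvatarMetadata`, `avatar_id`, `camera_type`, and so on) and the sample values are hypothetical and are not defined in this specification; only the categories of information (avatar information, sound source, camera, background) come from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AvatarType(Enum):
    HUMAN = "human"    # human-type avatar
    ANIMAL = "animal"  # animal-type avatar


class CameraType(Enum):
    CAMERA_2D = "2d"   # ordinary 2D camera
    CAMERA_3D = "3d"   # 3D depth camera


@dataclass(frozen=True)
class AvatarMetadata:
    """Metadata sent together with the captured image (names are hypothetical)."""
    avatar_id: str                      # which avatar to retarget to
    avatar_type: AvatarType
    camera_type: CameraType             # used later to determine the image type
    sound_source: Optional[str] = None  # sound source included in the image
    background: Optional[str] = None    # background (stage) information


# Example record for a dance clip shot with a 2D camera.
meta = AvatarMetadata(
    avatar_id="avatar-01",
    avatar_type=AvatarType.HUMAN,
    camera_type=CameraType.CAMERA_2D,
    sound_source="song.mp3",
)
```

The optional fields reflect that the description lists the additional information as items that "may" be included.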

When the content providing server 110 receives sound source information from the user terminal 120, it may analyze the genre of the sound source (for example, dance, ballad, R&B, or hip-hop) based on the received sound source information, and may select and provide one recommended avatar from among a plurality of avatars based on the analyzed genre.

The content providing server 110 may analyze the image based on the received metadata.

The content providing server 110 may analyze type information of the image based on the camera information (for example, 2D camera or 3D camera), and may extract motion data of the object from each frame of the image based on the analyzed image. At this time, the content providing server 110 may convert the format of the extracted motion data based on the analyzed type information of the image.

The content providing server 110 may first analyze, based on the format-converted motion data, whether motion data is missing for each frame of the image, and may then scale the format-converted motion data to correspond to the size of the avatar to be retargeted.
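By way of a non-limiting illustration, the scaling step could be sketched as follows. The specification does not state how the scaling is computed; this sketch assumes a uniform scale factor equal to the ratio of avatar height to object height, applied about a root joint. The function name `scale_pose`, the joint names, and the use of heights are all assumptions made for the example.

```python
from typing import Dict, Tuple

Pose = Dict[str, Tuple[float, ...]]  # joint name -> coordinates


def scale_pose(pose: Pose, source_height: float, avatar_height: float,
               root: str = "hips") -> Pose:
    """Uniformly scale joint positions about the root joint.

    Assumption: the size mismatch between the captured object and the
    avatar is described by a single height ratio.
    """
    ratio = avatar_height / source_height
    origin = pose[root]
    return {
        joint: tuple(o + (c - o) * ratio for c, o in zip(coords, origin))
        for joint, coords in pose.items()
    }


# A 2.0-unit-tall object mapped onto a 1.0-unit-tall avatar.
pose = {"hips": (1.0, 1.0), "head": (1.0, 2.0)}
scaled = scale_pose(pose, source_height=2.0, avatar_height=1.0)
```

Scaling about the root keeps the avatar anchored at the same position while limb lengths shrink or grow with the ratio.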

The content providing server 110 may analyze, based on the extracted motion data, whether at least a portion of the motion data is missing for each frame. To this end, the content providing server 110 may analyze the motion data for all frames of the image and classify the analyzed motion data into one of a plurality of body regions.

The content providing server 110 may select, based on the avatar information, a mapping target region to be mapped to the classified body region, and may derive an average value of the coordinates over all sections corresponding to the mapping target region based on the per-section coordinate information of the selected mapping target region.

When the difference between the derived average value and the coordinate value of a section corresponding to the mapping target region is equal to or greater than a first threshold, the content providing server 110 may determine that motion data is missing in at least one section corresponding to the mapping target region.
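By way of a non-limiting illustration, the average-and-threshold check could be sketched as follows. The specification does not say how the "difference" between a section's coordinate and the average is measured; this sketch assumes Euclidean distance. The function name `find_missing_sections` and the sample values are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def find_missing_sections(section_coords: Sequence[Sequence[float]],
                          first_threshold: float) -> List[int]:
    """Return indices of sections whose coordinate deviates from the
    average over all sections by at least `first_threshold`.

    Assumption: deviation is measured as Euclidean distance.
    """
    dims = len(section_coords[0])
    # Average coordinate over all sections of the mapping target region.
    mean = [fmean(c[d] for c in section_coords) for d in range(dims)]
    missing = []
    for index, coord in enumerate(section_coords):
        distance = sum((coord[d] - mean[d]) ** 2 for d in range(dims)) ** 0.5
        if distance >= first_threshold:
            missing.append(index)
    return missing


# Three plausible positions and one outlier (e.g. a misrecognized joint).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
flagged = find_missing_sections(coords, first_threshold=50.0)
```

In this example only the last section is flagged, because its coordinate lies far from the average of all four sections.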

For a frame in which at least a portion of the motion data is missing, the content providing server 110 may supplement at least a portion of the missing motion data based on motion data of another frame or a replacement animation.

The content providing server 110 may calculate a correction section to which a replacement animation of the avatar is to be applied, based on the motion data classified into a preset first region among the body regions, and may determine whether the missing ratio of the first region in the calculated correction section is equal to or greater than a second threshold.

For example, when the missing ratio of the first region is less than the second threshold, the content providing server 110 may generate new motion data for the correction section by interpolating motion data corresponding to at least one of the frame before and the frame after the correction section.
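By way of a non-limiting illustration, generating new motion data for a correction section could be sketched with linear interpolation between the frame before and the frame after the section. The specification only says that the motion data is "interpolated"; linear interpolation, the function name `interpolate_section`, and the joint names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Pose = Dict[str, Tuple[float, ...]]  # joint name -> coordinates


def interpolate_section(prev_pose: Pose, next_pose: Pose,
                        n_frames: int) -> List[Pose]:
    """Create `n_frames` poses between `prev_pose` and `next_pose`.

    Assumption: each joint moves on a straight line at constant speed.
    """
    poses = []
    for k in range(1, n_frames + 1):
        t = k / (n_frames + 1)  # fraction of the way from prev to next
        poses.append({
            joint: tuple(a + (b - a) * t
                         for a, b in zip(prev_pose[joint], next_pose[joint]))
            for joint in prev_pose
        })
    return poses


# Fill a three-frame correction section for a single joint.
before = {"hand": (0.0, 0.0)}
after = {"hand": (4.0, 8.0)}
filled = interpolate_section(before, after, n_frames=3)
```

Linear interpolation is the simplest choice; a smoother curve (for example a spline over several neighboring frames) would fit the same interface.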

As another example, when the missing ratio of the first region is equal to or greater than the second threshold, the content providing server 110 may apply a replacement animation to the correction section. Specifically, the content providing server 110 may calculate the length of the correction section to which the replacement animation is to be applied, detect the motion size of the object based on motion data corresponding to at least one of the frame before and the frame after the correction section, and select one final replacement animation from among a plurality of candidate replacement animations based on the length of the correction section and the detected motion size of the object.
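By way of a non-limiting illustration, the choice between interpolation and a replacement animation, and the selection of the final replacement animation, could be sketched as follows. The comparison against the second threshold follows the description; the scoring rule used to pick among candidates (sum of relative gaps in length and motion size) is an assumption, as are all names and sample values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CandidateAnimation:
    name: str
    length_frames: int   # duration of the clip in frames
    motion_size: float   # how large the movement in the clip is


def choose_correction(missing_ratio: float, second_threshold: float) -> str:
    """Pick the correction strategy for a correction section."""
    if missing_ratio >= second_threshold:
        return "replacement_animation"
    return "interpolation"


def select_replacement(candidates: Sequence[CandidateAnimation],
                       section_length: int,
                       motion_size: float) -> CandidateAnimation:
    """Select the candidate closest to the section length and motion size.

    Assumption: closeness is the sum of the two relative gaps.
    """
    def cost(candidate: CandidateAnimation) -> float:
        length_gap = abs(candidate.length_frames - section_length) / section_length
        motion_gap = abs(candidate.motion_size - motion_size) / max(motion_size, 1e-9)
        return length_gap + motion_gap

    return min(candidates, key=cost)


candidates = [
    CandidateAnimation("sway", 30, 0.2),
    CandidateAnimation("spin", 60, 0.8),
    CandidateAnimation("jump", 58, 0.3),
]
best = select_replacement(candidates, section_length=60, motion_size=0.75)
```

Here `best` is the "spin" clip, since it matches the section length exactly and has nearly the same motion size as the frames around the section.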

The content providing server 110 may perform additional correction by applying camera work and stage effects to the correction section.

The content providing server 110 may set the point at which the correction section starts as a key frame, and may apply short or long camera work starting from the set key frame, depending on the genre of the sound source included in the image and the detected motion size of the object.

The content providing server 110 may generate avatar content by retargeting the extracted motion data to the avatar.

The user terminal 120 may run an avatar content service providing app for generating avatar content. Here, the avatar content service providing app may provide a creation mode, an editing mode, and a playback mode for avatar content.

For example, through the creation mode of the avatar content service providing app, the user terminal 120 may generate avatar content in which an avatar is retargeted based on a motion performed by the user. Through the creation mode, the user terminal 120 may also upload previously generated avatar content to the content providing server 110. When uploading previously generated avatar content, the user terminal 120 may additionally receive input of additional information (for example, the name of the sound source) for analyzing that content.

As another example, through the editing mode of the avatar content service providing app, the user terminal 120 may directly modify avatar content generated by the content providing server 110 using an authoring tool.

As yet another example, through the playback mode of the avatar content service providing app, the user terminal 120 may play back avatar content generated by retargeting an avatar based on the user's own motion. Alternatively, the user terminal 120 may play back avatar content generated by another user terminal through the playback mode.

When the user terminal 120 receives input of sound source information to be included in the image, it may transmit the input sound source information to the content providing server 110. The user terminal 120 may receive, from the content providing server 110, a recommended avatar selected based on the sound source information.

The user terminal 120 may acquire an image of an object. Here, the user terminal 120 may photograph an object performing various motions such as a dance or a gesture.

The user terminal 120 may transmit the image of the object and metadata necessary for generating avatar content to the content providing server 110.

The user terminal 120 may receive, from the content providing server 110, the avatar content generated based on the image and the metadata.

The user terminal 120 may display the received avatar content. The avatar content may be consumed in various ways depending on the type of the user terminal 120. For example, when the user terminal 120 is a head mounted display (HMD), the user terminal 120 may display the 2D avatar content as 360° avatar content. In this case, when the 360° avatar content is consumed, the user terminal 120 may let the user view the avatar content in real time from a desired position, without using the additionally corrected information such as camera work. As another example, when the user terminal 120 is a mobile device or an ordinary TV, the user terminal 120 may display 2D avatar content to which all effects have been applied.

The user terminal 120 may allow the received avatar content to be edited using an authoring tool.

With respect to the avatar content, the user terminal 120 may edit, for example, the viewpoint of the camera work, additional animation effects applied to the avatar, the synchronization of the sound source with the avatar content, stage effects, special effects applied to the avatar, and screen filters.

Through the avatar content service providing app, the user terminal 120 may use various additional functions besides generating avatar content. For example, the user terminal 120 may generate additional content based on the generated avatar content. As another example, when the user terminal 120 captures an image during playback of the avatar content, the captured image may be stored in the photo album of the user terminal 120. As yet another example, when the user designates a section on the timeline of the completed avatar content, the user terminal 120 may generate and store a highlight video for the designated section.

When the avatar content has been edited, the user terminal 120 may re-upload the edited avatar content to the content providing server 110 in the form of a project. The user terminal 120 may also re-upload captured images, highlight videos, and the like generated through the additional functions to the content providing server 110.

The user terminal 120 may include a smartphone, a tablet PC, or the like on which avatar content is consumed in 2D form, and a head mounted display (HMD) or the like on which avatar content is consumed in 360° form.

FIG. 2 is a block diagram of a content providing server according to an embodiment of the present invention. Referring to FIG. 2, the content providing server 110 may include a receiver 210, a recommended avatar providing unit 220, an image analysis unit 230, a motion data extraction unit 240, a scaling unit 250, a motion data analysis unit 260, a compensator 270, and a content generation unit 280.

The receiver 210 may receive, from the user terminal 120, an image obtained by photographing an object and metadata necessary for generating avatar content. Here, the image may be an image capturing various motions of the object, such as a dance or a gesture, and the metadata may include information on the avatar to be retargeted and additional information necessary for generating the avatar content. The additional information may include, for example, sound source information included in the image, information on the camera that photographed the object (for example, a 2D camera or a 3D camera), and background information (stage information). The avatar to be retargeted is described in detail with reference to FIG. 3.

FIG. 3 is an exemplary diagram illustrating an avatar according to an embodiment of the present invention. Referring to FIG. 3, the avatar to be retargeted may be either a human-type avatar 300 or an animal-type avatar 310.

When sound source information to be included in the image is received in advance from the user terminal 120, the recommended avatar providing unit 220 may analyze the genre of the sound source based on the received sound source information, select one recommended avatar from among a plurality of avatars based on the analyzed genre, and provide the recommended avatar to the user terminal 120.

For example, when the genre of the sound source is a dance song whose fast tempo calls for a lot of movement, the recommended avatar providing unit 220 may select the human-type avatar 300, which allows the avatar's movement to be expressed in diverse ways, and provide it to the user terminal 120.

As another example, when the genre of the sound source is a children's song, the recommended avatar providing unit 220 may select the animal-type avatar 310 and provide it to the user terminal 120.

As yet another example, when the genre of the sound source is a ballad whose slow tempo does not call for much movement, the recommended avatar providing unit 220 may analyze the avatars selected in the avatar content already registered in the content providing server 110 and provide the avatar most often selected by a plurality of users to the user terminal 120 as the recommended avatar.
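The three recommendation examples above can be read as a small rule set. The following is a minimal sketch under stated assumptions: the function name `recommend_avatar`, the genre labels, the tempo cut-offs in beats per minute, and the fallback choice are all hypothetical and do not appear in the specification.

```python
from collections import Counter

def recommend_avatar(genre, tempo_bpm, past_selections, fast_bpm=120, slow_bpm=80):
    """Return "human" or "animal" for a sound source (illustrative rules only).

    fast_bpm / slow_bpm are assumed cut-offs; the text only says "fast" and "slow".
    past_selections is a list of avatar types chosen in already registered content.
    """
    if genre == "dance" and tempo_bpm >= fast_bpm:
        return "human"   # many movements: human-type avatar 300
    if genre == "children":
        return "animal"  # children's song: animal-type avatar 310
    if genre == "ballad" and tempo_bpm <= slow_bpm and past_selections:
        # Avatar most often selected by other users.
        return Counter(past_selections).most_common(1)[0][0]
    return "human"       # assumed fallback, not stated in the text
```

For instance, a slow ballad with past selections `["animal", "human", "animal"]` would yield the animal-type avatar under these assumed rules.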

Returning to FIG. 2, the image analysis unit 230 may analyze the image based on the received metadata. Here, the image analysis unit 230 may determine the type of the image based on the camera information. For example, when the camera information indicates a 2D camera, the image analysis unit 230 may determine the image type to be a 2D image. As another example, when the camera information indicates a 3D camera, the image analysis unit 230 may determine the image type to be a 2D image if the image was captured with a simple camera, and a 3D image if it was captured with a depth camera.
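As an illustration of this type decision, the sketch below maps camera metadata to an image type. The function name `classify_image_type` and the string labels are assumptions made for the example only.

```python
def classify_image_type(camera_kind, sensor="simple"):
    """Map camera metadata to an image type, following the examples in the text.

    camera_kind: "2d" or "3d" (assumed labels for the camera information).
    sensor: "simple" or "depth"; only consulted for 3D cameras.
    """
    if camera_kind == "2d":
        return "2d"
    if camera_kind == "3d":
        # Only a depth camera yields a 3D image; a simple camera yields 2D.
        return "3d" if sensor == "depth" else "2d"
    raise ValueError(f"unknown camera kind: {camera_kind!r}")
```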

The motion data extraction unit 240 may extract motion data of the object from each frame of the image based on the analyzed image. For example, when the analyzed image is a 2D image, the motion data extraction unit 240 may extract the motion data of the object from each frame as x and y coordinates. As another example, when the analyzed image is a 3D image, the motion data extraction unit 240 may extract the motion data of the object from each frame as x, y, and z coordinates.

The motion data extraction unit 240 may convert the format of the extracted motion data based on the analyzed image type. The process of converting the format of the motion data is described in detail with reference to FIG. 4.

FIG. 4 is an exemplary diagram for explaining a process of extracting motion data and converting its format according to an embodiment of the present invention. Referring to FIG. 4, for example, the motion data extraction unit 240 may extract, for each frame of the image, motion data related to the body regions 400 of the object, which are divided based on the joints of the object.

Here, because the format of the motion data differs according to the characteristics of the camera that captured the image (for example, a 2D camera, a 3D camera, a simple camera, or a depth camera), the motion data may be converted (410) into a common format for retargeting to the avatar.

For example, in the case of Kinect data, the motion data extraction unit 240 may convert the motion data into points such as the eyes (15, 16) and ears (17, 18) of the head region and the sole regions (19, 22, 21, 24).

As another example, when the image is a 2D image captured by a 2D camera, the motion data extraction unit 240 may generate a z value for the motion data composed of x and y coordinates extracted from the 2D image, using machine learning such as deep learning, thereby converting the x, y coordinates into x, y, z coordinates.

The motion data analysis unit 260 may perform a primary analysis of whether motion data is missing for each frame of the image, based on the format-converted motion data 420.

The scaling unit 250 may scale the format-converted motion data 420 to correspond to the size of the avatar to be retargeted.
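The specification does not say how the scaling is computed. One simple reading, shown here purely as an assumption, is a uniform scale by the ratio of avatar height to object height, with coordinates expressed relative to a common origin. The function name `scale_motion` and the frame layout (a dict from joint name to an (x, y, z) tuple) are hypothetical.

```python
def scale_motion(frames, source_height, avatar_height):
    """Uniformly scale joint coordinates so the motion matches the avatar size.

    frames: list of dicts mapping joint name -> (x, y, z).
    Assumes coordinates are relative to a shared origin (for example the floor
    under the object), so a plain multiplication preserves proportions.
    """
    if source_height <= 0:
        raise ValueError("source_height must be positive")
    ratio = avatar_height / source_height
    return [
        {joint: tuple(c * ratio for c in xyz) for joint, xyz in frame.items()}
        for frame in frames
    ]
```

A real retargeting pipeline would more likely scale per bone length than per overall height; the uniform ratio is only the simplest case.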

Returning to FIG. 2, the motion data analysis unit 260 may perform a secondary analysis of whether at least a portion of the motion data is missing for each frame, based on the extracted motion data. The motion data analysis unit 260 may analyze the motion data for all frames of the image and classify the analyzed motion data into one of the body regions. For example, the motion data analysis unit 260 may analyze the motion data extracted from every frame of the image (for example, at 30 fps, all 900 frames of a 30-second image are extracted). The motion data analysis unit 260 may then classify the motion data analyzed for all frames into a face region, a hand region, a foot region, and a body (torso) region among the body regions.

The motion data analysis unit 260 may select, based on the avatar information, a mapping target region to be mapped to a classified body region. Referring briefly to FIG. 3, when the avatar to be retargeted is the human-type avatar 300, the motion data analysis unit 260 may select the whole body of the human-type avatar 300 as the mapping target region to be mapped to the body regions. As another example, when the avatar to be retargeted is the animal-type avatar 310, the motion data analysis unit 260 may select the mapping target region to be mapped to the body regions based on the torso region 311 of the animal-type avatar 310. In this case, motion data of regions other than the torso region 311 may not be used to generate the avatar content.

The motion data analysis unit 260 may derive an average value of the coordinates over all sections of the mapping target region, based on the coordinate information of each section (for example, each 1-second section) of the selected mapping target region.

When the coordinate value of a section of the mapping target region differs from the derived average value by a first threshold or more (for example, 40% or more), the motion data analysis unit 260 may determine that motion data is missing in that section of the mapping target region.
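The averaging and threshold test can be sketched as follows. This is one possible reading: each section is reduced to a single representative coordinate value, and the deviation is measured relative to the overall average. The function name `find_missing_sections` and that reduction are assumptions; the text does not define how the per-section difference is normalized.

```python
def find_missing_sections(section_values, threshold=0.4):
    """Return indices of sections whose value deviates from the mean by >= threshold.

    section_values: one representative coordinate value per section (e.g. per second).
    threshold: first threshold as a fraction of the overall mean (0.4 = 40%).
    """
    if not section_values:
        return []
    mean = sum(section_values) / len(section_values)
    if mean == 0:
        return []  # relative deviation is undefined; treated as "nothing flagged"
    return [
        i for i, value in enumerate(section_values)
        if abs(value - mean) / abs(mean) >= threshold
    ]
```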

The motion data analysis unit 260 may analyze the missing motion data over all frames and manage it as a timeline. The motion data analysis unit 260 may analyze the missing motion data over all sections to determine whether avatar content can be generated. For example, when about 20% or more of the motion data is missing over all sections, the motion data analysis unit 260 may determine that avatar content cannot be generated from the image. In this case, a notification unit (not shown) may notify the user terminal 120 that the avatar content cannot be generated. As another example, when less than about 20% of the motion data is missing over all sections, the motion data analysis unit 260 may determine that avatar content can be generated from the image. In this case, the motion size of the motion data may be detected for at least one frame and managed for each section.
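A minimal sketch of this feasibility check, assuming the timeline is kept as one boolean per section (True meaning motion data is missing). The function name `can_generate_content` and that representation are hypothetical.

```python
def can_generate_content(missing_flags, max_missing_ratio=0.2):
    """Return True when the share of missing sections is below the limit.

    missing_flags: list of booleans, one per section of the timeline.
    max_missing_ratio: the "about 20%" limit from the text, as a fraction.
    """
    if not missing_flags:
        return False  # assumed: no data at all means nothing can be generated
    missing_ratio = sum(missing_flags) / len(missing_flags)
    return missing_ratio < max_missing_ratio
```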

For a frame in which at least a portion of the motion data is missing, the compensator 270 may compensate for at least a portion of the missing motion data based on motion data of another frame or a replacement animation.

The compensator 270 may calculate a correction section to which a replacement animation of the avatar is to be applied, based on the motion data classified into a preset first region among the body regions, and may determine whether the missing ratio of the first region (for example, the torso region) in the calculated correction section is equal to or greater than a second threshold (for example, 50%).

For example, when the missing ratio of the first region (for example, the torso region) is less than the second threshold (for example, 50%), the compensator 270 may generate new motion data for the correction section by linearly interpolating the motion data of at least one of the frame preceding and the frame following the correction section.

As another example, when the missing ratio of the first region (for example, the torso region) is equal to or greater than the second threshold (for example, 50%), the compensator 270 may apply a replacement animation to the correction section. Here, the replacement animation may include a jumping motion, a wing-flapping motion, a swaying motion, and the like generated in advance for the avatar. For example, when the missing ratio of the first region is equal to or greater than the second threshold and the correction section exceeds 1 second, the compensator 270 may apply a replacement animation to the correction section. In this case, the correction section to which the replacement animation is applied may be managed based on the timeline. In contrast, when the missing ratio of the first region is equal to or greater than the second threshold but the correction section is 1 second or less, the compensator 270 may determine that the section is too short to warrant a replacement animation and may instead generate new motion data for the correction section by linearly interpolating the motion data of at least one of the frame preceding and the frame following the correction section.
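The decision between interpolation and a replacement animation reduces to two comparisons. The sketch below restates it with the example values from the text as defaults; the function name `choose_correction` and the returned labels are hypothetical.

```python
def choose_correction(missing_ratio, section_seconds,
                      ratio_threshold=0.5, min_seconds=1.0):
    """Pick the correction strategy for one correction section.

    missing_ratio: missing ratio of the first region (e.g. torso), 0.0 to 1.0.
    section_seconds: length of the correction section in seconds.
    A replacement animation is used only when both conditions hold; every other
    case falls back to linear interpolation, as in the examples above.
    """
    if missing_ratio >= ratio_threshold and section_seconds > min_seconds:
        return "replacement_animation"
    return "linear_interpolation"
```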

The compensator 270 may calculate the length of the correction section to which the replacement animation is to be applied, and may detect the motion size of the object based on the motion data of at least one of the frame preceding and the frame following the correction section. The compensator 270 may select one final replacement animation from among a plurality of candidate replacement animations based on the length of the correction section and the detected motion size of the object. Here, the plurality of candidate replacement animations related to the avatar may be extracted in descending or ascending order of the motion size of the object.

For example, when the correction section is about 3 seconds long, the compensator 270 may select, from among the plurality of candidate replacement animations, a final replacement animation suited to 3 seconds. If no candidate replacement animation suited to 3 seconds exists, a replacement animation longer than 3 seconds may be selected as the final replacement animation. Alternatively, when even the replacement animation with the longest playing time is shorter than 3 seconds, the compensator 270 may combine a plurality of candidate replacement animations to match the length of the correction section.
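The length-based part of this selection can be sketched as a three-step fallback. Several details here are assumptions rather than the specification's method: the `tolerance` that defines "suited to" the section length, picking the shortest of the longer clips, and chaining clips longest-first (reusing them if needed). Motion size is left out of the sketch for brevity, and the function name `pick_replacement` is hypothetical.

```python
def pick_replacement(candidates, section_seconds, tolerance=0.25):
    """Return the clip name(s) to play, in order, for one correction section.

    candidates: list of (name, duration_seconds) for pre-made avatar animations.
    """
    clips = [(name, dur) for name, dur in candidates if dur > 0]
    if not clips or section_seconds <= 0:
        return []
    # 1) A clip whose length matches the section within the assumed tolerance.
    fitting = [c for c in clips if abs(c[1] - section_seconds) <= tolerance]
    if fitting:
        return [min(fitting, key=lambda c: abs(c[1] - section_seconds))[0]]
    # 2) Otherwise a longer clip (shortest of the longer ones, by assumption).
    longer = [c for c in clips if c[1] > section_seconds]
    if longer:
        return [min(longer, key=lambda c: c[1])[0]]
    # 3) All clips are shorter: chain them until the section is covered.
    ordered = sorted(clips, key=lambda c: c[1], reverse=True)
    chain, total, i = [], 0.0, 0
    while total < section_seconds:
        name, dur = ordered[i % len(ordered)]
        chain.append(name)
        total += dur
        i += 1
    return chain
```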

The compensator 270 may blend the selected final replacement animation so that it connects naturally with the frame preceding and the frame following the correction section. In this case, when the final replacement animation is long, the section may be processed at the same ratio with respect to the preceding frame and the following frame.

The process of generating new motion data for the correction section or applying a replacement animation is described in detail with reference to FIGS. 5A to 6B.

FIGS. 5A and 5B are exemplary diagrams for explaining a process of generating new motion data for a correction section according to an embodiment of the present invention.

Referring to FIG. 5A, assume that, among the first to third frames 500, 510, and 520 of the image, part of the object's body leaves the screen area in the second frame 510, so that the motion data 511 of the object is missing in the second frame 510.

Referring to FIG. 5B, the compensator 270 may generate new motion data 531 for the correction section 530 by interpolating between the motion data 501 of the first frame 500 and the motion data 521 of the third frame 520.
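The interpolation of FIG. 5B can be sketched as plain per-joint linear interpolation between the frame before and the frame after the gap. The function name `interpolate_gap` and the frame layout (dict from joint name to an (x, y, z) tuple) are assumptions for illustration.

```python
def interpolate_gap(before, after, n_missing):
    """Linearly interpolate n_missing frames between two known frames.

    before / after: dicts mapping joint name -> (x, y, z) for the frames that
    bound the correction section. Joints absent from either frame are skipped.
    """
    frames = []
    for k in range(1, n_missing + 1):
        t = k / (n_missing + 1)  # fraction of the way from `before` to `after`
        frames.append({
            joint: tuple(a + (b - a) * t for a, b in zip(before[joint], after[joint]))
            for joint in before if joint in after
        })
    return frames
```

With one missing frame, as in FIG. 5A, the generated frame is simply the midpoint of the two neighbours.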

FIGS. 6A and 6B are exemplary diagrams for explaining a process of applying a replacement animation to a correction section according to an embodiment of the present invention.

Referring to FIG. 6A, assume that, of the first frame 600 and the second frame 610 of the image, part of the object's body leaves the screen area in the second frame 610, so that the motion data 611 of the object is missing in the second frame 610.

The compensator 270 may apply, to the correction section 620, a replacement animation 630 in which the avatar sways its body from side to side.

Returning to FIG. 2, the compensator 270 may perform additional correction by applying camera work and stage effects to the correction section. Here, the camera work and stage effects may be applied preferentially to the correction section as additional effects, in order to further conceal the section corrected by generating new motion data or applying a replacement animation.

Camera work may be applied in various ways, and the actions associated with commonly used camera work may be defined in advance. Camera work may include, for example, zoom-in/zoom-out, rotating the camera around the avatar, and changing the camera viewpoint. When rotating the camera around the avatar, the camera may be rotated from the avatar's left to the front or from the avatar's right to the front. Camera viewpoint changes may include, for the front view, a change to a full shot or a shot centered on the avatar's face; for the left or right side view, a change to a full shot or a shot centered on the avatar's face; and a change of viewpoint to the stage screen.

The compensator 270 may set the point at which the correction section starts as a key frame for applying camera work, and may apply camera work relative to the set key frame based on the genre of the sound source included in the image and the detected motion size of the object. For example, when the genre of the sound source is a dance song and the detected motion size of the object is large, the compensator 270 may set a short camera work interval (for example, within 5 seconds). As another example, when the genre of the sound source is a ballad and the detected motion size of the object is small, the compensator 270 may set a long camera work interval (for example, 10 seconds or more).
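The two interval examples can be expressed as a lookup. The text covers only the dance/large-motion and ballad/small-motion cases, so the normalized motion scale, the `large_motion` cut-off, and the intermediate default below are assumptions, as is the function name `camera_work_interval`.

```python
def camera_work_interval(genre, motion_size, large_motion=0.5):
    """Return the interval in seconds between camera work changes.

    motion_size: detected motion size, assumed normalized to 0.0 .. 1.0.
    large_motion: assumed boundary between "small" and "large" motion.
    """
    if genre == "dance" and motion_size >= large_motion:
        return 5.0   # short interval (within 5 seconds)
    if genre == "ballad" and motion_size < large_motion:
        return 10.0  # long interval (10 seconds or more)
    return 7.5       # assumed default for combinations the text does not cover
```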

The compensator 270 may also make the camera zoom-in/zoom-out and the camera rotation faster or slower based on the genre of the sound source included in the image and the motion size of the object.

When setting evenly spaced key frames, the compensator 270 may refrain from adding an evenly spaced key frame if a key frame set for correcting the avatar's animation already exists within 3 seconds before or after it. In addition, the compensator 270 may set the camera work randomly starting from the designated key frame, while preventing the previously set effect from being set again consecutively.
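These two rules (skip an evenly spaced key frame that falls within 3 seconds of a correction key frame, and never repeat the same camera work twice in a row) can be sketched as follows. The function names, the action labels, and the use of a seeded `random.Random` are assumptions made for the example.

```python
import random

def plan_keyframes(duration, interval, correction_starts, guard=3.0):
    """Merge correction key frames with evenly spaced ones.

    An evenly spaced key frame is dropped when a correction key frame lies
    within `guard` seconds of it.
    """
    uniform = []
    t = interval
    while t < duration:
        if all(abs(t - c) > guard for c in correction_starts):
            uniform.append(t)
        t += interval
    return sorted(set(list(correction_starts) + uniform))

def assign_camera_works(keyframes, actions, rng=None):
    """Choose a random camera work per key frame without consecutive repeats."""
    rng = rng or random.Random()
    plan, previous = [], None
    for t in keyframes:
        # Fall back to the full list only if a single action is available.
        choices = [a for a in actions if a != previous] or list(actions)
        previous = rng.choice(choices)
        plan.append((t, previous))
    return plan
```

For a 20-second clip with a 5-second interval and a correction starting at 6 seconds, the evenly spaced key frame at 5 seconds is dropped and the plan uses 6, 10, and 15 seconds.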

The content generation unit 280 may generate avatar content by retargeting the extracted motion data to the avatar. For example, the content generation unit 280 may generate, as avatar content in the form of a 2D video, a screen on which the avatar retargeted based on the motion data of the object is shown and the sound source is played. In this case, the generated avatar content may be managed in the form of a project that allows the avatar content to be edited and to be played in a 360° environment.

FIG. 7 is a flowchart of a method by which a content providing server provides avatar content based on motion data of an object according to an embodiment of the present invention. The method shown in FIG. 7, performed by the content providing server 110, includes steps processed in time series by the content providing system 1 according to the embodiments shown in FIGS. 1 to 6B. Accordingly, even where omitted below, the description given above for the embodiments shown in FIGS. 1 to 6B also applies to the method by which the content providing server 110 provides avatar content based on motion data of an object.

In step S710, the content providing server 110 may receive, from the user terminal 120, an image obtained by photographing an object and metadata necessary for generating avatar content.

In step S720, the content providing server 110 may analyze the image based on the received metadata.

In step S730, the content providing server 110 may extract motion data of the object from each frame of the image based on the analyzed image.

In step S740, the content providing server 110 may analyze, based on the extracted motion data, whether at least a portion of the motion data is missing for each frame.

In step S750, for a frame in which at least a portion of the motion data is missing, the content providing server 110 may compensate for at least a portion of the missing motion data based on motion data of another frame or a replacement animation.

In step S760, the content providing server 110 may generate avatar content by retargeting the extracted motion data to the avatar.

In the above description, steps S710 to S760 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

FIG. 8 is a block diagram of a user terminal according to an embodiment of the present invention. Referring to FIG. 8, the user terminal 120 may include an input unit 810, a photographing unit 820, a transmitter 830, a receiver 840, a display unit 850, and an editing unit 860.

The input unit 810 may receive sound source information to be included in the image. For example, when the input unit 810 receives information on the sound source to be played while the image is captured, the transmitter 830 may transmit the input sound source information to the content providing server 110.

The receiver 840 may receive, from the content providing server 110, a recommended avatar selected based on the sound source information. Here, the recommended avatar may be one selected from among a plurality of characters based on the genre, tempo, and the like of the sound source.

The input unit 810 may receive, directly from the user, the avatar and the additional information necessary for generating the avatar content. The input unit 810 may receive from the user, as the character to be retargeted based on the user's motion, either a human-type avatar having the same body type as a person or an animal-type avatar. In addition, the input unit 810 may receive sound source information included in the image, information on the camera that photographed the object (for example, a 2D camera or a 3D camera), background information (stage information), and the like.

The photographing unit 820 may acquire an image obtained by photographing the object. For example, the photographing unit 820 may photograph the object according to a predetermined guide using a camera (for example, a 2D camera or a 3D camera) provided in the user terminal 120.

The transmitter 830 may transmit the image of the object and the metadata necessary for generating the avatar content to the content providing server 110.

The receiver 840 may receive, from the content providing server 110, the avatar content generated based on the image and the metadata. For example, when the user requests a download of the avatar content through the avatar content service providing app, the receiver 840 may receive the avatar content from the content providing server 110. In this case, according to the user's selection, the receiver 840 may receive either avatar content for simple playback or project-type avatar content that can be edited and played in a 360° environment.

The display unit 850 may display the received avatar content.

The editing unit 860 may allow the user to manually edit the received avatar content using an authoring tool. Here, the avatar content may be in the form of a project in which the avatar content can be edited. For example, the editing unit 860 may edit the avatar content by adjusting the viewpoint of the camera work, applying additional animation effects to the avatar, adjusting the synchronization of the sound source with the avatar content, applying and changing stage effects, applying special effects to the avatar, and applying and changing screen filters. The process of editing the avatar content is described in detail with reference to FIG. 9.

FIG. 9 is an exemplary diagram illustrating edited avatar content according to an embodiment of the present invention. Referring to FIG. 9, the editing unit 860 may edit the avatar content so that the position, rotation range, and the like of the camera are adjusted or changed through viewpoint adjustment 900 of the camera work.

Through an additional animation effect 910 for the avatar, the editing unit 860 may receive a selection of an accessory to be applied to the avatar and edit the avatar content so that the selected accessory is applied to the avatar.

By applying special effects to the avatar, the editing unit 860 may edit the avatar content so that a confetti 920 effect is applied to the avatar, or so that a wings 930 effect is applied to the avatar.

FIG. 10 is a flowchart of a method of providing avatar content based on motion data of an object in a user terminal according to the present invention. The method of providing avatar content based on motion data of an object in the user terminal 120 shown in FIG. 10 includes steps processed in time series by the content providing system 1 according to the embodiments shown in FIGS. 1 to 9. Accordingly, even where omitted below, the description given above with reference to FIGS. 1 to 9 also applies to the method of providing avatar content based on motion data of an object in the user terminal 120.

In step S1010, the user terminal 120 may acquire an image of the object.

In step S1020, the user terminal 120 may transmit the captured image of the object and the metadata necessary for generating the avatar content to the content providing server 110.

In step S1030, the user terminal 120 may receive, from the content providing server 110, the avatar content generated based on the image and the metadata.

In step S1040, the user terminal 120 may display the received avatar content.

In the above description, steps S1010 to S1040 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
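
The terminal-side flow of steps S1010 to S1040 can be sketched as follows. This is a minimal illustration only, not the implementation described in the specification: the `AvatarContent` container, the `provide_avatar_content` function, and the four callables passed to it are hypothetical names introduced here, and the transport between the user terminal 120 and the content providing server 110 is abstracted behind the `transmit` and `receive` callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AvatarContent:
    """Hypothetical container for the content returned by the server."""
    avatar_id: str
    payload: bytes


def provide_avatar_content(
    capture: Callable[[], bytes],
    transmit: Callable[[bytes, Dict[str, str]], str],
    receive: Callable[[str], AvatarContent],
    display: Callable[[AvatarContent], None],
    metadata: Dict[str, str],
) -> AvatarContent:
    """Run the four terminal-side steps in order (all callables are assumptions)."""
    image = capture()                       # S1010: acquire an image of the object
    request_id = transmit(image, metadata)  # S1020: send image and metadata to the server
    content = receive(request_id)           # S1030: receive the generated avatar content
    display(content)                        # S1040: display the received avatar content
    return content
```

Passing the steps in as callables keeps the ordering explicit while leaving room for the steps to be split, merged, or reordered.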

The method of providing avatar content based on motion data of an object in the content providing server and the user terminal, described with reference to FIGS. 1 to 10, may also be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium containing instructions executable by a computer. The method described with reference to FIGS. 1 to 10 may likewise be implemented in the form of a computer program stored in a medium and executed by a computer.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media and both removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can be easily modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

110: content providing server
120: user terminal
210: receiving unit
220: recommended avatar providing unit
230: image analysis unit
240: motion data extraction unit
250: scaling unit
260: motion data analysis unit
270: compensator
280: content generation unit
810: input unit
820: photographing unit
830: transmission unit
840: receiving unit
850: display unit
860: editing unit

Claims

1. A content providing server for providing avatar content based on motion data of an object, the server comprising:
a receiving unit configured to receive, from a user terminal, an image of an object and metadata necessary for generating the avatar content;
an image analysis unit configured to analyze the image based on the received metadata;
a motion data extraction unit configured to extract motion data of the object from each frame of the image based on the analyzed image;
a motion data analysis unit configured to analyze, for each frame, whether at least a portion of the motion data is missing based on the extracted motion data;
a compensator configured to compensate, for a frame in which at least a portion of the motion data is missing, for at least a portion of the missing motion data based on motion data of another frame or a replacement animation; and
a content generation unit configured to generate the avatar content by retargeting the extracted motion data to an avatar.

2. The content providing server of claim 1, wherein the metadata includes information on the avatar to be retargeted and additional information necessary for generating the avatar content.

3. The content providing server of claim 2, wherein the avatar includes either a human avatar or an animal avatar, and the additional information includes at least one of sound source information included in the image, information on the camera that photographed the object, and background information.

4. The content providing server of claim 3, wherein the image analysis unit analyzes type information of the image based on the camera information, the motion data extraction unit converts the format of the extracted motion data based on the analyzed type information of the image, and the motion data analysis unit analyzes, for each frame of the image, whether the motion data is missing based on the format-converted motion data.

5. The content providing server of claim 4, further comprising a scaling unit configured to scale the format-converted motion data to correspond to the size of the avatar to be retargeted.
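
As a rough illustration of the scaling recited above, the sketch below multiplies every joint coordinate by the ratio of the avatar's height to the captured object's height. The claim does not specify how the scale is computed, so the uniform height ratio, the `scale_motion` function, and the per-frame dictionary of named 3D joints are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Joint = Tuple[float, float, float]  # hypothetical (x, y, z) joint position


def scale_motion(
    frames: List[Dict[str, Joint]],
    source_height: float,
    avatar_height: float,
) -> List[Dict[str, Joint]]:
    """Uniformly scale joint coordinates so the motion fits the avatar's size."""
    if source_height <= 0:
        raise ValueError("source_height must be positive")
    factor = avatar_height / source_height  # assumed scale: simple height ratio
    return [
        {name: (x * factor, y * factor, z * factor) for name, (x, y, z) in frame.items()}
        for frame in frames
    ]
```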

6. The content providing server of claim 2, wherein the motion data analysis unit analyzes the motion data over all frames of the image and classifies the analyzed motion data into one of a plurality of body regions.

7. The content providing server of claim 6, wherein the motion data analysis unit selects, based on the information on the avatar, a mapping target region to be mapped to one of the classified body regions, and derives an average value over all sections for the mapping target region based on coordinate information for each section corresponding to the selected mapping target region.

8. The content providing server of claim 7, wherein, when, relative to the derived average value, the difference in the coordinate values of a section corresponding to the mapping target region is equal to or greater than a first threshold, the motion data analysis unit determines that the motion data is missing in at least one section corresponding to the mapping target region.
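
One way to read the average-and-threshold test is sketched below: each section of the mapping target region is reduced to a single coordinate value, the average over all sections is taken, and any section whose value deviates from that average by at least the first threshold is flagged as missing. The reduction to one scalar per section, the use of the absolute deviation, and the name `find_missing_sections` are assumptions made for this example; the claim itself does not fix these details.

```python
from statistics import fmean
from typing import List, Sequence


def find_missing_sections(section_coords: Sequence[float], first_threshold: float) -> List[int]:
    """Return indices of sections whose coordinate deviates from the overall average.

    section_coords holds one (assumed scalar) coordinate value per section.
    """
    if not section_coords:
        return []
    average = fmean(section_coords)  # average value over all sections
    return [
        index
        for index, value in enumerate(section_coords)
        if abs(value - average) >= first_threshold  # deviation at or above the threshold
    ]
```

With the values `[1.0, 1.1, 0.9, 9.0, 1.0]` and a threshold of `5.0`, only the fourth section (index 3) is flagged, since the average is 2.6 and no other section deviates from it by 5.0 or more.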

9. The content providing server of claim 8, wherein the compensator calculates a correction section to which a replacement animation of the avatar is to be applied, based on motion data classified into a preset first region among the body regions, and determines whether the omission rate of the first region in the calculated correction section is equal to or greater than a second threshold.

10. The content providing server of claim 9, wherein, when the omission rate of the first region is less than the second threshold, the compensator generates new motion data for the correction section by interpolating motion data corresponding to at least one of the frames before and after the correction section.

11. The content providing server of claim 9, wherein the compensator applies the replacement animation to the correction section when the omission rate of the first region is equal to or greater than the second threshold.
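
The choice between interpolation and a replacement animation for a correction section can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: motion data for the first region is modeled as one float per frame with `None` marking a missing frame, the omission rate is taken as the fraction of missing frames in the section, interpolation is linear between the frames adjacent to the section, and the replacement animation is a non-empty list of values repeated to fill the section. The function name `compensate_section` is hypothetical.

```python
from typing import List, Optional, Sequence


def compensate_section(
    frames: Sequence[Optional[float]],
    start: int,
    end: int,
    second_threshold: float,
    replacement: Sequence[float],
) -> List[Optional[float]]:
    """Fill the correction section frames[start:end] by interpolation or replacement."""
    length = end - start
    if length <= 0:
        raise ValueError("correction section must contain at least one frame")
    missing = sum(1 for value in frames[start:end] if value is None)
    omission_rate = missing / length  # assumed definition: share of missing frames
    result = list(frames)

    if omission_rate < second_threshold:
        # Interpolate from the frame before and/or after the correction section.
        before = frames[start - 1] if start > 0 else None
        after = frames[end] if end < len(frames) else None
        if before is None and after is None:
            raise ValueError("no neighbouring frame to interpolate from")
        if before is None:
            before = after  # only the following frame is available: hold its value
        if after is None:
            after = before  # only the preceding frame is available: hold its value
        for offset in range(length):
            t = (offset + 1) / (length + 1)
            result[start + offset] = before + (after - before) * t
    else:
        if not replacement:
            raise ValueError("replacement animation must not be empty")
        # Apply the replacement animation, repeating it to cover the section.
        for offset in range(length):
            result[start + offset] = replacement[offset % len(replacement)]
    return result
```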

12. The content providing server of claim 11, wherein the compensator calculates the length of the correction section to which the replacement animation is applied, detects a motion magnitude of the object based on motion data corresponding to at least one of the frames before and after the correction section, and selects one final replacement animation from among a plurality of candidate replacement animations based on the calculated length and the detected motion magnitude of the object.
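
The claim states only that the final replacement animation is chosen from the candidates based on the section length and the detected motion magnitude; it does not give a selection rule. The sketch below therefore uses an assumed rule: motion magnitude is the mean absolute frame-to-frame change of the neighbouring motion data, and the candidate with the smallest weighted distance in length and magnitude is selected. `CandidateAnimation`, `motion_magnitude`, `select_replacement`, and the `weight` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CandidateAnimation:
    """Hypothetical description of one candidate replacement animation."""
    name: str
    length: int              # duration in frames
    motion_magnitude: float  # typical amount of movement in the animation


def motion_magnitude(values: Sequence[float]) -> float:
    """Mean absolute change between consecutive frames (an assumed measure)."""
    if len(values) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


def select_replacement(
    candidates: Sequence[CandidateAnimation],
    section_length: int,
    magnitude: float,
    weight: float = 1.0,
) -> CandidateAnimation:
    """Pick the candidate closest to the section length and motion magnitude."""
    if not candidates:
        raise ValueError("at least one candidate animation is required")

    def cost(candidate: CandidateAnimation) -> float:
        # Assumed cost: length mismatch plus weighted magnitude mismatch.
        return abs(candidate.length - section_length) + weight * abs(
            candidate.motion_magnitude - magnitude
        )

    return min(candidates, key=cost)
```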

13. The content providing server of claim 12, wherein the compensator performs additional correction by applying camera work and stage effects to the correction section.

14. The content providing server of claim 13, wherein the compensator sets the time point at which the correction section starts as a key frame and applies the camera work, with reference to the set key frame, based on the genre of the sound source included in the image and the detected motion magnitude of the object.

15. The content providing server of claim 2, further comprising a recommended avatar providing unit configured to, when sound source information is received from the user terminal, analyze the genre of the sound source based on the received sound source information, and select and provide a recommended avatar from among a plurality of avatars based on the analyzed genre of the sound source.

16. A user terminal for providing avatar content based on motion data of an object, the user terminal comprising:
a photographing unit configured to acquire an image of an object;
a transmission unit configured to transmit the image of the object and metadata necessary for generating the avatar content to a content providing server;
a receiving unit configured to receive, from the content providing server, the avatar content generated based on the image and the metadata; and
a display unit configured to display the received avatar content,
wherein the image is analyzed based on the metadata,
motion data of the object is extracted from each frame of the image based on the analyzed image,
whether at least a portion of the motion data is missing is analyzed for each frame based on the extracted motion data,
for a frame in which at least a portion of the motion data is missing, at least a portion of the missing motion data is compensated for based on motion data of another frame or a replacement animation, and
the avatar content is generated by retargeting the extracted motion data to an avatar.

17. The user terminal of claim 16, further comprising an editing unit configured to edit the received avatar content using an authoring tool.

18. The user terminal of claim 17, wherein the editing unit edits at least one of: adjustment of the viewpoint of the camera work for the avatar content, application of additional animation effects to the avatar, adjustment of the synchronization of the sound source with the avatar content, application and change of stage effects, application of special effects to the avatar, and application and change of screen filters.

19. The user terminal of claim 16, further comprising an input unit configured to receive sound source information to be included in the image, wherein the transmission unit transmits the input sound source information to the content providing server, and the receiving unit receives, from the content providing server, a recommended avatar selected based on the sound source information.

20. A method of providing avatar content based on motion data of an object in a content providing server, the method comprising:
receiving, from a user terminal, an image of an object and metadata necessary for generating the avatar content;
analyzing the image based on the received metadata;
extracting motion data of the object from each frame of the image based on the analyzed image;
analyzing, for each frame, whether at least a portion of the motion data is missing based on the extracted motion data;
compensating, for a frame in which at least a portion of the motion data is missing, for at least a portion of the missing motion data based on motion data of another frame or a replacement animation; and
generating the avatar content by retargeting the extracted motion data to an avatar.