KR20230100214A

KR20230100214A - Virtual avatar image synthesis technology and sharing automation device using GAN-based mouth shape synthesis

Info

Publication number: KR20230100214A
Application number: KR1020210189910A
Authority: KR
Inventors: 손정아
Original assignee: 주식회사 애나
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2023-07-05

Abstract

The present invention relates to a virtual avatar video synthesis technology and sharing automation device using GAN-based mouth shape synthesis according to an embodiment. The device includes: a generation unit that generates a new video by overwriting a source video with a mouth image extracted, using an AI learning model, from mouth shape data learned through FACS; and a sharing unit that automatically uploads the generated video to at least two video playback platforms.

Description

Virtual avatar image synthesis technology and sharing automation device using GAN-based mouth shape synthesis

The present invention relates to a device for automating the process of generating and sharing realistic avatar videos based on artificial intelligence and LipGAN technology.

With existing DCGAN techniques, it was difficult to separately reproduce the mouth shapes that correspond to speech in a synthesized video, so the mouth shapes could be reproduced only by using the audio recorded in that video. Because video and audio had to be captured together, this inconvenience was a major obstacle to commercialization. As a solution, the present invention aims to devise a method and device that use GAN-based mouth shape reproduction technology with improved visual quality and that take as their defining feature an all-in-one process running from production to sharing on a single platform, thereby reducing the hassle of video matching and sharing.

To solve the problems described above, the present invention applies an automated process to the sharing of produced realistic avatar videos.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

A device according to an embodiment of the present invention for solving the above problems automates the process of generating and sharing realistic avatar videos based on artificial intelligence and LipGAN technology. It includes: a generation unit that generates a new video by overwriting a source video with a mouth image extracted, using an AI learning model, from mouth shape data learned through FACS; and a sharing unit that automatically uploads the generated video to at least two video playback platforms. In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may further be provided.
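The specification does not say how the generation unit performs the overwrite. As a minimal sketch only, assuming each frame is a grayscale grid of pixel values and that the mouth region is a fixed rectangle, the step could look like the following. The names `overwrite_mouth` and `generate_video`, the frame representation, and the `top`/`left` coordinates are all hypothetical and do not come from the patent.

```python
from typing import List

# Assumed representation: a frame is a list of rows of grayscale pixel values.
Frame = List[List[int]]


def overwrite_mouth(frame: Frame, mouth_patch: Frame, top: int, left: int) -> Frame:
    """Return a copy of `frame` with the rectangular `mouth_patch` pasted at (top, left)."""
    frame_h = len(frame)
    frame_w = len(frame[0]) if frame_h else 0
    patch_h = len(mouth_patch)
    patch_w = len(mouth_patch[0]) if patch_h else 0
    if top < 0 or left < 0 or top + patch_h > frame_h or left + patch_w > frame_w:
        raise ValueError("mouth patch does not fit inside the frame")
    new_frame = [row[:] for row in frame]  # copy so the source frame is left untouched
    for dy, patch_row in enumerate(mouth_patch):
        new_frame[top + dy][left:left + patch_w] = patch_row
    return new_frame


def generate_video(source_frames: List[Frame], mouth_patches: List[Frame],
                   top: int, left: int) -> List[Frame]:
    """Overwrite the assumed mouth rectangle of every source frame with its patch."""
    if len(source_frames) != len(mouth_patches):
        raise ValueError("need exactly one mouth patch per source frame")
    return [overwrite_mouth(f, p, top, left) for f, p in zip(source_frames, mouth_patches)]


# Usage: two blank 4x4 frames, one 1x2 mouth patch per frame.
source = [[[0] * 4 for _ in range(4)] for _ in range(2)]
patches = [[[9, 9]], [[7, 7]]]
result = generate_video(source, patches, top=2, left=1)
```

In a real system the patches would come from the trained model and the paste location from face detection; the sketch fixes both so that only the overwrite itself is shown.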

The devised technology uses a GAN to generate a virtual face from N mouth shape photos and projects it onto a desired sample video, automating the procedure for producing a LipGAN-based, mouth-shape-synthesized realistic avatar video that is advanced enough to be hard to distinguish from real footage. It further aims to upload the result to at least two video sharing platforms so that it can be shared. By building this into a service platform as an automated process, the invention provides a method and device that allow virtual video generation and sharing to be carried out all-in-one on a single platform.

FIG. 1 is an exemplary diagram of a method and device for automating the process of generating and sharing realistic avatar videos based on artificial intelligence and LipGAN technology according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully inform those skilled in the art to which the present invention belongs of the scope of the invention, and the present invention is defined only by the scope of the claims.

The terminology used in this specification is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms also include plural forms unless the phrase specifically states otherwise. As used in this specification, "comprises" and/or "comprising" does not exclude the presence or addition of one or more elements other than those stated. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the stated elements and every combination of one or more of them. Although "first", "second", and so on are used to describe various elements, these elements are of course not limited by these terms. These terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may of course also be a second element within the technical spirit of the present invention.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those skilled in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless explicitly and specifically defined.

Until recently, AI work concentrated on developing the technology itself, but as research and development has reached a stage where it can be generalized to some extent, the development of services specialized for particular application fields is becoming more active. Deepfakes, content produced with deep learning based on GAN technology, have been spreading recently. Earlier GAN techniques had difficulty combining unstable and complex images, but with the emergence of DCGAN it became possible to address these problems through deep learning. GAN-based deepfake technology is now emerging as a next-generation ICT technology, and major companies in various fields that want to incorporate AI are investing heavily in applying it to develop a range of services.

These prior inventions use FACS to define a standardized method of expressing human emotions as feature points for DCGAN training, with the aim of increasing accuracy and raising output quality. Realistic videos produced with these technologies have high industrial value and are used in fields such as film and music, and they offer many advantages, such as being able to reduce production costs to about 10% of previous levels.
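To make the idea of FACS-derived feature points concrete, the sketch below encodes a mouth shape as a fixed-order vector of FACS action unit (AU) intensities. The AU numbers and names are standard FACS codes, but which action units the patent or the prior inventions actually use is not stated. The selection here, the 0 to 1 intensity scale, and the function name `encode_mouth_shape` are assumptions for illustration only.

```python
# Mouth-related FACS action units (standard FACS codes; the choice of this
# particular subset is an assumption, not something the patent specifies).
MOUTH_ACTION_UNITS = {
    12: "Lip Corner Puller",
    18: "Lip Pucker",
    20: "Lip Stretcher",
    25: "Lips Part",
    26: "Jaw Drop",
    27: "Mouth Stretch",
}
AU_ORDER = sorted(MOUTH_ACTION_UNITS)  # fixed ordering of the vector components


def encode_mouth_shape(intensities):
    """Map {AU number: intensity in [0, 1]} to a fixed-order feature vector.

    Action units that are not listed are treated as inactive (0.0).
    """
    for au, value in intensities.items():
        if au not in MOUTH_ACTION_UNITS:
            raise ValueError(f"AU{au} is not in the assumed mouth AU set")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"intensity for AU{au} must be between 0 and 1")
    return [float(intensities.get(au, 0.0)) for au in AU_ORDER]


# Usage: an open mouth described as parted lips plus a partial jaw drop.
open_mouth = encode_mouth_shape({25: 1.0, 26: 0.6})
```

A vector like this could serve as a conditioning input or training label for a generative model, though the source does not describe the encoding it uses.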

However, the existing DCGAN technique that offers these advantages has difficulty separately reproducing the mouth shapes for speech in a synthesized video, so the mouth shapes could be reproduced only by using the audio recorded in that video. Because video and audio must be captured together, this inconvenience severely limits the use of the video generation technique in real commercial products. The most prominent limitation is low matching accuracy, which leaves the mouth movement less natural than desired; the unnaturalness is obvious enough for a person to notice immediately. The technique is also too difficult for general users to operate, which makes commercialization hard.

Accordingly, the present invention proposes a method and device for automating the process of generating an advanced LipGAN-based virtual avatar video and uploading it to at least two video sharing platforms.
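The upload step can be sketched as a small sharing unit that refuses to run with fewer than two platforms, mirroring the "at least two" requirement. This is an illustrative sketch under stated assumptions: the `SharingUnit` class, the `Uploader` callable type, and the `fake_uploader` stand-in are hypothetical, and no real platform API is called.

```python
from typing import Callable, Dict

# Hypothetical uploader: takes a video path and returns a link or ID as a string.
Uploader = Callable[[str], str]


class SharingUnit:
    """Uploads one generated video to every configured platform."""

    def __init__(self, uploaders: Dict[str, Uploader]):
        if len(uploaders) < 2:
            raise ValueError("at least two video platforms are required")
        self.uploaders = dict(uploaders)

    def share(self, video_path: str) -> Dict[str, str]:
        results = {}
        for platform, upload in self.uploaders.items():
            try:
                results[platform] = upload(video_path)
            except Exception as exc:
                # A failure on one platform should not block the others.
                results[platform] = f"failed: {exc}"
        return results


def fake_uploader(platform: str) -> Uploader:
    """Stand-in for a real platform client; it only builds a placeholder link."""
    return lambda path: f"{platform}://{path}"


# Usage with two placeholder platforms.
unit = SharingUnit({
    "platform_a": fake_uploader("platform_a"),
    "platform_b": fake_uploader("platform_b"),
})
links = unit.share("avatar.mp4")
```

Passing uploaders in as plain callables keeps the unit independent of any particular platform; a real deployment would wrap each platform's own client behind the same signature.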

Claims

A virtual avatar image synthesis technology and sharing automation device comprising:
a generation unit that generates a new video by overwriting a source video with a mouth image extracted, using an AI learning model, from mouth shape data learned through FACS; and
a sharing unit that automatically uploads the generated video to at least two video playback platforms.