KR102590025B1

KR102590025B1 - Learning method of face swapping deep learning system that increases learning efficiency through attention mask

Info

Publication number: KR102590025B1
Application number: KR1020230055027A
Authority: KR
Inventors: 류원종; 임정혁; 김준혁; 김활; 정정영
Original assignee: 주식회사 이너버즈
Priority date: 2023-02-04
Filing date: 2023-04-26
Publication date: 2023-10-16
Also published as: KR102590024B1; KR102529214B1

Abstract

The present invention relates to a method of training a face swapping deep learning system, comprising: (a) inputting a source image (100) into an embedding network (10) to extract a first feature embedding vector (110); (b) converting the pixels of a target image (200) to a compatible size to obtain target data (210); (c) combining the target data (210) with the first feature embedding vector (110) to generate first synthesized converted data (300); (c') combining the target data (210) and the first synthesized converted data (300) with an attention mask (310) to generate second synthesized converted data (320); (d) a composite image conversion step of converting the second synthesized converted data (320) into a swapping image (400); and (e) training the face swapping deep learning system based on the swapping image (400).

Description

Learning method of face swapping deep learning system that increases learning efficiency through attention mask

The present invention relates to a training method for a face swapping deep learning system that increases training efficiency through an attention mask. More specifically, it relates to a method of training a system that combines the identity from one person's face image with the attributes from another person's face image, by differentiating the degree of training.

Patent Document 001 discloses a method and device for synthesizing images by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm in a 5G environment connected for the Internet of Things. The image synthesis method according to an embodiment of that document includes: acquiring a first image containing a face image; detecting feature points of the first image by applying a pre-trained deep neural network model; acquiring a second image to be composited onto the first image; extracting the boundary of the second image; matching coordinate values corresponding to the boundary of the second image based on the coordinate values of the feature points of the first image; and merging the first image and the second image and outputting the result.

Patent Document 002 relates to a method of generating images, such as facial expressions, based on AI machine learning. It presents a technique including: a second server receiving the facial expression information of each actor from a first server, performing AI image learning on the corresponding file, and generating and storing weights corresponding to the actor's facial expressions as a result of that learning; the second server receiving the facial expression information of a performing actor from the first server; the second server using the facial expressions of a stand-in actor to search for the weights corresponding to the performing actor, and generating and storing the performing actor's facial expressions; and the second server generating a video (such as a VOD file) with the generated facial expressions of the performing actor.

Patent Document 003 presents an AI-based character generation method performed by an AI-based character generation system. The method comprises training a character generation model using training data that includes phenotypic genome information and character image data, and generating a character image by applying the character generation model to phenotypic genome information provided directly or indirectly by a user. Because each character has its own genome information, the technique makes it possible to give characters uniqueness and distinguishability in a virtual space.

Patent Document 004 relates to a convolutional neural network-based image processing system. It presents a technique comprising: a convolution layer unit that extracts feature values of an object from input data using a plurality of kernel filters; an activation layer unit that transforms the feature values extracted by the convolution layer unit using a nonlinear activation function; a pooling layer unit that reduces the dimensionality of the activation layer unit's output using a max pooling operation and removes and suppresses noise; a classification output layer unit that outputs a classification prediction value for the input data through a forward operation using the pooling layer unit's output; a loss calculation layer unit that compares the classification prediction value with a predetermined target value and calculates a loss value corresponding to the error; a backward calculation unit that computes partial derivatives of the loss value through a backward operation to obtain correction values for the parameters of each layer unit; and a classification learning unit that updates the parameters by gradient descent using the correction values and a learning rate derived from a certain amount of training data.

Patent Document 001: KR 10-2236904 (registered March 31, 2021)
Patent Document 002: KR 10-2021-0112576 (published September 15, 2021)
Patent Document 003: KR 10-2022-0155239 (published November 22, 2022)
Patent Document 004: KR 10-2068576 (registered January 15, 2020)

The present invention seeks to provide a method and device for training a system that converts multiple face images into one composite image.

The present invention also seeks to provide a swapping image synthesized by classifying the characteristics of each input face image and then combining only the desired characteristics.

A method of training a face swapping deep learning system according to an embodiment of the present invention comprises: (a) inputting a source image into an embedding network to extract a first feature embedding vector; (b) converting a target image to a compatible size to obtain target data; (c) combining the target data with the first feature embedding vector to generate first synthesized converted data; (d) a composite image conversion step of converting the first synthesized converted data into a swapping image; and (e) training the face swapping deep learning system based on the swapping image.

In a method of training a face swapping deep learning system according to an embodiment of the present invention, in the method above, step (c) comprises: a normalization step of converting the target data (X) into target normalized data (Z); and a denormalization step of combining the target normalized data (Z) with identity data extracted from the first feature embedding vector (110).

In a method of training a face swapping deep learning system according to an embodiment of the present invention, in the method above, the normalization step is computed by the following [Equation 1].

[Equation 1]

$$Z = \frac{X - \mu}{\sigma}$$

(where μ and σ denote the mean and standard deviation of the target data (X))

In a method of training a face swapping deep learning system according to an embodiment of the present invention, in the method above, the denormalization step is computed by the following [Equation 2].

[Equation 2]

(where … denotes the identity data)

In a method of training a face swapping deep learning system according to an embodiment of the present invention, in the method above, step (e) comprises: (e-1) inputting the swapping image into the embedding network to extract a second feature embedding vector, and then training the face swapping deep learning system so as to reduce a first error, which is the difference between the first feature embedding vector and the second feature embedding vector.

In a method of training a face swapping deep learning system according to an embodiment of the present invention, in the method above, step (e) comprises: (e-2) inputting the swapping image into a discriminator network and extracting first intermediate values during its computation; (e-3) inputting the target image into the discriminator network and extracting second intermediate values during its computation; and (e-4) training the face swapping deep learning system so as to reduce a second error, which is the difference between the first intermediate values and the second intermediate values.

A method of training a face swapping deep learning system according to an embodiment of the present invention comprises: (a) inputting a source image into an embedding network to extract a first feature embedding vector, wherein (a-1) the source image is input into a 3DMM (3D Morphable Model) network to extract a source 3D embedding vector, (a-2) the target image is input into the 3DMM network to extract a target 3D embedding vector, (a-3) the source 3D embedding vector and the target 3D embedding vector are combined to generate a synthesized 3D embedding vector, and (a-4) the synthesized 3D embedding vector is connected to the first feature embedding vector to generate a 3D feature embedding vector; (b) converting the pixels of the target image to a compatible size to obtain target data; (c) combining the target data with the 3D feature embedding vector to generate second synthesized converted data; (d) a composite image conversion step of converting the second synthesized converted data into a swapping image; and (e) training the face swapping deep learning system based on the swapping image.
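Steps (a-3) and (a-4) do not say how the two 3D embedding vectors are combined or how the result is connected to the first feature embedding vector. The sketch below is only one plausible reading, under explicit assumptions: the 3DMM coefficients are stored in a hypothetical layout with `shape`, `expression`, and `pose` parts; the combination keeps the face shape from the source and takes expression and pose from the target; and "connecting" is read as concatenation. None of these choices comes from the source, and the function names are hypothetical.

```python
def combine_3dmm(source_coeffs, target_coeffs):
    """Assumed (a-3): source face shape + target expression and pose.

    Both arguments use a hypothetical layout: a dict with "shape",
    "expression", and "pose" lists of 3DMM coefficients.
    """
    return {
        "shape": list(source_coeffs["shape"]),            # identity geometry from the source
        "expression": list(target_coeffs["expression"]),  # expression from the target
        "pose": list(target_coeffs["pose"]),              # head pose from the target
    }


def build_3d_feature_embedding(first_embedding, combined_coeffs):
    """Assumed (a-4): 'connecting' read as concatenation into one flat vector."""
    return (list(first_embedding)
            + combined_coeffs["shape"]
            + combined_coeffs["expression"]
            + combined_coeffs["pose"])
```

With this split, the identity-bearing shape coefficients follow the source face while the pose and expression, which are attribute information, follow the target, matching the identity/attribute division described for the rest of the method.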

A method of training a face swapping deep learning system according to an embodiment of the present invention comprises: (a) inputting a source image into an embedding network to extract a first feature embedding vector; (b) converting the pixels of a target image to a compatible size to obtain target data; (c) combining the target data with the first feature embedding vector to generate first synthesized converted data; (c') combining the target data and the first synthesized converted data with an attention mask to generate second synthesized converted data; (d) a composite image conversion step of converting the second synthesized converted data into a swapping image; and (e) training the face swapping deep learning system based on the swapping image.

In a method of training a face swapping deep learning system according to an embodiment of the present invention, in the method above, step (c') comprises: (c'-1) generating the attention mask based on a sigmoid network.
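The source states only that the attention mask is produced by a sigmoid network and combined with the target data and the first synthesized converted data. A common way to realize such a blend, assumed here rather than taken from the patent, is the element-wise mix X'' = M * X' + (1 - M) * X with M = sigmoid(mask logits), where X is the target data and X' is the first synthesized converted data. The sketch uses flat Python lists as stand-ins for feature maps, and `attention_blend` is a hypothetical name.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function mapping any real value into (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)  # avoids overflow for large negative v
    return e / (1.0 + e)


def attention_blend(target, synthesized, mask_logits):
    """Assumed (c'): X'' = M * X' + (1 - M) * X, applied element-wise.

    target: target data X; synthesized: first synthesized data X';
    mask_logits: raw mask outputs turned into M by the sigmoid.
    """
    if not (len(target) == len(synthesized) == len(mask_logits)):
        raise ValueError("inputs must have the same length")
    blended = []
    for x, x_syn, logit in zip(target, synthesized, mask_logits):
        m = sigmoid(logit)  # attention weight in (0, 1)
        blended.append(m * x_syn + (1.0 - m) * x)
    return blended
```

For example, `attention_blend([0.0, 2.0], [4.0, 2.0], [0.0, 0.0])` returns `[2.0, 2.0]`, because a logit of 0 gives a weight of 0.5. Under this form, the derivative of X'' with respect to X' is M, so positions where the mask is near 0 send almost no gradient back through the synthesized path. That is one plausible way a mask can concentrate training on selected regions.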

A computer-implemented training device for training a face swapping deep learning system according to an embodiment of the present invention comprises: a memory storing instructions; and at least one processor configured to execute the instructions. When executed by the processor, the instructions input a source image into the embedding network (10) to extract a first feature embedding vector, convert a target image to a compatible size to obtain target data, combine the target data with the first feature embedding vector to generate synthesized converted data, convert the synthesized converted data into a swapping image, and train the face swapping deep learning system based on the swapping image.

The present invention can provide a method for effectively training a system that combines the attributes of a target image with the identity of a source image.

In addition, the present invention can provide a method that uses an attention mask to effectively train the components that need to be learned intensively.

Figure 1 is a flowchart of the operation of a face swapping deep learning system according to an embodiment of the present invention.
Figure 2 is a schematic diagram illustrating a process in which a source image is input to an embedding network and a first feature embedding vector is generated according to an embodiment of the present invention.
Figure 3 is a detailed configuration diagram expressing the calculation principle of a mix block according to an embodiment of the present invention.
Figure 4 is a block diagram showing a first error according to an embodiment of the present invention.
Figure 5 is a block diagram showing a second error according to an embodiment of the present invention.
Figure 6 is a schematic diagram showing the process of generating the first synthesized converted data and the attention mask according to an embodiment of the present invention.
Figure 7 is a schematic diagram showing the process of generating the second synthesized converted data by applying the attention mask according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments presented below. It may be implemented in various different forms and should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

The embodiments presented below are provided so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the art. In describing the present invention, a detailed description of related known technology is omitted where it is judged that such a description may obscure the gist of the invention.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "comprise" or "have" designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification. They should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings. Identical or corresponding components are assigned the same reference numerals, and duplicate descriptions of them are omitted.

(Example 1-1) A method of training a face swapping deep learning system, comprising: (a) inputting a source image (100) into an embedding network (10) to extract a first feature embedding vector (110); (b) converting a target image (200) to a compatible size to obtain target data (210); (c) combining the target data (210) with the first feature embedding vector (110) to generate first synthesized converted data (300); (d) a composite image conversion step of converting the first synthesized converted data (300) into a swapping image (400); and (e) training the face swapping deep learning system based on the swapping image (400).

An exemplary embodiment of the present invention may relate to a method of training a face swapping deep learning system that synthesizes the characteristics of multiple people.

Referring to FIGS. 1 to 7, the face swapping deep learning system according to an exemplary embodiment of the present invention may be a combination of modules that convert and synthesize digital data, such as a convolution block (30), a convolution layer (31), a linear layer (20), and a mix block (40). Accordingly, training the face swapping deep learning system in the present invention may mean learning the weights, nodes, and other parameters of these modules.

The present invention may include: inputting the source image (100) into the embedding network (10) to extract the first feature embedding vector (110); converting the pixels of the target image (200) to a compatible size to obtain a first converted target image (200); combining the first converted target image (200) with the first feature embedding vector (110) to generate a synthesized converted image; and a composite image conversion step of converting the synthesized converted image into the swapping image (400).

More specifically, an exemplary embodiment of the present invention may relate to a training method for a deep learning model that combines the identity of the source image (100) (shape information of the person's eyes, nose, and so on) with the attributes of the target image (200) (information such as the angle of the person's face and the lighting).

The specific method for each process is described below.

(Example 1-2) In Example 1-1, the source image (100), the target image (200), and the swapping image (400) may be three-channel RGB images.

In an exemplary embodiment of the present invention, the source image (100), the target image (200), and the swapping image (400) may have a pixel size of … .

Images represented digitally in industry today are typically RGB images, which consist of three channels. Accordingly, the target image (200) and the source image (100) used for face swapping, as well as the final synthesized swapping image (400), may be three-channel images.

(Example 1-3) In Example 1-1, the compatible size may be a pixel size of … .

In an exemplary embodiment of the present invention, the compatible size may be the size into which the target image (200) is converted before it is combined with the first feature embedding vector (110). Referring to FIG. 1, a pixel size of … may be set as the compatible size. However, the compatible size is not fixed. It may be set in various ways, provided the chosen size makes it more efficient to combine the first feature embedding vector (110) with the target data (210) converted to that size.

Also, in an exemplary embodiment of the present invention, the first feature embedding vector (110) may be the vector output when the source image (100) is input into an embedding network (10) that has already been sufficiently trained. The size of the first feature embedding vector (110) may be set to …, so that it consists of 512 numbers.

Accordingly, the channel size of the target data (210) may be matched to the channel size of the first feature embedding vector (110). Two data items with the same channel size can be combined through a convolution process.

(Example 2-1) Step (c) comprises: a normalization step of converting the target data (210) (X) into target normalized data (220) (Z); and a denormalization step of combining the target normalized data (220) (Z) with identity data (120) extracted from the first feature embedding vector (110).

In an exemplary embodiment of the present invention, the step of combining the target data (210) with the first feature embedding vector (110) to generate synthesized converted data may include a normalization step and a denormalization step. Together, these steps first standardize the target data (210) to a zero-mean, unit-variance form. They then combine the resulting target normalized data (220) with the identity data (120) extracted from the first feature embedding vector (110), generating the first synthesized converted data (300) (X').

(Example 2-2) In Example 2-1, the normalization step is computed by the following [Equation 1].

[Equation 1]

$$Z = \frac{X - \mu}{\sigma}$$

(where μ and σ denote the mean and standard deviation of the target data (210))
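The normalization of Equation 1, read as Z = (X - μ) / σ, can be sketched in plain Python. The sketch assumes, as in instance normalization, that μ and σ are computed per channel over spatial positions. The source does not state which axes the statistics use. It also adds a small hypothetical constant `eps` to the variance so that a constant channel does not cause division by zero.

```python
import math
import statistics


def normalize(target_data, eps=1e-5):
    """Equation 1 as read here: Z = (X - mu) / sigma, channel by channel.

    target_data: list of channels, each a flat list of spatial values (X).
    eps: assumed stabilizing constant added to the variance.
    """
    normalized = []
    for channel in target_data:
        mu = statistics.fmean(channel)           # channel mean
        var = statistics.pvariance(channel, mu)  # population variance
        sigma = math.sqrt(var + eps)             # stabilized standard deviation
        normalized.append([(x - mu) / sigma for x in channel])
    return normalized
```

For example, `normalize([[1.0, 3.0]])` gives values very close to `[[-1.0, 1.0]]`. Each output channel has mean 0 and, apart from the effect of `eps`, unit variance, which is the form the denormalization step then rescales.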

(Example 2-3) In Example 2-2, the denormalization step is computed by the following [Equation 2].

[Equation 2]

(where … denotes the identity data (120))
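Equation 2 is not legible in this text. The sketch below assumes it takes the adaptive instance normalization (AdaIN) form X' = γ * Z + β, in which a per-channel scale γ and shift β play the role of the identity data (120). This is a common pattern in face swapping models, but it is an assumption, not the patent's stated formula. `denormalize` is a hypothetical name.

```python
def denormalize(normalized, gamma, beta):
    """Assumed AdaIN-style Equation 2: X' = gamma * Z + beta, per channel.

    normalized: list of channels (flat lists) from the normalization step (Z).
    gamma, beta: one scale and one shift per channel, standing in for
    the identity data (an assumption about its structure).
    """
    if not (len(normalized) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per channel")
    return [[g * z + b for z in channel]
            for channel, g, b in zip(normalized, gamma, beta)]
```

Under this reading, the target's spatial layout (its attributes) survives in Z, while the source identity enters only through the channel-wise statistics γ and β.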

(Example 2-4) In Example 2-3, the denormalization step comprises extracting the identity data (120) by inputting the first feature embedding vector (110) into the linear layer (20).

In an exemplary embodiment of the present invention, the first feature embedding vector (110) may be data containing a plurality of numbers. After the identity data (120) is generated from the first feature embedding vector (110), the identity data (120) can be used to convert the target normalized data (220) into the first synthesized converted data (300).

While the face swapping deep learning system of the present invention is being trained, the weight values that define the computation of the linear layer (20) can be learned.

More specifically:

If the first feature embedding vector (110) is a vector of size …, it may be expressed as 512 numbers. From these 512 numbers, the identity data (120) can be generated as follows:

Therefore, how the identity data (120) is extracted from the first feature embedding vector (110) may be determined by the values of the weights of the linear layer (20). How these weights are set may in turn determine the value of the first synthesized converted data (300) generated in step (c) of the present invention. Accordingly, the method of training the face swapping deep learning system of the present invention may include learning these weights.
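A minimal sketch of the linear layer (20) follows. It assumes the layer maps the embedding to 2C numbers for C channels, and that the first C are read as per-channel scales and the last C as per-channel shifts (γ and β in the AdaIN reading of Equation 2). This layout is hypothetical and not stated in the source. In the described system, the input would be the 512-number embedding, and the weight matrix `weights` and bias `bias` are the parameters that training would update.

```python
def linear(embedding, weights, bias):
    """Fully connected layer: y[i] = sum_j weights[i][j] * embedding[j] + bias[i]."""
    if len(weights) != len(bias):
        raise ValueError("need one bias value per weight row")
    for row in weights:
        if len(row) != len(embedding):
            raise ValueError("each weight row must match the embedding length")
    return [sum(w * e for w, e in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def identity_data_from_embedding(embedding, weights, bias):
    """Split the linear-layer output into per-channel (gamma, beta).

    Assumed layout: the first half of the outputs are scales (gamma),
    the second half are shifts (beta).
    """
    out = linear(embedding, weights, bias)
    if len(out) % 2:
        raise ValueError("linear layer must output an even number of values")
    c = len(out) // 2
    return out[:c], out[c:]
```

Tiny sizes keep the example readable. With a two-number embedding and four output rows, `identity_data_from_embedding([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0, 0.5])` yields scales `[1.0, 2.0]` and shifts `[3.0, 0.5]`.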

(Example 3-1) In Example 1-1, step (e) comprises:

(e-1) inputting the swapping image (400) into the embedding network (10) to extract a second feature embedding vector (410), and then training the face swapping deep learning system so as to reduce a first error (1), which is the difference between the first feature embedding vector (110) and the second feature embedding vector (410).

In an exemplary embodiment of the present invention, the first error (1) may refer to an identity-related error. When training is sufficient, the swapping image 400, synthesized from the source image 100 and the target image 200, has the same identity as the source image 100 and the same attributes as the target image 200. When this result is obtained, the face swapping deep learning system of the present invention can be regarded as trained most effectively.

In Embodiment 3-1, training the system so as to reduce the first error (1) may mean teaching the face swapping deep learning system, when it synthesizes the source image 100 and the target image 200, to preserve the identity of the source image 100 in the synthesized swapping image 400.
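A minimal sketch of the first error (1). The text only says it is the "difference" between the first feature embedding vector 110 and the second feature embedding vector 410; measuring it as one minus the cosine similarity is an assumption (a Euclidean distance would be an equally valid reading), and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)  # eps guards against zero vectors


def identity_loss(source_embedding, swapped_embedding):
    """First error (1): 0 when the swapping image keeps the source identity exactly,
    growing as the two embeddings point in different directions.

    ASSUMPTION: 1 - cosine similarity is used as the 'difference'.
    """
    if len(source_embedding) != len(swapped_embedding):
        raise ValueError("embeddings must have the same length")
    return 1.0 - cosine_similarity(source_embedding, swapped_embedding)
```

With this choice the error ranges from 0 (same direction) to 2 (opposite direction) and ignores the overall length of the embeddings.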

(Embodiment 4-1) In Embodiment 3-1, step (e) includes:

(e-2) inputting the swapping image 400 into the discrimination network 50 and extracting first intermediate values 51 produced during the computation;

(e-3) inputting the target image 200 into the discrimination network 50 and extracting second intermediate values 52 produced during the computation; and

(e-4) training the face swapping deep learning system so as to reduce a second error (2), which is the difference between the first intermediate values 51 and the second intermediate values 52.

In an exemplary embodiment of the present invention, the second error (2) may include an attribute-related loss.

In Embodiment 4-1, training the system so as to reduce the second error (2) may include teaching the face swapping deep learning system, when it synthesizes the source image 100 and the target image 200, to preserve the attributes of the target image 200 in the synthesized swapping image 400.

Referring again to Embodiment 4-1, training the system to reduce the second error (2) might also push information about the identity of the target image 200 into the swapping image 400. However, because training toward a smaller first error (1) is reflected more dominantly, the system can learn to take identity information from the source image 100 rather than from the target image 200.
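The second error (2) can be sketched as a feature-matching loss over the discrimination network's intermediate values. The use of a mean absolute (L1) difference and equal weighting of layers are assumptions; the text specifies only that the error is the difference between the first intermediate values 51 and the second intermediate values 52. The function name is hypothetical.

```python
def feature_matching_loss(swap_features, target_features):
    """Second error (2): mean absolute difference between the discriminator's
    intermediate values for the swapping image (51) and the target image (52),
    averaged over layers.

    ASSUMPTION: L1 distance with equal weight per layer.
    """
    if len(swap_features) != len(target_features) or not swap_features:
        raise ValueError("need the same, non-zero number of layers for both images")
    per_layer = []
    for a, b in zip(swap_features, target_features):
        if len(a) != len(b):
            raise ValueError("matching layers must have the same size")
        per_layer.append(sum(abs(x - y) for x, y in zip(a, b)) / len(a))
    return sum(per_layer) / len(per_layer)
```

Because the target image, not the source image, is the reference here, this term pulls the swapping image toward the target's attributes, which is why it can compete with the identity term as described above.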

(Embodiment 4-2) In Embodiment 4-1, the discrimination network 50 is a discriminator.

The discriminator may be a network that determines whether an input image is an image of a person. The discrimination network 50 may output a number between 0 and 1 for an input image: 1 if it judges the image to be real, and 0 if it judges it to be fake.

While producing this final output between 0 and 1, the discrimination network 50 may generate intermediate values. These intermediate values may be the data produced in succession at the various stages that convert the input image into a value between 0 and 1.

When the target image 200 is input to the discrimination network 50, the final output may be close to 1, because the target image 200 is a real image of a person.

The swapping image 400, however, is synthesized from the target image 200 and the source image 100, so until the face swapping deep learning system of the present invention is sufficiently trained it may not look like a person, and an output close to 0 may be obtained.

However, the discrimination network 50 of the present invention is not necessarily used to obtain this 0-to-1 output. It may be used only for the intermediate values produced along the way, which serve to train the face swapping deep learning system of the present invention.
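To show where the intermediate values 51 and 52 come from, the following toy discriminator records the activation of every hidden layer before squashing the final logit into a 0-to-1 score. The fully connected architecture, the leaky-ReLU activation, the weights, and the function names are hypothetical stand-ins for the discrimination network 50.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def leaky_relu(v, slope=0.2):
    return [x if x > 0 else slope * x for x in v]


def discriminator(x, layers, head):
    """Toy discriminator. Returns the final real/fake score in (0, 1) and the
    list of intermediate activations produced along the way."""
    intermediates = []
    h = x
    for weights, bias in layers:
        h = leaky_relu(dense(h, weights, bias))
        intermediates.append(h)          # these play the role of values 51 / 52
    logit = dense(h, *head)[0]
    score = 1.0 / (1.0 + math.exp(-logit))
    return score, intermediates


# Hypothetical weights: 3 inputs -> 2 hidden -> 2 hidden -> 1 score.
layers = [
    ([[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.3, 0.7]], [0.0, 0.0]),
]
head = ([[0.8, -0.5]], [0.0])
score, feats = discriminator([0.9, 0.1, 0.4], layers, head)
```

Running the swapping image and the target image through the same `discriminator` call yields the two lists of intermediate values that the second error compares; the `score` itself is not needed for that comparison.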

(Embodiment 5-1) A method of training a face swapping deep learning system according to Embodiment 1-1, comprising:

(a) inputting a source image 100 into the embedding network 10 to extract a first feature embedding vector 110;

(a-1) inputting the source image 100 into a 3DMM (3D Morphable Model) network to extract a source 3D embedding vector;

(a-2) inputting the target image 200 into the 3DMM network to extract a target 3D embedding vector;

(a-3) combining the source 3D embedding vector and the target 3D embedding vector to generate a synthesized 3D embedding vector;

(a-4) concatenating the synthesized 3D embedding vector with the first feature embedding vector 110 to generate a 3D feature embedding vector;

(b) converting the pixels of the target image 200 to a compatible size to obtain target data 210;

(c) combining the target data 210 and the 3D feature embedding vector to generate second synthesized converted data 320;

(d) a composite image conversion step of converting the second synthesized converted data 320 into a swapping image 400; and

(e) training the face swapping deep learning system based on the swapping image 400.

In an exemplary embodiment of the present invention, when a source image or a target image is input to the 3DMM network, a 3D embedding vector containing 3D information about the image may be output. The 3D embedding vector may be a vector of 257 numbers.

3DMM may refer to a deformable 3D motion model, that is, a model that generates motion and/or facial expressions by applying various techniques to a 3D model. Here, the deformable 3D motion model may refer to a 3D motion model included in an animation to which the morphable 3D model technique is applied. For example, in a deformable 3D motion model, the 3D shape and/or texture variations within an object may be continuously parameterized, establishing a mapping between a low-dimensional parameter space and the high-dimensional space of textured 3D models.

In the present invention, however, a 3DMM network may mean a network that, given an input image, extracts a one-dimensional 3D embedding vector based on a 3DMM.

Accordingly, the source image can be input to the 3DMM network to extract the source 3D embedding vector, the target image can be input to the 3DMM network to extract the target 3D embedding vector, and the two 3D embedding vectors can then be combined to generate the synthesized 3D embedding vector.

Information related to the identity of the source image can be taken from the source 3D embedding vector, information related to the attributes of the target image can be taken from the target 3D embedding vector, and the two can be combined.

The 3D embedding vector extracted by the 3DMM network may be obtained by a different algorithm from the feature embedding vectors extracted by the embedding network 10 described above.

As described above, the 3DMM network may be a network that extracts 3D information from an image. The synthesized 3D embedding vector can further be concatenated with the first feature embedding vector 110 to generate the 3D feature embedding vector, which can then be combined with the target data as in Embodiment 1-1 to generate the second synthesized converted data 320.
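Steps (a-3) and (a-4) can be sketched as follows. The text states only that the 3D embedding vector has 257 numbers and that identity information comes from the source while attribute information comes from the target. The slice layout used here (the first 80 coefficients treated as identity/shape, the remaining 177 as expression, texture, pose, lighting and translation) is an assumption borrowed from a common 257-coefficient 3DMM regressor, not something the patent specifies; the function names are hypothetical.

```python
COEFF_DIM = 257            # length of a 3D embedding vector, as stated in the text
ID_SLICE = slice(0, 80)    # ASSUMPTION: identity (shape) coefficients come first


def mix_3dmm(source_coeffs, target_coeffs):
    """(a-3) Build the synthesized 3D embedding vector: identity part from the
    source, everything else (expression, pose, lighting, ...) from the target."""
    if len(source_coeffs) != COEFF_DIM or len(target_coeffs) != COEFF_DIM:
        raise ValueError(f"expected {COEFF_DIM} coefficients per image")
    mixed = list(target_coeffs)
    mixed[ID_SLICE] = source_coeffs[ID_SLICE]
    return mixed


def build_3d_feature_embedding(feature_embedding, mixed_coeffs):
    """(a-4) Concatenate the synthesized 3D vector onto the first feature embedding."""
    return list(feature_embedding) + list(mixed_coeffs)


source = [1.0] * COEFF_DIM     # stand-in for the source 3D embedding vector
target = [0.0] * COEFF_DIM     # stand-in for the target 3D embedding vector
mixed = mix_3dmm(source, target)
vec = build_3d_feature_embedding([0.5] * 512, mixed)   # 512 + 257 numbers
```

Concatenation keeps both sources of information side by side, so the downstream layers that consume the 3D feature embedding vector see 769 numbers instead of 512.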

Finally, the swapping image 400, which is the final image, can be generated from the second synthesized converted data 320.

In other words, the present invention may relate to a method of training the face swapping deep learning system that also reflects the 3D characteristics of the image.

(Embodiment 6-1) A method of training a face swapping deep learning system, comprising:

(a) inputting a source image 100 into an embedding network 10 to extract a first feature embedding vector 110;

(c) combining the target data 210 and the first feature embedding vector 110 to generate first synthesized converted data 300;

(c') combining the target data 210 and the first synthesized converted data 300 with an attention mask 310 to generate second synthesized converted data 320; and

(e) training the face swapping deep learning system based on the swapping image 400.

(Embodiment 6-2) In Embodiment 6-1, step (c') includes:

(c'-1) generating the attention mask 310 based on a sigmoid network 60.

In an exemplary embodiment of the present invention, the sigmoid network 60 may be a network that generates output values between 0 and 1 based on the sigmoid function.

Given an input value x, the sigmoid function outputs sigmoid(x), which always lies between 0 and 1, as shown in Equation 3 below.

[Equation 3]

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
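A minimal sketch of step (c'-1): Equation 3 is applied to each logit to yield an attention mask M. Where the logits come from (the trainable layers of the sigmoid network 60) is left abstract; the 2-D list layout and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable form of Equation 3; mathematically always in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids computing exp(-x) for large negative x
    return e / (1.0 + e)


def attention_mask(logits):
    """Turn a 2-D grid of logits into a mask M with values between 0 and 1.

    ASSUMPTION: the logits are produced by learned layers of the sigmoid network 60.
    """
    return [[sigmoid(v) for v in row] for row in logits]


m = attention_mask([[0.0, 10.0], [-10.0, 2.0]])
```

Branching on the sign matters in practice: `math.exp` raises `OverflowError` for arguments above roughly 709, so the naive `1 / (1 + exp(-x))` would fail for very negative logits.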

(Embodiment 6-3) In Embodiment 6-2, the method includes:

(c'-2) when the function of the generated attention mask 310 is denoted M, calculating the second synthesized converted data 320 by Equation 4 below:

[Equation 4]

$$Y = M \odot X' + (1 - M) \odot X$$

(where X is the target data 210, X' is the first synthesized converted data 300, Y is the second synthesized converted data 320, and ⊙ denotes element-wise multiplication).

(Embodiment 6-4) In Embodiment 6-1, step (e) includes: inputting the swapping image 400 into the embedding network 10 to extract a second feature embedding vector 410, and then training the face swapping deep learning system so as to reduce a first error (1), which is the difference between the first feature embedding vector 110 and the second feature embedding vector 410.

(Embodiment 6-5) In Embodiment 6-4, the method includes: (g) inputting the swapping image 400 into the discrimination network 50 and extracting first intermediate values 51 produced during the computation; (h) inputting the target image 200 into the discrimination network 50 and extracting second intermediate values 52 produced during the computation; and (i) training the face swapping deep learning system so as to reduce a second error (2), which is the difference between the first intermediate values 51 and the second intermediate values 52.

The sigmoid network 60 of the present invention may be trained as part of the training process of the face swapping deep learning system.

Embodiment 1-1 includes combining the target data 210 generated from the target image 200 with the first feature embedding vector 110 generated from the source image 100 to generate the first synthesized converted data 300, and using it to generate a swapping image 400 that combines the characteristics of the target image 200 and the source image 100.

By contrast, Embodiment 6-1 can provide a training method for a face swapping deep learning system in which, when the target image 200 and the source image 100 are synthesized, the swapping image 400 retains the attribute characteristics of the target image 200 while preserving the identity characteristics of the source image 100 more efficiently.

To obtain this effect, an attention mask 310 can additionally be introduced.

The attention mask 310 may be a component that assists in combining the target data 210 and the first synthesized converted data 300 to generate the second synthesized converted data 320. When the function of the attention mask 310 is denoted M, the second synthesized converted data 320 can be calculated by Equation 4:

[Equation 4]

$$Y = M \odot X' + (1 - M) \odot X$$

Where M is close to 1, the result follows the first synthesized converted data 300 (X'); where M is close to 0, it follows the target data 210 (X).
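Equation 4 applied position by position can be sketched as follows; the nested-list layout and the function name are hypothetical stand-ins for the tensors M, X' and X.

```python
def blend(mask, swapped, target):
    """Equation 4 at every position: Y = M * X' + (1 - M) * X.

    mask    -- M, values between 0 and 1 (attention mask 310)
    swapped -- X', first synthesized converted data 300
    target  -- X, target data 210
    """
    return [[m * xp + (1.0 - m) * x for m, xp, x in zip(m_row, xp_row, x_row)]
            for m_row, xp_row, x_row in zip(mask, swapped, target)]


y = blend([[1.0, 0.0], [0.5, 0.25]],
          [[10.0, 10.0], [10.0, 10.0]],
          [[2.0, 2.0], [2.0, 2.0]])
```

Positions with M = 1 copy X', positions with M = 0 copy X, and intermediate values interpolate linearly between the two.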

In other words, the target data 210 and the first feature embedding vector 110 are combined to generate the first synthesized converted data 300. The attention mask 310 can then be learned so that its values are close to 0 in regions where the characteristics of the target image 200 should be kept, that is, regions carrying attribute information, and close to 1 in regions where the characteristics of the source image 100 should be kept, that is, regions carrying identity information.

The second synthesized converted data 320 is generated using the attention mask 310, the swapping image 400 is generated from it, the first error (1) and the second error (2) are computed from the swapping image, and the face swapping deep learning system of the present invention can then be trained to reduce them.
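As a rough illustration of the training objective, the two errors can be combined into a single scalar to minimize. The text does not give weights; the default values below are hypothetical, chosen only so that the identity term (first error) dominates, in line with the discussion of Embodiment 4-1.

```python
def total_loss(first_error, second_error, id_weight=10.0, attr_weight=1.0):
    """Weighted sum of the identity error (1) and the attribute error (2).

    HYPOTHETICAL weights: the text gives no values; a larger id_weight keeps
    the identity term dominant.
    """
    if id_weight < 0 or attr_weight < 0:
        raise ValueError("weights must be non-negative")
    return id_weight * first_error + attr_weight * second_error
```

In a full system this scalar would be back-propagated to update the trainable parts, such as the linear layer 20 and the sigmoid network 60.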

For Embodiments 6-4 and 6-5, the descriptions of Embodiments 3-1 and 4-1 above apply mutatis mutandis.

However, when the second synthesized converted data 320 is generated using the attention mask 310, and the swapping image 400 generated from it is used to compute the first error (1) and the second error (2) and carry out training, training efficiency can be higher than without the attention mask.

This may be because, when the target image 200 and the source image 100 are combined, the attention mask 310 can emphasize the regions of each image that should be focused on.

(Embodiment 7-1) A computer-implemented training apparatus for training a face swapping deep learning system, comprising: a memory that stores instructions; and at least one processor configured to execute the instructions, wherein the instructions executed by the processor input a source image 100 into an embedding network 10 to extract a first feature embedding vector 110, convert a target image 200 to a compatible size to obtain target data 210, combine the target data 210 and the first feature embedding vector 110 to generate synthesized converted data, convert the synthesized converted data into a swapping image 400, and train the face swapping deep learning system based on the swapping image 400.

(Embodiment 7-2) The computer-implemented training apparatus of Embodiment 7-1, wherein the instructions input the swapping image 400 into the embedding network 10 to extract a second feature embedding vector 410, and then train the face swapping deep learning system so as to reduce a first error (1), which is the difference between the first feature embedding vector 110 and the second feature embedding vector 410.

(Embodiment 7-3) The computer-implemented training apparatus of Embodiment 7-2, wherein the instructions input the swapping image 400 into the discrimination network 50 to extract first intermediate values 51 produced during the computation, input the target image 200 into the discrimination network 50 to extract second intermediate values 52 produced during the computation, and train the face swapping deep learning system so as to reduce a second error (2), which is the difference between the first intermediate values 51 and the second intermediate values 52.

An exemplary embodiment of the present invention relates to a computer-implemented training apparatus for training a face swapping deep learning system, which may include a memory that stores instructions and at least one processor configured to execute them. The processor may execute instructions that perform the steps of Embodiments 1-1 through 6-5. Duplicate descriptions are omitted.

In the specification of the present invention (particularly in the claims), the term "said" and similar referential terms may refer to both the singular and the plural. In addition, where a range is described in the present invention, the invention includes applications of the individual values within that range (unless stated otherwise), as if each individual value making up the range were described in the detailed description.

Unless an order is explicitly stated for the steps constituting the method according to the present invention, or stated otherwise, the steps may be performed in any suitable order; the present invention is not necessarily limited to the order in which the steps are described. The use of any examples or exemplary terms (e.g., "etc.") in the present invention is merely to describe the invention in detail, and unless limited by the claims, the scope of the invention is not limited by these examples or exemplary terms. In addition, those skilled in the art will appreciate that various modifications, combinations, and changes can be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Therefore, the spirit of the present invention should not be limited to the embodiments described above; not only the claims set out below but also all scopes equivalent to, or equivalently modified from, these claims fall within the scope of the spirit of the present invention.

10: Embedding network    20: Linear layer
30: Convolution block    40: Mix block
50: Discrimination network    60: Sigmoid network
100: Source image    110: First feature embedding vector
120: Identity data    200: Target image
210: Target data    220: Normalized target data
300: First synthesized converted data    310: Attention mask
320: Second synthesized converted data    400: Swapping image

Claims

A method of training a face swapping deep learning system, comprising:
(a) inputting a source image 100 into an embedding network 10 to extract a first feature embedding vector 110;
(b) converting the pixels of a target image 200 to a compatible size to obtain target data 210;
(c) combining the target data 210 and the first feature embedding vector 110 to generate first synthesized converted data 300;
(c') combining the target data 210 and the first synthesized converted data 300 with an attention mask 310 to generate second synthesized converted data 320;
(d) a composite image conversion step of converting the second synthesized converted data 320 into a swapping image 400; and
(e) training the face swapping deep learning system based on the swapping image 400,
wherein step (c) includes a normalization step of converting the target data 210 (X) into normalized target data 220 (Z), and
a denormalization step of combining the normalized target data 220 (Z) with the identity data 120 extracted from the first feature embedding vector 110,
wherein the normalization step is calculated by Equation 1 below,
[Equation 1]
$$Z = \frac{X - \mu}{\sigma}$$

(where μ and σ denote the mean and standard deviation of the target data 210)

the denormalization step inputs the first feature embedding vector 110 into the linear layer 20 to extract the identity data 120, and is calculated based on Equation 2 below,
[Equation 2]

(where the identity term in Equation 2 denotes the identity data 120)

wherein step (e) includes:
(e-1) inputting the swapping image 400 into the embedding network 10 to extract a second feature embedding vector 410, and then training the face swapping deep learning system so as to reduce a first error (1), which is the difference between the first feature embedding vector 110 and the second feature embedding vector 410;
(e-2) inputting the swapping image 400 into the discrimination network 50 and extracting first intermediate values 51 produced during the computation;
(e-3) inputting the target image 200 into the discrimination network 50 and extracting second intermediate values 52 produced during the computation; and
(e-4) training the face swapping deep learning system so as to reduce a second error (2), which is the difference between the first intermediate values 51 and the second intermediate values 52,
wherein the first feature embedding vector 110 is data consisting of 512 numbers, and the identity data 120 is generated as follows:


The method of training a face swapping deep learning system of claim 1,
wherein step (c') includes:
(c'-1) generating the attention mask 310 based on a sigmoid network 60.

The method of training a face swapping deep learning system of claim 2,
wherein the sigmoid network 60 is a network that generates output values between 0 and 1 based on the sigmoid function, which, given an input value x, outputs sigmoid(x) as shown in Equation 3 below.
[Equation 3]
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

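The method of training a face swapping deep learning system of claim 2, further including:
(c'-2) when the function of the generated attention mask 310 is denoted M, calculating the second synthesized converted data 320 by Equation 4 below.
[Equation 4]
$$Y = M \odot X' + (1 - M) \odot X$$

(where X is the target data 210, X' is the first synthesized converted data 300, Y is the second synthesized converted data 320, and ⊙ denotes element-wise multiplication)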

(Deleted)