KR20230060726A

KR20230060726A - Method for providing face synthesis service and apparatus for same

Info

Publication number: KR20230060726A
Application number: KR1020210145232A
Authority: KR
Inventors: 이현기; 곽승근; 조준구; 손종수
Original assignee: 씨제이올리브네트웍스 주식회사
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2023-05-08
Also published as: KR102639187B1

Abstract

A method for providing a face synthesis service according to one embodiment of the present invention is performed by an apparatus including a processor and a memory, and comprises: (a) collecting basic data including a target image that contains a first person, whose face is to be replaced, and one or more source images that contain a second person, whose face is used as the replacement; (b) preprocessing the collected basic data to determine whether additional source images need to be collected; (c) when step (b) determines that no additional source images are needed, inputting the preprocessed basic data to a face synthesis model to generate a face synthesis image in which the face of the second person contained in the source image is synthesized onto the face and facial expression of the first person contained in the target image; and (d) post-correcting the skin tone of the generated face synthesis image. The method can thereby provide a high-quality face synthesis service.

Description

Method for providing face synthesis service and apparatus therefor

The present invention relates to a method for providing a face synthesis service and an apparatus therefor. More specifically, it relates to a method, and an apparatus therefor, that can provide a high-quality face synthesis service by collecting high-quality basic data for training and by post-correcting the generated face synthesis image.

Face synthesis technology based on deep learning, commonly called deepfake technology, has recently emerged and is now widely used across industries. Representative examples include entertainment uses, such as the creation of virtual influencers, and the recreation of how deceased people, who can no longer be seen, looked while they were alive.

Deepfake technology relies on the generative adversarial network (GAN), a machine learning (ML) model that has drawn attention as a next-generation deep learning algorithm because it departs from supervised learning and lays a foundation for unsupervised learning. A GAN improves its performance through competition between two neural networks with opposing goals, a generator and a discriminator. More specifically, the generator learns from real data and uses it to generate fake data that is close to real, while the discriminator judges whether the data produced by the generator is real or fake. Repeating this process yields increasingly sophisticated fake data. Because the competition between the generator and the discriminator intensifies as the underlying GAN model is used repeatedly, the quality of deepfake output can be expected to improve accordingly.
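The competition between generator and discriminator described above is commonly written as a minimax objective. The patent text does not state this formula; it is given here only as the standard form from the general GAN literature, where $D$ is the discriminator, $G$ is the generator, $p_{\text{data}}$ is the distribution of real data, and $p_z$ is the noise prior.

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes the expression by scoring real data high and generated data low, while the generator minimizes it by producing data the discriminator scores as real.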

Nevertheless, research aimed at improving the quality of deepfake technology for more sophisticated face synthesis remains active, for example by improving the training efficiency of the generative adversarial network model or by applying post-correction to the generated face synthesis image. The present invention relates to such improvements.

Korean Laid-Open Patent Publication No. 10-2020-0027030 (2020.03.11)

A technical problem to be solved by the present invention is to provide a method, and an apparatus therefor, for providing a high-quality face synthesis service by improving the training efficiency of the generative adversarial network model used in deepfake technology.

Another technical problem to be solved by the present invention is to provide a method, and an apparatus therefor, for providing a face synthesis service that delivers high-quality face synthesis results by applying post-correction to the generated face synthesis image.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

A method for providing a face synthesis service according to one embodiment of the present invention for achieving the above technical problem is performed by an apparatus including a processor and a memory, and includes: (a) collecting basic data including a target image that contains a first person, whose face is to be replaced, and one or more source images that contain a second person, whose face is used as the replacement; (b) preprocessing the collected basic data to determine whether additional source images need to be collected; (c) if step (b) determines that no additional source images are needed, inputting the preprocessed basic data to a face synthesis model to generate a face synthesis image in which the face of the second person contained in the source image is synthesized onto the face and expression of the first person contained in the target image; and (d) post-correcting the skin tone of the generated face synthesis image.
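Steps (a) through (d) can be read as a simple control flow. The following is a minimal sketch under stated assumptions: `BasicData`, `run_face_synthesis`, and the three callables passed in are hypothetical names introduced for illustration, and the patent does not prescribe this interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class BasicData:
    """Basic data collected in step (a). Field names are illustrative."""
    target_image: Any          # image containing the first person
    source_images: List[Any]   # one or more images containing the second person


def run_face_synthesis(
    data: BasicData,
    needs_more_sources: Callable[[BasicData], bool],
    synthesize: Callable[[BasicData], Any],
    correct_skin_tone: Callable[[Any, Any], Any],
) -> Optional[Any]:
    """Run steps (b) to (d) on data already collected in step (a).

    Returns None when step (b) decides that more source images are needed,
    so the caller can collect them and try again.
    """
    if needs_more_sources(data):           # step (b)
        return None
    synthesized = synthesize(data)         # step (c)
    # Step (d): post-correct the skin tone against the target image.
    return correct_skin_tone(synthesized, data.target_image)
```

Passing the three stages in as callables keeps the control flow separate from any particular preprocessing rule, synthesis model, or correction routine.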

According to one embodiment, the source image of step (a) is one or more images captured according to a shooting guide, which is a predetermined set of shooting conditions. The shooting guide may require that a camera with a resolution of FHD (1920 × 1080) or higher capture the second person from each of the following positions, with both eyes of the second person fully visible: the front, a position 20° to 30° to the upper right, a position 20° to 30° to the upper left, a position 10° to 20° to the lower right, and a position 10° to 20° to the lower left.

According to one embodiment, the shooting guide may further include a condition of capturing the second person from the rightmost position at which both eyes are still fully visible and from the leftmost position at which both eyes are still fully visible.

According to one embodiment, the shooting guide further includes shooting conditions for the second person, which may be: a shooting time of at least 2 minutes, shooting while the head is turned slowly up and down and left and right, a facial expression identical or similar to that of the first person in the target image, and shooting with no part of the front of the face covered.
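The shooting guide conditions above can be expressed as a checklist applied to each captured shot. The sketch below is illustrative only: the `Shot` record, its field names, and the position labels are assumptions, and applying the 2-minute minimum to each shot individually is one possible reading of the text.

```python
from dataclasses import dataclass
from typing import List

# Angle ranges in degrees for the four off-axis positions named in the guide.
ANGLE_RANGES = {
    "upper_right": (20.0, 30.0),
    "upper_left": (20.0, 30.0),
    "lower_right": (10.0, 20.0),
    "lower_left": (10.0, 20.0),
}
# Positions without a numeric angle range in the text.
UNRANGED_POSITIONS = {"front", "max_right", "max_left"}

MIN_WIDTH, MIN_HEIGHT = 1920, 1080   # FHD or higher
MIN_DURATION_S = 120.0               # at least 2 minutes


@dataclass
class Shot:
    """Metadata for one capture of the second person (hypothetical record)."""
    position: str
    angle_deg: float
    width: int
    height: int
    duration_s: float
    both_eyes_visible: bool
    face_occluded: bool


def violations(shot: Shot) -> List[str]:
    """Return the shooting guide conditions that the shot fails, if any."""
    problems = []
    angle_range = ANGLE_RANGES.get(shot.position)
    if angle_range is None and shot.position not in UNRANGED_POSITIONS:
        problems.append("unknown position")
    if angle_range is not None and not (angle_range[0] <= shot.angle_deg <= angle_range[1]):
        problems.append("angle out of range")
    if shot.width < MIN_WIDTH or shot.height < MIN_HEIGHT:
        problems.append("resolution below FHD")
    if shot.duration_s < MIN_DURATION_S:
        problems.append("shorter than 2 minutes")
    if not shot.both_eyes_visible:
        problems.append("both eyes not visible")
    if shot.face_occluded:
        problems.append("face partly covered")
    return problems
```

A shot that returns an empty list satisfies every condition checked here; the expression-matching condition is left out because it cannot be verified from metadata alone.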

According to one embodiment, step (b) may include: (b-1) extracting a plurality of landmarks from each of the face of the first person in the target image and the face of the second person in the source image; (b-2) calculating the roll, pitch, and yaw of the face of the first person using the landmarks extracted from that face, and calculating the roll, pitch, and yaw of the face of the second person using the landmarks extracted from that face; (b-3) calculating the similarity between the target image and the source image using the calculated roll, pitch, and yaw of the two faces; and (b-4) determining whether the calculated similarity between the target image and the source image is greater than or equal to a threshold.

According to one embodiment, after step (b-4) the method may further include one of: (b-5) determining that additional collection of the source image is necessary if the similarity is greater than or equal to the threshold; and (b-6) determining that additional collection of the source image is not necessary if the similarity is less than the threshold.

According to one embodiment, the similarity in step (b-3) may be calculated using the Euclidean distance.
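Steps (b-3) through (b-6) can be sketched as a distance between head poses followed by a threshold test. Because the value is a Euclidean distance, a larger value means the poses differ more, which appears consistent with step (b-5) requesting more source images when the value reaches the threshold. The function names, the use of degrees, and the choice to compare the target against its closest source pose are assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple

Pose = Tuple[float, float, float]  # (roll, pitch, yaw), assumed to be in degrees


def pose_distance(target_pose: Pose, source_pose: Pose) -> float:
    """Step (b-3): Euclidean distance between two (roll, pitch, yaw) triples."""
    return math.dist(target_pose, source_pose)


def needs_more_sources(target_pose: Pose, source_poses: Iterable[Pose], threshold: float) -> bool:
    """Steps (b-4) to (b-6): True when additional source images are needed.

    Assumption: the target pose is compared with its closest source pose.
    """
    closest = min(pose_distance(target_pose, pose) for pose in source_poses)
    return closest >= threshold
```

For example, a target pose of `(0, 0, 0)` and a source pose of `(3, 4, 0)` are 5 units apart, so a threshold of 10 would accept the existing source images while a threshold of 5 would request more.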

According to one embodiment, the face synthesis model of step (c) may be one or more machine learning (ML) models including a generative adversarial network (GAN).

According to one embodiment, step (d) may include: (d-1) excluding predetermined landmarks from each of the generated face synthesis image and the target image included in the basic data input to the face synthesis model, and compositing each resulting image with a synthesis-region contour mask image to generate a landmark-excluded image; (d-2) calculating a skin tone similarity by applying one or more color models to the landmark-excluded image and the target image included in the basic data, and to the landmark-excluded image and the face synthesis image, respectively; (d-3) treating the one or more color models as a function and applying an optimization technique to one or more variables input to the function to search for the variables that minimize the calculated skin tone similarity; and (d-4) applying the variables found to the generated face synthesis image.

According to one embodiment, step (d-1) may include: (d-1-1) extracting a plurality of landmarks from each of the generated face synthesis image and the target image included in the basic data input to the face synthesis model; (d-1-2) for each set of extracted landmarks, keeping the contours of the eyebrows, eyes, nose, and lips and removing the rest; (d-1-3) for each set of remaining landmarks, adjusting the position of the eyebrow landmarks; (d-1-4) drawing the contour lines of the position-adjusted eyebrow landmarks and of the eye, nose, and lip landmarks, and filling the inside of the contour lines; (d-1-5) excluding (masking) the drawn and filled landmark contours from each of the generated face synthesis image and the target image; and (d-1-6) compositing the masked face synthesis image, the masked target image, and the synthesis-region contour mask image to generate the landmark-excluded image.

According to one embodiment, in step (d-1-3) the position of the eyebrow landmarks may be adjusted from the midpoint of the remaining eyebrow landmarks in the direction of the uppermost of the remaining nose landmarks.
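One way to read the eyebrow adjustment of step (d-1-3) is as a translation of the eyebrow landmarks along the line from their midpoint to the uppermost nose landmark. The sketch below follows that reading; the function name, the use of the centroid as the midpoint, and the `fraction` parameter are assumptions, since the text does not state how far the landmarks move.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def shift_eyebrow(eyebrow: List[Point], nose_top: Point, fraction: float) -> List[Point]:
    """Translate eyebrow landmarks toward the uppermost nose landmark.

    The shift vector runs from the centroid of the eyebrow landmarks to
    nose_top, scaled by `fraction` (0 keeps the points, 1 moves the centroid
    onto nose_top). Every point receives the same shift, so the eyebrow
    shape is preserved.
    """
    count = len(eyebrow)
    mid_x = sum(x for x, _ in eyebrow) / count
    mid_y = sum(y for _, y in eyebrow) / count
    dx = (nose_top[0] - mid_x) * fraction
    dy = (nose_top[1] - mid_y) * fraction
    return [(x + dx, y + dy) for x, y in eyebrow]
```

With eyebrow points `(0, 0)`, `(2, 0)`, `(4, 0)`, a nose landmark at `(2, 10)`, and `fraction=0.5`, every point moves 5 units along the y axis.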

According to one embodiment, the thickness of the landmark contour lines drawn in step (d-1-4) may be adjusted dynamically according to the face size of the person in the generated face synthesis image (the person in whom the face of the second person is synthesized onto the face and expression of the first person) and the face size of the first person in the target image, respectively.

According to one embodiment, the face size of the person in the generated face synthesis image and the face size of the first person in the target image may each be calculated as the Euclidean distance between the two end points of the jaw landmarks, and the Euclidean distance calculated for the face size may be mapped to the thickness of the landmark contour lines.
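The mapping from face size to contour thickness can be sketched as a proportional rule. The text only says that the distance is mapped to the thickness, so the linear mapping, the `ratio` value, and the minimum of one pixel below are assumptions chosen for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def contour_thickness(jaw_left: Point, jaw_right: Point, ratio: float = 0.05, minimum: int = 1) -> int:
    """Map face size to a contour line thickness in pixels.

    Face size is the Euclidean distance between the two end points of the
    jaw landmarks. `ratio` and `minimum` are illustrative parameters.
    """
    face_width = math.dist(jaw_left, jaw_right)
    return max(minimum, round(face_width * ratio))
```

A face 200 pixels wide gets a 10-pixel contour with the default ratio, while a very small face falls back to the 1-pixel minimum instead of a contour that would cover the whole face.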

According to one embodiment, the one or more color models of step (d-2) may be any one or more of the Gray model, the RGB model, the HSV (conic) model, the HSV (cylindric) model, and the YCbCr (YUV) model.
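A skin tone comparison under several color models can be sketched with the standard library alone. Everything below is an illustrative assumption rather than the patent's method: the function names, the use of the mean color of each pixel set, the Euclidean distance between means as the similarity value, the full-range BT.601 coefficients for YCbCr, and the reading of the cylindric and conic HSV models as Cartesian coordinates of the HSV cylinder and cone.

```python
import colorsys
import math
from typing import Callable, Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def to_gray(r, g, b):
    # BT.601 luma weights, returned as a 1-tuple so all models share one shape.
    return (0.299 * r + 0.587 * g + 0.114 * b,)


def to_rgb(r, g, b):
    return (float(r), float(g), float(b))


def to_ycbcr(r, g, b):
    # Full-range BT.601 conversion as used by JPEG/JFIF.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (y, cb, cr)


def to_hsv_cylindric(r, g, b):
    # Hue becomes an angle and saturation a radius, which avoids hue wrap-around.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    angle = 2.0 * math.pi * h
    return (s * math.cos(angle), s * math.sin(angle), v)


def to_hsv_conic(r, g, b):
    # Same as the cylinder, but the radius shrinks with value (dark colors converge).
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    angle = 2.0 * math.pi * h
    return (s * v * math.cos(angle), s * v * math.sin(angle), v)


def mean_color(pixels: Iterable[Pixel], convert: Callable) -> Tuple[float, ...]:
    """Average the pixels after converting each one with the given color model."""
    converted = [convert(r, g, b) for r, g, b in pixels]
    count = len(converted)
    return tuple(sum(c[i] for c in converted) / count for i in range(len(converted[0])))


def tone_distance(pixels_a: Iterable[Pixel], pixels_b: Iterable[Pixel], convert: Callable) -> float:
    """Distance between the mean colors of two pixel sets (smaller means closer tones)."""
    return math.dist(mean_color(pixels_a, convert), mean_color(pixels_b, convert))
```

The Gray, RGB, and YCbCr values here span 0 to 255 while the HSV coordinates span roughly -1 to 1, so distances from different models would need rescaling before being combined into one score.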

According to one embodiment, the one or more variables in step (d-3) may be any one or more of: the strength of blur/sharpening, the strength of super resolution, and the type of color transfer method, including WCT2, GFPGAN, and histogram matching.

According to one embodiment, the optimization technique in step (d-3) may be Bayesian optimization.
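The variable search of step (d-3) can be illustrated with a much simpler stand-in. The sketch below uses exhaustive grid search, not Bayesian optimization, purely to show the shape of the problem: a mixed search space (two strengths and one categorical method) and an objective to minimize. The parameter names, candidate values, and the toy objective are all hypothetical; in practice the objective would apply the corrections and return the skin tone similarity value.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def search_parameters(
    objective: Callable[[Dict[str, Any]], float],
    space: Dict[str, List[Any]],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every combination in `space` and return the one with the lowest score."""
    names = list(space)
    best_params: Dict[str, Any] = {}
    best_score = float("inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space mirroring the variables listed in the text.
SPACE = {
    "blur_sharpen_strength": [0.0, 0.5, 1.0],
    "super_resolution_strength": [0.0, 1.0],
    "color_transfer": ["WCT2", "GFPGAN", "histogram_matching"],
}


def toy_objective(params: Dict[str, Any]) -> float:
    """Placeholder score; a real objective would measure skin tone similarity."""
    penalty = 0.0 if params["color_transfer"] == "histogram_matching" else 1.0
    return abs(params["blur_sharpen_strength"] - 0.5) + params["super_resolution_strength"] + penalty
```

Bayesian optimization would replace the exhaustive loop with a surrogate model that proposes the next combination to try, which matters when each evaluation requires re-rendering and re-scoring an image.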

According to the present invention as described above, before the basic data, which includes a target image containing the first person whose face is to be replaced and one or more source images containing the second person whose face is used as the replacement, is input to the face synthesis model, it goes through a preprocessing step that calculates a similarity and thereby determines whether additional source images need to be collected. If so, one or more high-quality source images captured according to the shooting guide unique to the present invention are additionally collected and input to the face synthesis model. This greatly improves the training efficiency of the face synthesis model, which uses a generative adversarial network model, and thereby provides high-quality face synthesis results.

In addition, because the face synthesis model uses a generative adversarial network model, the competition between the generator and the discriminator intensifies as the model is used repeatedly, which by itself enables increasingly sophisticated face synthesis images. Adding a post-correction process that optimizes the skin tone of the generated face synthesis image further provides more complete synthesis results than conventional deepfake technology.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a diagram for explaining an overview of the face synthesis service provided by the present invention.
FIG. 2 is a diagram showing the overall configuration of an apparatus for providing a face synthesis service according to a first embodiment of the present invention.
FIG. 3 is a flowchart showing representative steps of a method for providing a face synthesis service according to a second embodiment of the present invention.
FIG. 4 is a flowchart detailing step S320, the preprocessing of basic data, in the method for providing a face synthesis service according to the second embodiment of the present invention.
FIG. 5 is a diagram showing, by way of example, 68 landmarks extracted from a face.
FIG. 6 is a diagram showing, by way of example, roll, pitch, and yaw relative to a person's pose.
FIG. 7 is a diagram showing, by way of example, a visualization of the similarity calculated for a target image and source images.
FIG. 8 is a diagram showing, by way of example, the additional collection of source images in the method for providing a face synthesis service according to the second embodiment of the present invention.
FIG. 9 is a diagram showing, by way of example, source images of the second person captured by seven cameras according to the shooting guide.
FIG. 10 is a diagram showing, by way of example, a face synthesis image generated by inputting the target image and the source image shown in FIG. 1 to the face synthesis model.
FIG. 11 is a flowchart detailing step S340, the skin tone post-correction, in the method for providing a face synthesis service according to the second embodiment of the present invention.
FIG. 12 is a flowchart detailing step S340-1, the landmark exclusion, in the method for providing a face synthesis service according to the second embodiment of the present invention.
FIG. 13 is a diagram showing, by way of example, a plurality of landmarks extracted from a face synthesis image.
FIG. 14 is a diagram showing, by way of example, FIG. 13 with the landmarks for the contours of the eyebrows, eyes, nose, and lips kept and the rest removed.
FIG. 15 is a diagram showing, by way of example, the position adjustment of the eyebrow landmarks.
FIG. 16 is a diagram showing, by way of example, a situation in which the entire face is covered when the contour thickness is set to a fixed absolute value.
FIG. 17 is a diagram showing, by way of example, the result of mapping the contour thickness in FIG. 16 to a dynamic value based on face size.
FIG. 18 is a diagram showing, by way of example, the generation of a landmark-excluded image.
FIG. 19 is a diagram showing, by way of example, conceptual diagrams of the five color models used to calculate skin tone similarity.

The objects and technical configuration of the present invention, and the details of the resulting operational effects, will be more clearly understood from the following detailed description based on the drawings attached to this specification. Embodiments according to the present invention are described in detail with reference to the accompanying drawings.

The embodiments disclosed in this specification should not be interpreted or used to limit the scope of the present invention. It will be apparent to those skilled in the art that the description, including the embodiments of this specification, has various applications. Accordingly, any embodiments described in the detailed description are examples intended to better explain the present invention and are not intended to limit its scope to those embodiments.

The functional blocks shown in the drawings and described below are only examples of possible implementations. Other implementations may use other functional blocks without departing from the spirit and scope of the detailed description. In addition, although one or more functional blocks of the present invention are shown as individual blocks, one or more of them may be a combination of various hardware and software components that perform the same function.

In addition, an expression that certain components are included is an open-ended expression that simply indicates that those components are present, and should not be understood as excluding additional components.

Furthermore, when a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between.

Each embodiment of the present invention is described below with reference to the drawings.

First, FIG. 1 illustrates an overview of the face synthesis service provided by the present invention. As with deepfake technology, the basic idea of the service is to provide a face synthesis image (right image) that combines a target image (top image) containing the first person, whose face is to be replaced, with a source image (bottom image) containing the second person, whose face is used as the replacement. Note in advance that the uses to which the face synthesis image is put are not separately specified here.

FIG. 2 is a diagram showing the overall configuration of the apparatus 100 for providing a face synthesis service according to the first embodiment of the present invention.

However, this is only a preferred embodiment for achieving the object of the present invention. Some components may be added or removed as needed, and the role of one component may also be performed by another component.

The apparatus 100 for providing a face synthesis service according to the first embodiment of the present invention may include a processor 10, a network interface 20, a memory 30, a storage 40, and a data bus 50 connecting them, and may further include any additional components required to achieve the object of the present invention.

The processor 10 controls the overall operation of each component. The processor 10 may be any one of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), or any type of processor widely known in the technical field to which the present invention belongs. In addition, the processor 10 may perform operations for at least one application or program for carrying out the method for providing a face synthesis service according to the second embodiment of the present invention.

The network interface 20 supports wired and wireless Internet communication of the apparatus 100 for providing a face synthesis service according to the first embodiment of the present invention, and may also support other known communication methods. Accordingly, the network interface 20 may include the corresponding communication modules.

The memory 30 stores various data, commands, and/or information, and may load one or more computer programs 41 from the storage 40 in order to carry out the method for providing a face synthesis service according to the second embodiment of the present invention. Although FIG. 2 shows RAM as one example of the memory 30, various other storage media may of course be used as the memory 30.

The storage 40 may non-temporarily store one or more computer programs 41 and large-volume network information 42. The storage 40 may be any one of a non-volatile memory such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), or a flash memory, a hard disk, a removable disk, or any type of computer-readable recording medium widely known in the technical field to which the present invention belongs.

컴퓨터 프로그램(41)은 메모리(30)에 로드되어, 하나 이상의 프로세서(10)가 (A) 얼굴 변경 대상 인물인 제1 인물을 포함하는 타겟 이미지 및 변경할 얼굴의 대상 인물인 제2 인물을 포함하는 하나 이상의 소스 이미지를 포함하는 기초 데이터를 수집하는 오퍼레이션, (B) 상기 수집한 기초 데이터를 전처리하여 상기 소스 이미지의 추가 수집 필요 여부를 결정하는 오퍼레이션, (C) 상기 (B) 오퍼레이션 의 결정 결과 상기 소스 이미지의 추가 수집이 필요하지 않다면, 상기 전처리한 기초 데이터를 얼굴 합성 모델에 입력하여 상기 타겟 이미지가 포함하는 제1 인물의 얼굴 및 표정에 상기 소스 이미지가 포함하는 제2 인물의 얼굴을 합성한 얼굴 합성 이미지를 생성하는 오퍼레이션 및 (D) 상기 생성한 얼굴 합성 이미지를 후보정하는 오퍼레이션을 실행할 수 있다. The computer program 41 is loaded into the memory 30 so that one or more processors 10 (A) include a target image including a first person as a face change target person and a second person as a target face change face. An operation for collecting basic data including one or more source images, (B) an operation for pre-processing the collected basic data to determine whether additional collection of the source image is required, (C) a result of the determination of the (B) operation If additional collection of source images is not required, the preprocessed basic data is input to a face synthesis model, and the face and expression of the first person included in the target image are combined with the face of the second person included in the source image. An operation of generating a synthesized face image and (D) an operation of post-correcting the generated synthesized face image may be executed.

The operations performed by the computer program 41, briefly mentioned above, can be regarded as functions of the computer program 41. A more detailed description is given below in the description of the method for providing a face synthesis service according to the second embodiment of the present invention.

The data bus 50 serves as the path along which commands and/or information move between the processor 10, the network interface 20, the memory 30, and the storage 40 described above.

The apparatus 100 for providing a face synthesis service according to the first embodiment of the present invention described above may be a server having a network function, and its actual form of implementation is not limited: it may be a tangible physical server, such as an in-house system or a rented-space system, or an intangible cloud server.

In addition, the apparatus 100 for providing a face synthesis service according to the first embodiment of the present invention may be not only a server but also a user terminal having a network function, because the performance of user terminals has recently improved so much that they can perform computations comparable to those of a server. For example, the user terminal may be a portable terminal such as a smartphone, PDA, PDP, tablet PC, smart watch, smart glasses, or notebook PC, or an installed terminal such as a desktop PC or kiosk.

Hereinafter, on the premise that the apparatus 100 for providing a face synthesis service according to the first embodiment of the present invention is a server (hereinafter referred to as the "service server"), the method for providing a face synthesis service according to the second embodiment of the present invention is described with reference to FIGS. 3 to 19.

FIG. 3 is a flowchart showing representative steps of the method for providing a face synthesis service according to the second embodiment of the present invention.

However, this is merely a preferred embodiment for achieving the object of the present invention. Some steps may of course be added or deleted as necessary, and any one step may be performed as part of another step.

First, the service server 100 collects basic data including a target image containing a first person, who is the person whose face is to be changed, and one or more source images containing a second person, who is the person whose face is to be applied (S310).

Here, collection is a broad concept that also covers reception. The target image containing the first person, whose face is to be changed, is the upper image in FIG. 1; put simply, it is an image of a person whose face alone is to be replaced with another person's face. The source image containing the second person, whose face is to be applied, is the lower image in FIG. 1; it is an image of the person whose face will be synthesized onto another person's face. The target image and the source images are together referred to as basic data.

Of the basic data, it is preferable to collect one or more source images, and more specifically a plurality of source images that are as varied as possible, because this allows the source image that yields the best face synthesis image for the target image to be freely selected for the synthesis.

Meanwhile, referring again to FIG. 1, both the first person in the target image and the second person in the source image are captured in the same composition, down to just below the shoulder line, with the face placed near the center of the image. Although this is not essential, when the compositions of the persons in the two images are identical or similar, the completeness of the face synthesis and the training efficiency of the face synthesis model described later can be improved. It is therefore preferable, where possible, to collect as basic data a target image and source images in which the compositions of the persons are identical or similar.

Nevertheless, the compositions of the persons in the two images may frequently fail to be identical or similar, and various other situations may lower the completeness of the face synthesis and the training efficiency of the face synthesis model described later, for example when the source image is unclear, when its resolution is low, or when the expressions of the two persons differ too much. The method for providing a face synthesis service according to the second embodiment of the present invention therefore includes a method of additionally collecting source images in order to improve the completeness of the face synthesis and the training efficiency of the face synthesis model, which is described later in the description of step S330.

Once the basic data has been collected, the service server 100 preprocesses the collected basic data to determine whether additional collection of source images is required (S320).

Here, preprocessing means analyzing the face data of the first person contained in the target image and the face data of the second person contained in the source image. An optimal data set can be secured through step S320, which is described in detail below.

FIG. 4 is a flowchart detailing step S320, the preprocessing of the basic data, in the method for providing a face synthesis service according to the second embodiment of the present invention.

First, the service server 100 extracts a plurality of landmarks from each of the face of the first person contained in the target image and the face of the second person contained in the source image (S320-1).

Here, a landmark means a facial landmark as used in conventional face recognition technology. It is common to extract 68 landmarks relating to the facial contour, eyes, nose, and mouth, as illustrated by way of example in FIG. 5.

Meanwhile, as stated in the description of step S310, it is preferable to collect one or more source images, and more specifically a plurality of source images that are as varied as possible. When a plurality of source images has been collected, a plurality of landmarks is extracted from the face of the second person contained in each source image.

Once the landmarks have been extracted, the service server 100 calculates the roll, pitch, and yaw of the first person's face using the plurality of landmarks extracted from the first person's face, and calculates the roll, pitch, and yaw of the second person's face using the plurality of landmarks extracted from the second person's face (S320-2).

Each of the landmarks extracted in step S320-1 has its own coordinates, and these coordinates can be used to calculate the roll, pitch, and yaw of each of the face of the first person contained in the target image and the face of the second person contained in the source image.

FIG. 6 illustrates roll, pitch, and yaw with reference to a person's posture. Calculating the roll, pitch, and yaw makes it possible to determine how far the faces of the first person and the second person are rotated in the three-dimensional space defined by the x, y, and z axes.
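The text does not state how the three angles are derived from the landmark coordinates. As a minimal sketch of one part of that calculation, the roll (in-plane rotation) can be estimated from the line joining the two eye centers. The function name and the choice of eye centers are assumptions made for illustration; pitch and yaw generally require fitting the 2D landmarks to a 3D face model and are not covered here.

```python
import math


def roll_from_eye_centers(eye_on_image_left, eye_on_image_right):
    """Estimate roll in degrees from two eye centers (hypothetical helper).

    Points are (x, y) pixel coordinates with y increasing downward, so a
    positive result means the eye on the right side of the image is lower.
    """
    dx = eye_on_image_right[0] - eye_on_image_left[0]
    dy = eye_on_image_right[1] - eye_on_image_left[1]
    return math.degrees(math.atan2(dy, dx))


# A level face: both eye centers share the same y coordinate.
level_roll = roll_from_eye_centers((100, 120), (160, 120))

# A tilted face: the eye on the image's right side is 60 px lower,
# which gives a roll of about 45 degrees.
tilted_roll = roll_from_eye_centers((100, 120), (160, 180))
```

Using eye centers rather than single landmarks is only one possible choice; averaging the landmarks around each eye would make the estimate less sensitive to noise in any one point.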

Meanwhile, when there is a plurality of source images and a plurality of landmarks has been extracted in step S320-1 from the face of the second person in each of them, the landmarks extracted from each source image are used to calculate the roll, pitch, and yaw of the second person's face in that source image.

Once the roll, pitch, and yaw have been calculated, the service server 100 calculates the similarity between the target image and the source image using the calculated roll, pitch, and yaw of the first person's face and the roll, pitch, and yaw of the second person's face (S320-3).

Here, the similarity is calculated using the Euclidean distance. The Euclidean distance is a well-known technique for measuring similarity in data analysis, is also called Euclidean similarity, and can be calculated by Equation 1 below.

Equation 1:
$$\text{Euclidean distance (L2)} = \sqrt{(P_d - P_s)^2 + (Y_d - Y_s)^2 + (R_d - R_s)^2}$$

Here, $P_d$, $Y_d$, and $R_d$ are the pitch, yaw, and roll values of the target image, and $P_s$, $Y_s$, and $R_s$ are the pitch, yaw, and roll values of the source image.

Because the similarity is calculated from the Euclidean distance in this way, the similarity calculated in step S320-3 can be regarded as the calculated Euclidean distance itself; a smaller value therefore indicates that the two face angles are closer. When the service server 100 has calculated, in step S320-2, the roll, pitch, and yaw of the second person's face for each of a plurality of source images, each resulting vector and the vector of the roll, pitch, and yaw of the first person's face in the target image are substituted into Equation 1 to calculate a Euclidean distance for each source image.
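The per-source-image calculation of Equation 1 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used by the service server 100: the function name, the file names, and the angle values are all hypothetical, and angles are taken to be in degrees.

```python
import math


def pose_distance(target_pose, source_pose):
    """Euclidean (L2) distance between two (pitch, yaw, roll) tuples."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(target_pose, source_pose)))


# Hypothetical (pitch, yaw, roll) of the first person's face in the target image.
target_pose = (2.0, -10.0, 1.0)

# Hypothetical (pitch, yaw, roll) of the second person's face in each source image.
source_poses = {
    "source_a.png": (5.0, -6.0, 1.0),
    "source_b.png": (2.0, 20.0, 1.0),
}

# One Euclidean distance per source image, as described for step S320-3.
distances = {
    name: pose_distance(target_pose, pose) for name, pose in source_poses.items()
}
```

With these example values, source_a.png differs from the target by 3 in pitch and 4 in yaw, while source_b.png differs by 30 in yaw alone, so source_a.png is the closer match.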

Meanwhile, the service server 100 may also visualize the similarity between the target image and the source image calculated in step S320-3, as illustrated by way of example in FIG. 7. This allows the similarity to be identified at a glance and helps secure an optimal data set.

Once the similarity has been calculated, the service server 100 determines whether the calculated similarity between the target image and the source image is greater than or equal to a threshold (S320-4). If it is greater than or equal to the threshold, the service server 100 determines that additional collection of source images is required (S320-5); if it is less than the threshold, the service server 100 determines that additional collection of source images is not required (S320-6).

Here, the threshold is the reference value for deciding, on the basis of the similarity calculated in step S320-3, that is, the Euclidean distance, whether the face angle of the first person contained in the target image and the face angle of the second person contained in the source image are identical or similar. It may be, for example, 5, and the operator of the service server 100 may of course set a different threshold.

If the determination in step S320-4 is that the calculated similarity, that is, the Euclidean distance, is greater than or equal to the threshold, it is considered that there is no suitable source image that can improve the completeness of the face synthesis and the training efficiency of the face synthesis model described later, and one or more source images captured according to the additional source image collection method, whose description was deferred in the description of step S310, are additionally collected. If the determination in step S320-4 is that the Euclidean distance is less than the threshold, it is considered that a source image exists that can improve the completeness of the face synthesis and the training efficiency of the face synthesis model, and step S330 is performed.

Meanwhile, when there is a plurality of source images, that is, when in step S320-3 a Euclidean distance has been calculated for each source image by substituting into Equation 1 the roll, pitch, and yaw vector of the second person's face in that source image and the roll, pitch, and yaw vector of the first person's face in the target image, each calculated Euclidean distance is compared with the threshold. If even one Euclidean distance is determined to be less than the threshold, it is determined that additional collection of source images is not required, and the face synthesis described later can proceed on the basis of the source image used to calculate that Euclidean distance. If all of the Euclidean distances are determined to be greater than or equal to the threshold, it is determined that additional collection of source images is required, and one or more source images captured according to the additional source image collection method, whose description was deferred in the description of step S310, are additionally collected.
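The decision rule of steps S320-4 to S320-6 for a plurality of source images can be sketched as below. The threshold of 5 is the example value given in the text; the function name, the return shape, and the sample angles are assumptions made for illustration only.

```python
import math

THRESHOLD = 5.0  # example value from the text; the operator may set another


def needs_additional_sources(target_pose, source_poses, threshold=THRESHOLD):
    """Return (need_more, usable_indices) for a list of source poses.

    Each pose is a (pitch, yaw, roll) tuple. A source image is usable when
    its Euclidean distance to the target pose is strictly below the threshold.
    Additional collection is needed only when no source image is usable.
    """
    distances = [math.dist(target_pose, pose) for pose in source_poses]
    usable = [i for i, d in enumerate(distances) if d < threshold]
    return len(usable) == 0, usable


target = (0.0, 0.0, 0.0)

# One of the two sources is within the threshold, so no further collection.
need_more, usable = needs_additional_sources(target, [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)])

# The only source sits exactly at the threshold (distance 5), which counts as
# "greater than or equal to", so additional collection is required.
need_more_single, usable_single = needs_additional_sources(target, [(3.0, 4.0, 0.0)])
```

Returning the indices of the usable source images, rather than only a flag, reflects the statement that synthesis proceeds on the basis of the source image whose distance fell below the threshold.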

The description now returns to FIG. 3.

Whether additional collection of source images is required has been determined in step S320. When the result is that additional collection is not required, the preprocessed basic data is input into the face synthesis model to generate a face synthesis image in which the face of the second person contained in the source image is synthesized onto the face and facial expression of the first person contained in the target image (S330).

Step S330 is the step in which the faces are actually synthesized. Before describing it, the additional source image collection method for improving the completeness of the face synthesis and the training efficiency of the face synthesis model, whose description was deferred above, is described.

FIG. 8 is a diagram illustrating by way of example how source images are additionally collected in the method for providing a face synthesis service according to the second embodiment of the present invention.

As stated in the description of step S320, additional collection of source images is determined to be required when the similarity is greater than or equal to the threshold. The additionally collected source images must therefore be of sufficient quality that the calculated similarity reliably falls below the threshold. The additional source image collection method is a set of predetermined shooting conditions for achieving this, and may be called a shooting guide.

Referring to FIG. 8, five cameras are photographing the second person placed at the center. It is recommended that all five cameras have a resolution of FHD 1920 × 1080 or higher. The shooting guide may be the condition of photographing, with both eyes of the second person fully visible, from each of the following positions: the front; a position 20° to 30° to the upper right; a position 20° to 30° to the upper left; a position 10° to 20° to the lower right; and a position 10° to 20° to the lower left. In FIG. 8, by way of example, the upper-right and upper-left cameras are at 30° and the lower-right and lower-left cameras are at 15°.
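The five-camera condition above can be expressed as a small configuration check. The sketch below is illustrative only: the dictionary layout, position names, and function name are assumptions, the angle is treated as the single value the text gives for each position, and the resolution check assumes landscape orientation.

```python
# Allowed angle range in degrees for each camera position named in the text.
GUIDE_ANGLES = {
    "front": (0, 0),
    "upper_right": (20, 30),
    "upper_left": (20, 30),
    "lower_right": (10, 20),
    "lower_left": (10, 20),
}
MIN_WIDTH, MIN_HEIGHT = 1920, 1080  # FHD, the recommended minimum


def check_camera_setup(cameras):
    """Return a list of problems; an empty list means the setup fits the guide.

    cameras maps a position name to (angle_deg, width_px, height_px).
    """
    problems = []
    for position, (low, high) in GUIDE_ANGLES.items():
        if position not in cameras:
            problems.append(f"{position}: no camera")
            continue
        angle, width, height = cameras[position]
        if not (low <= angle <= high):
            problems.append(f"{position}: angle {angle} outside {low}-{high}")
        if width < MIN_WIDTH or height < MIN_HEIGHT:
            problems.append(f"{position}: resolution {width}x{height} below FHD")
    return problems


# The example arrangement of FIG. 8: 30 degrees above, 15 degrees below.
fig8_setup = {
    "front": (0, 1920, 1080),
    "upper_right": (30, 1920, 1080),
    "upper_left": (30, 1920, 1080),
    "lower_right": (15, 1920, 1080),
    "lower_left": (15, 1920, 1080),
}
fig8_problems = check_camera_setup(fig8_setup)
```

The requirement that both eyes be fully visible depends on the captured image rather than on the camera placement, so it is not checked here.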

This shooting guide is a relatively simple set of conditions that does not require a professional setup installed by a professional photographer. As long as the resolution requirement is met, a smartphone camera may be used as well as a DSLR, and soft, low-contrast lighting such as an ordinary fluorescent lamp may be used. The camera must, however, be fixed on a tripod so that it does not shake.

In addition, it is most preferable to photograph the second person simultaneously with five cameras installed at their respective positions, but it is also possible to shoot from one position with a single camera and then move it to each of the other positions in turn, provided that the second person moves as little as possible.

Meanwhile, in addition to the five camera positions described above, the shooting guide may include the condition of photographing the second person from the rightmost position at which both eyes are still fully visible and from the leftmost position at which both eyes are still fully visible, that is, the condition of shooting with seven cameras in total by adding two more. Because more source images are collected by shooting according to this guide, source images of better quality can be collected than with five cameras.

The shooting guide described above sets shooting conditions for the cameras. It may further include shooting conditions for the person being photographed, more specifically for the second person contained in the source image. These conditions may be: securing a shooting time of at least two minutes for the second person; shooting while the second person slowly and naturally turns the head up and down and left and right; having the second person make an expression as identical or similar as possible to that of the first person in the target image; and shooting so that the forehead and eyebrows are visible, with no part of the front of the face covered by hair, a microphone, a hand, or the like. When the conditions for the second person and the conditions for the cameras are both satisfied, source images of the highest quality can be collected.

FIG. 9 illustrates by way of example source images of the second person taken by seven cameras according to the shooting guide. In every image, both eyes of the second person are fully visible, the hair is not cropped, and no part of the front of the face is covered.

The source images captured according to the shooting guide described above may be source images captured for additional collection following the determination in step S320. Alternatively, the one or more source images collected in step S310 may themselves be captured according to the shooting guide, so that the determination in step S320 is that additional collection is not required and step S330 is performed directly.

Returning to the description of step S330, basic data including the target image and one or more source images is input to the face synthesis model as input data, and a face synthesis image, in which the face of the second person contained in the source image is synthesized onto the face and facial expression of the first person contained in the target image, is generated as output data. Here, the face synthesis model may be one or more machine learning (ML) models including a generative adversarial network (GAN) to which known deepfake technology is applied. FIG. 10 illustrates by way of example the face synthesis image generated by inputting the target image and the source image shown in FIG. 1 into the face synthesis model.

Because the face synthesis model used in step S330 for face and expression synthesis is a generative adversarial network model, the generator learns real data and, on that basis, generates fake data that is close to real, while the discriminator determines whether the data produced by the generator is real or fake. The more the face synthesis model is used, the more the competition between the generator and the discriminator intensifies, enabling more refined face synthesis images to be generated. When, however, the one or more source images in the basic data input to the model were captured according to the shooting guide described above, the training efficiency of the face synthesis model improves dramatically, so that a more complete face synthesis image can be generated for the same amount of use than when other source images are used. This is because, with other source images, data problems can critically affect generation of the face synthesis image, for example by leaving part of the face ungenerated, in which case the data must be reinforced and training performed again, whereas with source images captured according to the shooting guide there is no room for such a situation to arise.

Meanwhile, because the general process of learning follows data collection, data processing, and model training, the face synthesis model in step S330 may be one for which training beyond a certain level has been completed by repeating steps S310 and S320 several times, or one for which no training has taken place before, steps S310 and S320 having been performed for the first time. In the latter case, the face synthesis image is generated for the first time in step S330 and the face synthesis model learns from that result, and face synthesis images generated from basic data collected afterwards reflect the earlier training results.

Once the face synthesis image has been generated in step S330, the face synthesis itself can be regarded as complete. However, the method for providing a face synthesis service according to the second embodiment of the present invention additionally performs post-correction that optimizes the skin tone, and can thereby generate a highly complete face synthesis image that is differentiated from conventional deepfake techniques. This concerns step S340, which is described below.

Once the face synthesis image has been generated, the service server 100 post-corrects the skin tone of the generated face synthesis image (S340).

Owing to the limited resolution of drawings attached to a patent specification, this may be difficult to see clearly in the face synthesis image shown in FIG. 10. However, if the person contained in the face synthesis image generated in step S330 is called a third person (a person in whom the second person's face has been synthesized onto the first person's face and expression), the skin tone of the third person and the skin tone of the first person, who is the synthesis target, may differ, even if only slightly. Step S340 of the method for providing a face synthesis service according to the second embodiment of the present invention optimizes this difference in skin tone.

FIG. 11 is a flowchart detailing step S340, the skin tone post-correction, in the method for providing a face synthesis service according to the second embodiment of the present invention.

First, the service server 100 excludes predetermined landmarks from each of the face synthesis image generated in step S330 and the target image contained in the basic data that was input to the face synthesis model, and synthesizes the resulting images to generate a landmark-excluded image (S340-1).

This is for calculating the similarity on the basis of skin tone alone, with the landmarks excluded from the face, and may be called the landmark exclusion method. It is described in more detail with reference to FIG. 12.

FIG. 12 is a flowchart detailing step S340-1, the landmark exclusion method, in the method for providing a face synthesis service according to the second embodiment of the present invention.

First, the service server 100 extracts a plurality of landmarks from each of the generated face synthesis image and the target image included in the basic data input to the face synthesis model (S340-1-1).

The landmarks extracted here are the same as those described for step S320-1, so a detailed description is omitted to avoid repetition. FIG. 13 shows an example of the landmarks extracted from a face synthesis image.

Once the landmarks have been extracted from the face synthesis image and the target image, the service server 100 keeps, for each set of extracted landmarks, the contours of the eyebrows, eyes, nose, and lips and removes the rest (S340-1-2).

As FIG. 5 shows, the landmarks extracted in step S340-1-1 include unneeded ones such as the jawline, nostrils, and mouth. In step S340-1-2 these are removed, as illustrated in FIG. 14, so that only the contours of the eyebrows, eyes, nose, and lips remain. This operation is performed on both the face synthesis image and the target image.
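As a rough illustration of this filtering step, the sketch below keeps only eyebrow, eye, nose, and lip points from a landmark list. It assumes the widely used 68-point layout (jaw 0-16, eyebrows 17-26, nose bridge 27-30, nostrils 31-35, eyes 36-47, outer lips 48-59, inner mouth 60-67); the specification does not state which index scheme is used, so the ranges and the function name `keep_contour_landmarks` are illustrative assumptions only.

```python
# Assumed 68-point landmark layout; the index ranges below are an
# illustration and are not taken from the patent text.
KEEP_RANGES = {
    "eyebrows": range(17, 27),
    "nose": range(27, 31),   # nose bridge only; nostril points 31-35 are dropped
    "eyes": range(36, 48),
    "lips": range(48, 60),   # outer lip contour; inner-mouth points 60-67 are dropped
}


def keep_contour_landmarks(landmarks):
    """Return {part: [(x, y), ...]} with only eyebrow, eye, nose and lip points.

    `landmarks` is a list of 68 (x, y) tuples; jaw points 0-16 are discarded.
    """
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")
    return {part: [landmarks[i] for i in idx] for part, idx in KEEP_RANGES.items()}
```

The same function would be called once for the face synthesis image and once for the target image, since the step is applied to both.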

Once everything but the contours of the eyebrows, eyes, nose, and lips has been removed, the service server 100 adjusts the position of the eyebrow landmarks in each remaining landmark set (S340-1-3).

Eyebrow landmarks are usually extracted along the upper edge of the eyebrow, which makes them unsuitable for the masking of the whole eyebrow described later, so their position needs to be adjusted. FIG. 15 illustrates this. The eyebrow landmarks are adjusted from the midpoint of the eyebrow landmarks remaining after step S340-1-2 (the red point in the middle of the eyebrow in the upper drawing of FIG. 15) toward the uppermost nose landmark remaining after that step (the red point at the top of the nose in the upper drawing of FIG. 15), that is, from top to bottom. As a result, the eyebrow landmarks move to the middle of the eyebrow, as in the lower drawing of FIG. 15. This operation is performed on both the face synthesis image and the target image.
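The downward adjustment can be sketched as follows. The specification gives the direction of the shift (from the eyebrow midpoint toward the top of the nose, top to bottom) but not its size, so the `fraction` parameter and the choice to move the points vertically only are assumptions made for this illustration.

```python
def shift_eyebrow_down(eyebrow, nose_top, fraction=0.25):
    """Shift eyebrow points downward, toward the top nose landmark.

    eyebrow  : list of (x, y) points along one eyebrow
    nose_top : (x, y) of the uppermost nose landmark
    fraction : share of the vertical gap to move (hypothetical value)

    Image coordinates are assumed, so y grows downward.
    """
    mid_y = eyebrow[len(eyebrow) // 2][1]      # midpoint of the eyebrow landmarks
    dy = (nose_top[1] - mid_y) * fraction      # vertical offset toward the nose
    return [(x, y + dy) for x, y in eyebrow]
```

For example, with an eyebrow midpoint at y = 18 and the nose top at y = 38, the default fraction moves every eyebrow point 5 pixels down.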

Once the eyebrow landmarks have been repositioned, the service server 100 draws the outlines of the repositioned eyebrow landmarks and of the landmarks forming the contours of the eyes, nose, and lips, and fills the inside of the outlines (S340-1-4).

Drawing the outlines means rendering the contour lines of the repositioned eyebrow landmarks and of the eye, nose, and lip landmarks; filling the inside means coloring the empty space enclosed by each outline with the same color as the outline. This makes the masking described later straightforward.
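A minimal sketch of the fill operation, assuming each contour is treated as a closed polygon of landmark points: it marks every pixel whose center lies inside the polygon using the even-odd rule. It covers only the interior fill, not the stroked outline with a given thickness, and the function names are illustrative rather than taken from the specification.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd rule: count polygon edges crossed by a ray going right from (px, py)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):                       # edge straddles the ray's height
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def fill_polygon_mask(width, height, polygon):
    """Return a height x width grid of booleans, True where the pixel center is inside."""
    return [
        [point_in_polygon(x + 0.5, y + 0.5, polygon) for x in range(width)]
        for y in range(height)
    ]
```

In practice an image library's polygon-fill routine would do the same job faster; the pure-Python version is shown only to make the operation explicit.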

This operation is performed on both the face synthesis image and the target image. If the outline thickness were a fixed absolute value, drawing and filling the outlines could cover the entire face, as in FIG. 16, which shows two images with different face sizes. The outline thickness therefore needs to be adjusted dynamically.

To this end, the service server 100 can adjust the outline thickness according to the face size of the third person in the generated face synthesis image and the face size of the first person in the target image. More specifically, the face size of the third person in the face synthesis image, of the first person in the target image, and of the second person in the source image is calculated as the Euclidean distance between the two end points of the chin (jawline) landmarks, and the Euclidean distance obtained for each face is mapped to the thickness of the landmark outlines, so that the thickness varies with face size.

Put simply, a large face yields a large Euclidean distance and therefore a thicker outline, while a small face yields a small Euclidean distance and a thinner outline. FIG. 17 shows the images of FIG. 16 with the outline thickness tied to face size: of the two images with different face sizes, the outline used for the left image, whose face is larger, is visibly thicker than the outline used for the right image, whose face is smaller.
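The mapping from face size to outline thickness could look like the sketch below. The specification says only that the Euclidean distance between the two jawline end points is mapped to the thickness; the linear mapping, the `ratio` value, and the one-pixel floor are assumptions chosen for illustration.

```python
import math


def outline_thickness(jaw_left, jaw_right, ratio=0.02, minimum=1):
    """Outline thickness in pixels, proportional to face width.

    jaw_left, jaw_right : (x, y) end points of the jawline landmarks
    ratio               : thickness per pixel of face width (hypothetical value)
    minimum             : floor so that very small faces still get an outline
    """
    face_width = math.dist(jaw_left, jaw_right)   # Euclidean distance
    return max(minimum, round(face_width * ratio))
```

With these assumed values, a face 500 pixels wide gets a 10-pixel outline while a face 50 pixels wide gets a 1-pixel outline, which matches the qualitative behavior described for FIG. 17.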

Once the outlines have been drawn and filled, the service server 100 excludes (masks out) the drawn and filled landmark outlines from each of the generated face synthesis image and the target image included in the basic data input to the face synthesis model (S340-1-5), and then synthesizes the masked face synthesis image, the masked target image, and the synthesis region contour mask image to generate the landmark exclusion image (S340-1-6).

Masking, whose explanation was deferred above, is an image processing technique that hides or excludes a specific region. Masking out the drawn and filled landmark outlines from the generated face synthesis image and from the target image yields face images in which only pure skin color remains. The upper row of FIG. 18 shows, from the left, the synthesis region contour mask image, the target image with only skin color remaining, and the face synthesis image with only skin color remaining; the lower part shows the landmark exclusion image generated by synthesizing these three images.
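The masking step itself can be illustrated with a small sketch. It assumes the image is a grid of pixel values and the mask is a grid of booleans of the same shape, where True marks a landmark region to exclude; representing an excluded pixel as `None` is a choice made for this example, not something the specification prescribes. How the three images are then synthesized into the landmark exclusion image is not detailed in the text and is not attempted here.

```python
def apply_exclusion_mask(image, mask):
    """Return a copy of `image` in which masked pixels are replaced by None.

    image : list of rows, each a list of pixel values (for example RGB tuples)
    mask  : list of rows of booleans, True where the pixel must be excluded
    """
    return [
        [None if excluded else pixel for pixel, excluded in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

Pixels that survive the mask are the skin-only pixels that later similarity calculations would compare.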

The landmark exclusion image generated through steps S340-1-1 to S340-1-6 shows only pure skin color, obtained by synthesizing the three images with the landmarks excluded, and serves as the basic data for skin color optimization. The description now returns to FIG. 11.

Once the landmark exclusion image has been generated, the service server 100 calculates skin tone similarity by applying one or more color models to each of two image pairs: the landmark exclusion image and the target image included in the basic data, and the landmark exclusion image and the face synthesis image (S340-2).

Here, the one or more color models may be any one or more of the known Gray, RGB, HSV (conic), HSV (cylindric), and YCbCr (YUV) color models. Using these models, skin tone similarity is calculated pixel by pixel between the landmark exclusion image and the target image included in the basic data, and pixel by pixel between the landmark exclusion image and the face synthesis image. FIG. 19 shows a conceptual diagram of each model. In detail:

Gray model: only brightness information is used, so the similarity is the absolute value of the difference in brightness $g$, that is, $|g_1 - g_2|$.

RGB model: the similarity is the Euclidean distance of the differences between the $r$, $g$, and $b$ values of the two pixels, $\sqrt{(r_1 - r_2)^2 + (g_1 - g_2)^2 + (b_1 - b_2)^2}$.

HSV (conic) model: a color is expressed by three components, hue, saturation, and value (brightness), and converted into the spatial coordinates of the conic HSV model, $x = s \cos(2\pi h/255) \cdot v/255$, $y = s \sin(2\pi h/255) \cdot v/255$, $z = v$. The similarity is calculated from these coordinates (the expression is not reproduced in this text).

HSV (cylindric) model: a color is expressed by the same three components and converted into the spatial coordinates of the cylindric HSV model, $x = s \cos(2\pi h/255)$, $y = s \sin(2\pi h/255)$, $z = v$. The similarity is calculated from these coordinates (the expression is not reproduced in this text).

YCbCr (YUV) model: this color model separates RGB into brightness information (Y) and color information (Cb, Cr). The skin tone similarity is calculated from the converted values (the conversion and similarity expressions are not reproduced in this text).
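The per-pixel measures described above can be sketched as follows. The Gray and RGB functions follow the text directly. For the two HSV models the coordinate conversions are taken from the text, but the final distance expression is missing from it, so the use of a Euclidean distance between the converted points is an assumption. The components h, s, and v are assumed to lie in the range 0 to 255, as the divisor in the conversion suggests. YCbCr is left out because neither its conversion nor its distance survives in the text.

```python
import math


def gray_distance(g1, g2):
    """Gray model: absolute difference of two brightness values."""
    return abs(g1 - g2)


def rgb_distance(c1, c2):
    """RGB model: Euclidean distance between two (r, g, b) pixels."""
    return math.dist(c1, c2)


def hsv_conic_point(h, s, v):
    """Conic HSV coordinates as given in the text (h, s, v assumed in 0..255)."""
    angle = 2 * math.pi * h / 255
    return (s * math.cos(angle) * v / 255, s * math.sin(angle) * v / 255, v)


def hsv_cylindric_point(h, s, v):
    """Cylindric HSV coordinates as given in the text (h, s, v assumed in 0..255)."""
    angle = 2 * math.pi * h / 255
    return (s * math.cos(angle), s * math.sin(angle), v)


def hsv_distance(hsv1, hsv2, to_point=hsv_conic_point):
    # Assumption: Euclidean distance between the converted points; the
    # source text does not reproduce the actual expression.
    return math.dist(to_point(*hsv1), to_point(*hsv2))
```

All of these are distances, so a value of zero means identical pixels and larger values mean less similar skin tones.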

Once the skin tone similarity has been calculated, the service server 100 treats the one or more color models as a function and applies an optimization technique to one or more variables input to the function, searching for the variables that minimize the calculated skin tone similarity (S340-3). Because the similarity is computed as a distance, a smaller value means the skin tones are closer.

Optimization is the process of finding, for an arbitrary function f(x), the solution that maximizes or minimizes its value, where x is the variable substituted into the function. In step S340-3, the function may be any one or more of the color models above (Gray, RGB, HSV (conic), HSV (cylindric), and YCbCr (YUV)), and x may be any one or more of the variables usable in face synthesis: blur/sharpening strength, super-resolution strength, and the type of color transfer method, including WCT2, GFPGAN, and histogram matching.

In step S340-3, the value of x is varied to find the variables that minimize the previously calculated skin tone similarity. Bayesian optimization, a known optimization technique, can be applied for this.

Bayesian optimization searches the variables while incorporating prior knowledge (in effect, earlier experimental results). It comprises a surrogate model, which estimates the objective function from the data examined so far, and an acquisition function, which recommends promising input data based on the estimated model. The final input data recommended by the acquisition function may correspond to the variables that minimize skin tone similarity.
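To make the shape of the search problem concrete, the sketch below minimizes a skin tone distance over a set of candidate values. It is deliberately not Bayesian optimization: it is a plain exhaustive search used as a stand-in, with no surrogate model or acquisition function. The correction function `add_offset` (a brightness offset on gray values) and the mean absolute gray difference used as the objective are hypothetical placeholders for the real variables (blur/sharpening strength, super-resolution strength, color transfer method) and color models.

```python
def mean_gray_distance(pixels_a, pixels_b):
    """Objective: mean absolute gray difference between two equal-length pixel lists."""
    return sum(abs(a - b) for a, b in zip(pixels_a, pixels_b)) / len(pixels_a)


def add_offset(pixels, offset):
    # Hypothetical stand-in for a real correction such as colour transfer.
    return [min(255, max(0, p + offset)) for p in pixels]


def search_best_parameter(synth_pixels, target_pixels, candidates, correct=add_offset):
    """Try every candidate value and return (best_value, best_score).

    Exhaustive search; a Bayesian optimizer would instead choose each next
    candidate from a surrogate model fitted to the scores seen so far.
    """
    scored = [
        (mean_gray_distance(correct(synth_pixels, x), target_pixels), x)
        for x in candidates
    ]
    best_score, best_x = min(scored)
    return best_x, best_score
```

Calling `search_best_parameter([100, 110, 120], [110, 120, 130], range(-20, 21, 5))` selects the offset that brings the synthesized pixels closest to the target. Exhaustive search scales poorly once several variables are combined, which is the usual motivation for a sample-efficient method such as Bayesian optimization.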

Once the variables that minimize skin tone similarity have been found, they are finally applied to the generated face synthesis image (S340-4).

The variables found in step S340-3 may differ depending on which color model is adopted and which variables are searched. By implementing the processor 10 of the service server 100 as a high-performance processor capable of parallel processing, the variables that minimize skin tone similarity may be searched for every color model with every variable under Bayesian optimization, and all of the variables found may be applied to the face synthesis image. Alternatively, only the variables found with a color model and variables selected by user input may be applied to the face synthesis image.

The method for providing a face synthesis service according to the second embodiment of the present invention has been described above. According to the present invention, the basic data, comprising a target image containing the first person (the person whose face is to be changed) and one or more source images containing the second person (the person whose face is to be used), goes through a preprocessing process that calculates similarity before being input to the face synthesis model. This process determines whether additional source images need to be collected; if so, one or more high-quality source images shot according to the shooting guide unique to the present invention are additionally collected and input to the face synthesis model. This greatly improves the training efficiency of the face synthesis model, which uses a generative adversarial network, and so enables a highly polished face synthesis service. In addition, because the face synthesis model uses a generative adversarial network, competition between the generator and the discriminator intensifies as the model is used repeatedly, allowing more refined face synthesis images to be generated; adding a post-correction process that optimizes the skin tone of the generated face synthesis image further provides a more polished synthesis service than conventional deepfake technology.

Meanwhile, the apparatus 100 for providing a face synthesis service according to the first embodiment of the present invention and the method for providing a face synthesis service according to the second embodiment may also be implemented as a computer program stored on a computer-readable medium according to a third embodiment of the present invention, which includes all of the same technical features. In this case the program, in combination with a computing device, may execute: (AA) collecting basic data including a target image containing a first person, who is the person whose face is to be changed, and one or more source images containing a second person, who is the person whose face is to be used; (BB) preprocessing the collected basic data to determine whether additional collection of the source images is required; (CC) when the determination in step (BB) is that additional collection of the source images is not required, inputting the preprocessed basic data to a face synthesis model to generate a face synthesis image in which the face of the second person contained in the source image is synthesized onto the face and expression of the first person contained in the target image; and (DD) post-correcting the skin tone of the generated face synthesis image. Although not described in detail to avoid repetition, all technical features applied to the apparatus 100 according to the first embodiment and to the method according to the second embodiment apply equally to the computer program stored on a computer-readable medium according to the third embodiment.

Although embodiments of the present invention have been described with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not limiting.

10: Processor
20: Network interface
30: Memory
40: Storage
41: Computer program
50: Information bus
100: Service server

Claims

A method for providing a face synthesis service, performed by an apparatus including a processor and a memory, the method comprising:
(a) collecting basic data including a target image containing a first person, who is the person whose face is to be changed, and one or more source images containing a second person, who is the person whose face is to be used;
(b) preprocessing the collected basic data to determine whether additional collection of the source images is required;
(c) when the determination in step (b) is that additional collection of the source images is not required, inputting the preprocessed basic data to a face synthesis model to generate a face synthesis image in which the face of the second person contained in the source image is synthesized onto the face and expression of the first person contained in the target image; and
(d) post-correcting the skin tone of the generated face synthesis image.

The method of claim 1,
wherein the source images of step (a) are one or more images captured according to a shooting guide, which is a predetermined shooting condition, and
wherein the shooting guide includes a condition of photographing the second person with a camera having a resolution of FHD (1920 × 1080) or higher, such that both eyes of the second person are visible, from the front, from any one position 20° to 30° to the upper right, from any one position 20° to 30° to the upper left, from any one position 10° to 20° to the lower right, and from any one position 10° to 20° to the lower left.

The method of claim 2,
wherein the shooting guide further includes a condition of photographing the second person from the rightmost position at which both eyes are visible and from the leftmost position at which both eyes are visible.

The method of claim 2,
wherein the shooting guide further includes a shooting condition for the second person, and
wherein the shooting condition for the second person includes a shooting time of 2 minutes or more, shooting while the head is slowly turned up and down and left and right, an expression identical or similar to that of the first person contained in the target image, and a part of the front of the face.

The method of claim 1,
wherein step (b) comprises:
(b-1) extracting a plurality of landmarks from each of the face of the first person contained in the target image and the face of the second person contained in the source image;
(b-2) calculating the roll, pitch, and yaw of the face of the first person using the landmarks extracted from the face of the first person, and calculating the roll, pitch, and yaw of the face of the second person using the landmarks extracted from the face of the second person;
(b-3) calculating a similarity between the target image and the source image using the calculated roll, pitch, and yaw of the face of the first person and the calculated roll, pitch, and yaw of the face of the second person; and
(b-4) determining whether the calculated similarity between the target image and the source image is greater than or equal to a threshold value.

The method of claim 5,
further comprising, after step (b-4), any one of:
(b-5) determining that additional collection of the source images is required when the similarity is determined to be greater than or equal to the threshold value; and
(b-6) determining that additional collection of the source images is not required when the similarity is determined to be less than the threshold value.

The method of claim 5,
wherein the similarity in step (b-3) is calculated using the Euclidean distance.

The method of claim 1,
wherein the face synthesis model in step (c) is one or more machine learning (ML) models including a generative adversarial network (GAN).

The method of claim 1,
wherein step (d) comprises:
(d-1) excluding predetermined landmarks from each of the generated face synthesis image and the target image included in the basic data input to the face synthesis model, and synthesizing the images from which the predetermined landmarks were excluded with a synthesis region contour mask image to generate a landmark exclusion image;
(d-2) calculating skin tone similarity by applying one or more color models to the generated landmark exclusion image and the target image included in the basic data, and to the landmark exclusion image and the face synthesis image;
(d-3) treating the one or more color models as a function and applying an optimization technique to one or more variables input to the function to search for the variables that minimize the calculated skin tone similarity; and
(d-4) applying the variables found to the generated face synthesis image.

The method of claim 9,
wherein step (d-1) comprises:
(d-1-1) extracting a plurality of landmarks from each of the generated face synthesis image and the target image included in the basic data input to the face synthesis model;
(d-1-2) keeping, for each set of extracted landmarks, the contours of the eyebrows, eyes, nose, and lips and removing the rest;
(d-1-3) adjusting the position of the eyebrow landmarks in each set of remaining landmarks comprising the contours of the eyebrows, eyes, nose, and lips;
(d-1-4) drawing outlines of the repositioned eyebrow landmarks and of the landmarks comprising the contours of the eyes, nose, and lips, and filling the inside of the outlines;
(d-1-5) excluding (masking out) the drawn and filled landmark outlines from each of the generated face synthesis image and the target image included in the basic data input to the face synthesis model; and
(d-1-6) generating a landmark exclusion image by synthesizing the face synthesis image and the target image, from each of which the drawn and filled landmark outlines have been excluded, with a synthesis region contour mask image.

The method of claim 10,
wherein the position of the eyebrow landmarks in step (d-1-3) is adjusted from the midpoint of the remaining eyebrow landmarks toward the uppermost remaining nose landmark.

The method of claim 10,
wherein the thickness of the landmark outlines drawn in step (d-1-4) is dynamically adjusted according to the face size of the person contained in the generated face synthesis image (the person in whom the face of the second person is synthesized onto the face and expression of the first person) and the face size of the first person contained in the target image, respectively.

The method of claim 12,
wherein the face size of the person contained in the generated face synthesis image and the face size of the first person contained in the target image are each calculated as the Euclidean distance between the two end points of the chin landmark, and the Euclidean distance calculated for each face size is mapped to the thickness of the landmark outlines.

The method of claim 9,
wherein the one or more color models in step (d-2) are any one or more of a Gray model, an RGB model, an HSV (conic) model, an HSV (cylindric) model, and a YCbCr (YUV) model.

The method of claim 9,
wherein the one or more variables in step (d-3) are any one or more of blur/sharpening strength, super-resolution strength, and the type of color transfer method, including WCT2, GFPGAN, and histogram matching.

The method of claim 9,
wherein the optimization technique in step (d-3) is Bayesian optimization.

An apparatus for providing a face synthesis service, comprising:
one or more processors;
a network interface;
a memory into which a computer program executed by the processors is loaded; and
a storage that stores large-capacity network data and the computer program,
wherein the computer program, when executed by the one or more processors, performs:
(A) an operation of collecting basic data including a target image containing a first person, who is the person whose face is to be changed, and one or more source images containing a second person, who is the person whose face is to be used;
(B) an operation of preprocessing the collected basic data to determine whether additional collection of the source images is required;
(C) an operation of, when the determination in operation (B) is that additional collection of the source images is not required, inputting the preprocessed basic data to a face synthesis model to generate a face synthesis image in which the face of the second person contained in the source image is synthesized onto the face and expression of the first person contained in the target image; and
(D) an operation of post-correcting the skin tone of the generated face synthesis image.

A computer program stored on a computer-readable medium which, in combination with a computing device, executes:
(AA) collecting basic data including a target image containing a first person, who is the person whose face is to be changed, and one or more source images containing a second person, who is the person whose face is to be used;
(BB) preprocessing the collected basic data to determine whether additional collection of the source images is required;
(CC) when the determination in step (BB) is that additional collection of the source images is not required, inputting the preprocessed basic data to a face synthesis model to generate a face synthesis image in which the face of the second person contained in the source image is synthesized onto the face and expression of the first person contained in the target image; and
(DD) post-correcting the skin tone of the generated face synthesis image.