KR20210147626A

KR20210147626A - Apparatus and method for synthesizing 3d face image using competitive learning

Info

Publication number: KR20210147626A
Application number: KR1020200065122A
Authority: KR
Inventors: 이상훈; 강지우; 이성민
Original assignee: 연세대학교 산학협력단
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2021-12-07
Also published as: KR102422822B1

Abstract

The present invention provides an apparatus and method for synthesizing a three-dimensional (3D) face image using competitive learning. The apparatus comprises: a face model alignment unit that receives a two-dimensional (2D) face image to be synthesized into a 3D face image, extracts matching parameters from the pattern of the 2D face image according to a pre-learned pattern estimation method, and aligns a predetermined 3D face model to the 2D face image to acquire a UV map; and a UV map completion unit that, according to a pre-learned pattern estimation method, estimates and fills a hole region generated in the UV map by an unscanned region of the 2D face image on the basis of the horizontal symmetry of the face to acquire a compensated UV map, then extracts fine features of the compensated UV map and restores a UV map reflecting those features to acquire a completed UV map. Because the face model alignment unit and the UV map completion unit are trained by competitive learning so as to reduce a generation loss and a discrimination loss, the apparatus is trained on the basis of the energy of the completed UV map and can finely and accurately restore even uncaptured occluded regions, even when 2D face images covering the entire face are not provided.

Description

Apparatus and method for synthesizing a three-dimensional face image using competitive learning

The present invention relates to an apparatus and method for synthesizing a three-dimensional (3D) face image, and more particularly to an apparatus and method for synthesizing a 3D face image using competitive learning between face alignment and color completion.

Face models are used in a variety of applications such as face synthesis, 3D animation, and image editing. Building human-like 3D faces is therefore an important problem in computer vision and computer graphics.

The 3D Morphable Model (3DMM), a statistical model of 3D face shape and texture, is the most widely used model for obtaining a 3D face image of a subject from a 2D face image. Because the advent of 3D sensors such as stereo cameras has made it possible to collect large, accurate 3D face datasets, the 3DMM is able to represent a wide variety of natural face shapes.

However, because 2D face images are captured under limited conditions, it is difficult to align a 3D face model accurately to the image. Moreover, because a 3D face model represents the face as a mesh, regions not captured in the 2D face image are difficult to represent even when the 3DMM face model is aligned to the image. In addition, the 3DMM representation is insufficient for rendering fine facial texture such as beards, wrinkles, and moles.

Accordingly, there is a need for a 3D face image synthesis technique that can align a 3D face model accurately to a face image, naturally restore regions not captured in the face image, and represent fine texture.

Korean Patent Laid-Open Publication No. 10-2018-0082170 (published July 18, 2018)

An object of the present invention is to provide an apparatus and method for synthesizing a 3D face image that obtain a UV map by accurately aligning a face model to a face image and, using the obtained UV map, can finely represent even regions not captured in the 2D face image.

Another object of the present invention is to provide an apparatus and method for synthesizing a 3D face image that do not require conditions such as landmarks or feature points to be set on the 2D face image in order to align the 3D face model to it, and that can be readily trained to restore facial texture even when 2D face images capturing every facial region are not provided for training.

To achieve the above object, a 3D face image synthesizing apparatus according to an embodiment of the present invention includes: a face model alignment unit that receives a 2D face image to be synthesized into a 3D face image, extracts matching parameters from the pattern of the 2D face image according to a pre-learned pattern estimation method, and aligns a predetermined 3D face model to the 2D face image to obtain a UV map; and a UV map completion unit that, according to a pre-learned pattern estimation method, estimates and fills a hole region generated in the UV map by an unscanned region of the 2D face image on the basis of the horizontal symmetry of the face to obtain a compensated UV map, extracts fine features of the compensated UV map, and restores a UV map reflecting the fine features to obtain a completed UV map.

The face model alignment unit may include: a UV encoder, implemented as an artificial neural network in which a pattern estimation method has been learned in advance, that obtains from the pattern of the 2D face image the matching parameters representing the shape change, position, and orientation required to align the 3D face model to the 2D face image; a UV mapping unit that aligns the 3D face model to the 2D face image according to the matching parameters, maps the 2D face image onto the 3D face model, and unwraps the mapped 3D face model onto a predetermined two-dimensional UV space to obtain the UV map; and a mask acquisition unit that detects the hole region generated in the UV map by the unscanned region of the 2D face image and generates a mask corresponding to the hole region.

The UV map completion unit may include: a rough UV map estimation unit that combines the UV map and the mask, estimates, according to a pre-learned pattern estimation method, a reflection coefficient based on the horizontal symmetry of the face from the pattern of the mask-combined UV map to fill in the color of the hole region generated in the UV map, and extracts and fills in, from the pattern of the mask-combined UV map, the illumination-dependent shading of the UV map including the hole region, thereby obtaining the compensated UV map; and a detailed UV map estimation unit that combines the compensated UV map and the mask and obtains the completed UV map by extracting and restoring fine features of the mask-combined compensated UV map according to a pre-learned pattern estimation method.

The UV map completion unit may further include an energy determination unit that uses training data in which each training 2D face image is paired with a reference UV map, that is, a UV map generated in advance to correspond to that training 2D face image. The energy determination unit obtains a generation loss, which indicates whether the face model alignment unit has accurately aligned the 3D face model to the 2D face image, from an energy calculated in a predetermined manner for the training completed UV map, which is the completed UV map obtained from the training 2D face image. It obtains a discrimination loss, which indicates whether the UV map completion unit has properly filled the hole region of the training completed UV map, by calculating the margin-adjusted difference between the completed UV map and the training completed UV map. It then trains the face model alignment unit and the UV map completion unit by competitive learning so that the generation loss and the discrimination loss are reduced.

The rough UV map estimation unit may include: a first mask combining unit that combines the UV map and the mask in a predetermined manner and outputs the result; a reflection estimation unit, implemented as an autoencoder comprising an encoder and a decoder each implemented as an artificial neural network, that estimates, according to a pre-learned pattern estimation method, a reflection coefficient based on the horizontal symmetry of the face from the pattern of the mask-combined UV map to obtain an albedo map, which is a UV map in which the color of the entire region including the hole region has been estimated; an illumination estimation unit that receives the feature maps output from mutually corresponding layers of the encoder and the decoder of the reflection estimation unit and, based on those feature maps, estimates the illumination coefficients of the UV map according to a pre-learned pattern estimation method to obtain an illumination map, which is a UV map in which the color of the entire region including the hole region has been estimated; and a compensated UV map acquisition unit that obtains the compensated UV map by element-wise multiplication of the albedo map and the illumination map.
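The final step of the rough estimation, the element-wise product of the albedo map and the illumination map, can be illustrated with a minimal NumPy sketch. The function name `compensated_uv_map` and the toy 2x2 maps are hypothetical and chosen only for illustration; the learned networks that produce the two maps are not modeled here.

```python
import numpy as np


def compensated_uv_map(albedo: np.ndarray, lighting: np.ndarray) -> np.ndarray:
    """Element-wise (Hadamard) product of an albedo map and an illumination map."""
    if albedo.shape != lighting.shape:
        raise ValueError("albedo and lighting maps must have the same shape")
    return albedo * lighting


# Toy 2x2 RGB maps (assumed values): uniform albedo 0.8, uniform illumination 0.5.
albedo = np.full((2, 2, 3), 0.8)
lighting = np.full((2, 2, 3), 0.5)
t_c = compensated_uv_map(albedo, lighting)
```

Separating reflectance from shading in this way lets the hole region be filled with illumination-free color first, with shading reapplied afterwards.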

The detailed UV map estimation unit may include: a second mask combining unit that combines the compensated UV map and the mask in a predetermined manner and outputs the result; and a fine estimation unit, implemented as an autoencoder comprising an encoder and a decoder each implemented as an artificial neural network, that extracts and restores fine features from the pattern of the mask-combined compensated UV map according to a pre-learned pattern estimation method to obtain the completed UV map.

The energy determination unit may calculate a UV map image energy (E_uv(T, M)), which represents the pixel-wise image loss in the valid region of the UV map (T) excluding the hole region designated by the mask (M), according to the equation

[equation not reproduced in the source text]

(where T_c and T_f denote the compensated UV map and the completed UV map, respectively, and ⊙ is the element-wise product operator); calculate a UV map symmetry energy (E_f(T, M)), which indicates whether the albedo map (A) has properly filled the hole region according to the horizontal symmetry of the face, according to the equation

[equation not reproduced in the source text]

(where flip() is the left-right flip function and [symbol not reproduced in the source text] is the binary OR operator); calculate a UV map perceptual energy (E_p(T, M)), which represents the level of detail of the completed UV map, according to the equation

[equation not reproduced in the source text]

(where F is the n-th selected layer of a pre-trained VGG (Visual Geometry Group) face identifier); and calculate the total UV energy as

[equation not reproduced in the source text]

(where λ_uv, λ_p, and λ_f are energy weights).
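Because the equations themselves are not reproduced in this text, the following NumPy sketch only illustrates how energies of this kind might be computed. Every functional form here is an assumption: an L1 distance over valid pixels for the image energy, an L1 distance between the albedo map and its mirror image for the symmetry energy, and a weighted sum for the total. The perceptual energy, which needs a pre-trained VGG network, is passed in as a plain number. All function names are hypothetical.

```python
import numpy as np


def flip(x):
    """Left-right flip along the width axis of an (H, W) or (H, W, C) map."""
    return x[:, ::-1]


def image_energy(t, t_c, t_f, mask):
    """Assumed form: mean absolute error over valid pixels (mask == 1, hole == 0)."""
    m = mask[..., None]                        # broadcast the mask over channels
    valid = max(m.sum() * t.shape[-1], 1)      # number of valid values, at least 1
    return (np.abs(m * (t - t_c)).sum() + np.abs(m * (t - t_f)).sum()) / valid


def symmetry_energy(albedo, mask):
    """Assumed form: albedo vs. its mirror, where M OR flip(M) is set."""
    m = np.logical_or(mask, flip(mask)).astype(albedo.dtype)[..., None]
    return np.abs(m * (albedo - flip(albedo))).mean()


def total_energy(e_uv, e_p, e_f, lam_uv=1.0, lam_p=1.0, lam_f=1.0):
    """Assumed form: weighted sum of the three energies."""
    return lam_uv * e_uv + lam_p * e_p + lam_f * e_f


# Toy 4x4 RGB example with the rightmost column treated as a hole.
t = np.full((4, 4, 3), 0.5)
mask = np.ones((4, 4))
mask[:, 3] = 0.0
e_uv = image_energy(t, t_c=t, t_f=t + 0.1, mask=mask)

half = np.zeros((4, 4, 3))
half[:, 2:] = 1.0                              # deliberately asymmetric albedo
e_f_sym = symmetry_energy(t, np.ones((4, 4)))
e_f_asym = symmetry_energy(half, np.ones((4, 4)))
```

In an actual training setup these would be differentiable tensor operations in a deep learning framework; NumPy is used here only to keep the sketch self-contained.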

The energy determination unit may obtain, as the generation loss (L_G), the total training UV energy E(T_e, M_e) calculated for the training UV map (T_e) and the training mask (M_e), which are the UV map and mask obtained when the face model alignment unit aligns the 3D face model to the training 2D face image, according to the equation

[equation not reproduced in the source text]

and may obtain the discrimination loss (L_D) from the total reference UV energy E(T_ref, M_ref), calculated for the reference UV map (T_ref) and the reference mask (M_ref) corresponding to the reference UV map (T_ref), together with the total training UV energy E(T_e, M_e) and a margin, according to the equation

[equation not reproduced in the source text]
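The loss equations are likewise not reproduced in this text. As a hedged illustration only, the sketch below uses the margin-based hinge form familiar from energy-based adversarial training, which is consistent with the ingredients named above (a reference energy, a training energy, and a margin) but is an assumption and not necessarily the claimed formula. The function names are hypothetical.

```python
def generation_loss(e_train: float) -> float:
    """The text states that the total training UV energy serves as L_G."""
    return e_train


def discrimination_loss(e_ref: float, e_train: float, margin: float) -> float:
    """Assumed hinge form: low energy for reference maps, and energy of at
    least `margin` for maps produced during training."""
    return e_ref + max(0.0, margin - e_train)


# Toy values: while the training energy is below the margin, the hinge term is active.
l_g = generation_loss(0.5)
l_d_active = discrimination_loss(e_ref=0.2, e_train=0.5, margin=1.0)
l_d_inactive = discrimination_loss(e_ref=0.2, e_train=1.5, margin=1.0)
```

Under this assumed form the two losses pull the training energy in opposite directions, which is what makes the training competitive.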

The energy determination unit may perform training by repeating backpropagation until each of the generation loss (L_G) and the discrimination loss (L_D), or their sum, falls to or below a predetermined reference loss, or by repeating backpropagation a predetermined number of times.

To achieve the other object above, a 3D face image synthesis method according to another embodiment of the present invention includes: receiving a 2D face image to be synthesized into a 3D face image, extracting matching parameters from the pattern of the 2D face image according to a pre-learned pattern estimation method, and aligning a predetermined 3D face model to the 2D face image to obtain a UV map; and, according to a pre-learned pattern estimation method, estimating and filling a hole region generated in the UV map by an unscanned region of the 2D face image on the basis of the horizontal symmetry of the face to obtain a compensated UV map, extracting fine features of the compensated UV map, and restoring a UV map reflecting the fine features to obtain a completed UV map.

Accordingly, the 3D face image synthesizing apparatus and method according to embodiments of the present invention can obtain a UV map by accurately aligning a 3D face model to a 2D face image without constraints such as landmarks or feature points. Because they are trained on the basis of the energy of the completed UV map, they can finely and accurately restore even uncaptured occluded regions even when 2D face images capturing the entire face are not provided.

FIG. 1 shows a schematic structure of a 3D face image synthesizing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the operation of the 3D face image synthesizing apparatus of FIG. 1.
FIG. 3 shows an example implementation of the UV map completion unit of FIG. 1 using artificial neural networks.
FIG. 4 is a diagram comparing the performance resulting from competitive learning of the face model alignment unit and the UV map completion unit of FIG. 1.
FIG. 5 shows an example of how a 2D face image input to the 3D face image synthesizing apparatus of FIG. 1 changes as it is synthesized into a 3D face image.
FIG. 6 shows a 3D face image synthesis method according to an embodiment of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the content described in those drawings.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described. Parts irrelevant to the description are omitted for clarity, and like reference numerals in the drawings denote like members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "...unit", "...device", "module", and "block" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 shows a schematic structure of a 3D face image synthesizing apparatus according to an embodiment of the present invention, FIG. 2 is a diagram for explaining the operation of the apparatus of FIG. 1, and FIG. 3 shows an example implementation of the UV map completion unit of FIG. 1 using artificial neural networks.

Referring to FIG. 1, the 3D face image synthesizing apparatus according to the present embodiment may include a face image acquisition unit 100, a face model alignment unit 200, a UV map completion unit 300, and a 3D image synthesis unit 400.

The face image acquisition unit 100 acquires a 2D face image (I) from which a 3D face image is to be generated. That is, the face image acquisition unit 100 is the component that acquires the 2D face image (I) to be converted into a 3D face image. It may be implemented, for example, as a video capturing device such as a camera, as a storage device holding previously acquired video, or as a communication unit that receives video from another device over a network.

The face model alignment unit 200 aligns a 3D face model to the 2D face image (I) acquired by the face image acquisition unit 100 according to a pre-learned pattern estimation method, maps the image onto the model to obtain a 3D face image, and projects the obtained 3D face image onto a two-dimensional UV space to obtain a UV map. Here, the 3D face model may be obtained on the basis of the 3DMM.

The 2D face image (I) acquired by the face image acquisition unit 100 is an image of the subject's face captured from a particular direction and position using a camera or the like. The shape of the face also differs from one subject to another.

Accordingly, in order to map the 2D face image (I) onto the 3D face model and convert it into a 3D face image, the face model alignment unit 200 first deforms the 3D face model into a shape corresponding to the subject's face, and then rotates and translates the deformed 3D face model to match the direction and position from which the 2D face image was captured, so that the 3D face model is aligned to the 2D face image (I).

The face model alignment unit 200 may include a UV encoder 210 and a UV mapping unit 220. The UV encoder 210 is implemented as an artificial neural network and extracts the matching parameters for aligning the 3D face model to the 2D face image (I).

As an example, the UV encoder 210 may encode the 2D face image (I) according to a pre-learned pattern estimation method and extract, as the matching parameters, a shape parameter (f_s) for deforming the shape of the 3D face model and a position parameter (f_c) for adjusting the orientation and position of the 3D face model. Here, the shape parameter (f_s) is a parameter for deforming the 3D face model so that its shape corresponds to the face shape of the subject in the 2D face image (I). The position parameter (f_c) is a parameter for adjusting the position and orientation of the 3D face model in correspondence with the camera parameters applied when the 2D face image (I) was acquired, such as the rotation (R), translation (t), and focal length (f) of the camera.
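How the two kinds of matching parameters act on a morphable model can be sketched as follows. This is a minimal illustration under stated assumptions: the shape parameter is treated as coefficients of a linear shape basis (the standard 3DMM formulation), and the position parameter is treated as a rotation, a translation, and a focal length fed to a simple pinhole projection. The function names, the toy basis, and the pinhole camera model are hypothetical and not taken from the source.

```python
import numpy as np


def deform_shape(mean_shape, shape_basis, f_s):
    """Linear morphable model: vertices = mean + basis @ coefficients.

    mean_shape: (N, 3), shape_basis: (3N, K), f_s: (K,)
    """
    return mean_shape + (shape_basis @ f_s).reshape(-1, 3)


def project(vertices, R, t, f):
    """Rotate and translate vertices into the camera frame, then apply a
    pinhole projection with focal length f (assumed camera model)."""
    cam = vertices @ R.T + t
    return f * cam[:, :2] / cam[:, 2:3]


# Toy model: two vertices and one shape component that shifts both along +y.
mean_shape = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
shape_basis = np.array([[0.0], [1.0], [0.0], [0.0], [1.0], [0.0]])
f_s = np.array([2.0])                                  # shape parameter
R, t, f = np.eye(3), np.array([0.0, 0.0, 4.0]), 2.0    # position parameter f_c

vertices = deform_shape(mean_shape, shape_basis, f_s)
points_2d = project(vertices, R, t, f)
```

In the described apparatus the values of f_s and f_c would come from the trained UV encoder rather than being set by hand.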

The UV mapping unit 220 deforms the outer shape of the 3D face model according to the shape parameter (f_s) among the matching parameters obtained by the UV encoder 210, moves the 3D face model according to the position parameter (f_c) so that it is aligned to the 2D face image (I), and maps the aligned 2D face image onto the 3D face model, thereby converting the 2D face image into a 3D face image.

The converted 3D face image is then unwrapped onto a two-dimensional UV space to obtain a two-dimensional UV map (T) for the 3D face image. The UV map (T) is obtained by unwrapping the 3D face image onto the two-dimensional UV space on the basis of the three-dimensional coordinates of the 3D face model.

Here, the UV map (T) may be obtained by applying a spherical unwrap, converting the three-dimensional coordinates (X, Y, Z) of each vertex (F) of the mesh-structured 3D face model into coordinates in the UV coordinate system (v_uv = (u, v)) according to Equation 1.

[Equation 1 not reproduced in the source text]

Here, r is the radius of the sphere used in the spherical unwrap, given by

[expression not reproduced in the source text]
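Since Equation 1 is not reproduced in this text, the sketch below shows only a generic spherical parameterization, assumed for illustration: longitude and latitude of each vertex on a sphere of radius r = sqrt(X^2 + Y^2 + Z^2), each normalized to [0, 1]. The patent's own formula may differ in axis convention and scaling. The function name is hypothetical, and vertices are assumed not to lie at the origin.

```python
import numpy as np


def spherical_unwrap(vertices):
    """Map (N, 3) vertex coordinates to (N, 2) UV coordinates in [0, 1].

    Assumed convention: +z faces the camera and +y points up.
    """
    x, y, z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)           # radius of the sphere through each vertex
    u = np.arctan2(x, z) / (2 * np.pi) + 0.5  # longitude, normalized
    v = np.arcsin(y / r) / np.pi + 0.5        # latitude, normalized
    return np.stack([u, v], axis=1)


# A vertex straight ahead, one to the side, and one at the top of the sphere.
uv = spherical_unwrap(np.array([[0.0, 0.0, 1.0],
                                [1.0, 0.0, 0.0],
                                [0.0, 1.0, 0.0]]))
```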

As described above, however, a 2D face image (I) by its nature shows only the captured region and does not show every region of the subject's face. Depending on the capture direction, some regions of the subject's face are therefore not captured, and these uncaptured occluded regions are not mapped onto the 3D face model even when the 2D face image (I) is mapped onto it. As a result, when the UV map (T) is obtained from the converted 3D face image, the occluded regions appear as hole regions.

Accordingly, the mask acquisition unit 230 may generate a mask (M) indicating the hole region by sampling, in the UV map (T), the facial visibility of each pixel of the 2D face image (I). Facial visibility can be determined by projecting the 3D face model onto the image plane through rasterization. The mask (M) is a binary mask and may be generated, for example, so that the hole region is filled with 0 and the remaining region is filled with 1.

The mask acquisition unit 230 generates the mask (M) so that, when the UV map completion unit 300 fills the hole region of the UV map with texture corresponding to the 2D face image (I), the result is not affected by the hole region.

Turning to the operation of the face model alignment unit 200 with reference to FIG. 2: for ease of understanding, FIG. 2 separately shows (a) a case where the 3D face model is correctly aligned to the 2D face image (I) and (b) a case where it is misaligned.

When the UV encoder 210 has been trained properly and accurately estimates the pattern of the 2D face image (I), deforming the 3D face model according to the matching parameters (f_s, f_c) extracted by the UV encoder 210 aligns the 3D face model accurately with the face shape and orientation of the subject shown in the 2D face image (I), as shown in (a). By contrast, when the UV encoder 210 has been trained improperly, deforming the 3D face model according to the matching parameters (f_s, f_c) leaves the face shape or orientation of the 3D face model misaligned with the subject of the 2D face image (I), as shown in (b).

In the properly aligned UV map (T) of (a), the face shape and the hole region appear at the correct positions. In the improperly aligned UV map (T) of (b), by contrast, the 2D face image is mapped to the wrong positions on the 3D face model, so not only is the face shape rendered abnormally in the UV map (T), but the position of the hole region also differs from the properly aligned case.

The face model alignment unit 200, which includes the UV encoder 210 implemented as an artificial neural network and obtains the UV map (T) by aligning the 3D face model to the 2D face image (I), may also be referred to as an alignment network.

The UV map completion unit 300 receives the UV map (T) obtained by the face model alignment unit 200, estimates the pattern of the UV map (T) according to a pre-learned pattern estimation method, and fills the hole region with texture to complete the UV map.

The UV map completion unit 300 may include a rough UV map estimation unit 310 and a detailed UV map estimation unit 320. The rough UV map estimation unit 310 receives the UV map (T) obtained by the face model alignment unit 200, estimates the reflection coefficient (albedo) and the lighting coefficient for the hole region from the pattern of the received UV map (T), and roughly fills in the color and texture of the hole region in the UV map (T), thereby obtaining a compensated UV map (T_c).

The UV map coarse estimation unit 310 may include a first mask combining unit 311, a reflection estimation unit 312, an illumination estimation unit 313, and a compensated UV map acquisition unit 314.

The first mask combining unit 311 combines the UV map (T) obtained by the UV mapping unit (22) with the mask (M) obtained by the mask acquisition unit 230 and passes the result to the reflection estimation unit 312. The first mask combining unit 311 may either simply concatenate the UV map (T) and the mask (M), or compute their element-wise product, before passing the result to the reflection estimation unit 312.
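The two combining options described above can be sketched as follows. This is a minimal illustration only: nested Python lists stand in for image tensors, the function names `concat_mask` and `multiply_mask` are hypothetical, and the layout (one list of channel values per pixel, one 0/1 value per mask pixel) is an assumption not stated in the source.

```python
# Sketch only. Assumed layout: uv_map[h][w] is a list of channel values,
# mask[h][w] is a single 0/1 value. Which value marks a hole is not fixed here.

def concat_mask(uv_map, mask):
    """Simple concatenation: append the mask value as one extra channel per pixel."""
    return [
        [pixel + [m] for pixel, m in zip(uv_row, mask_row)]
        for uv_row, mask_row in zip(uv_map, mask)
    ]


def multiply_mask(uv_map, mask):
    """Element-wise product: scale every channel of a pixel by its mask value."""
    return [
        [[c * m for c in pixel] for pixel, m in zip(uv_row, mask_row)]
        for uv_row, mask_row in zip(uv_map, mask)
    ]


uv = [[[0.2, 0.4, 0.6], [0.1, 0.3, 0.5]]]  # 1 x 2 pixels, 3 channels
mask = [[1, 0]]
concatenated = concat_mask(uv, mask)       # 4 channels per pixel
multiplied = multiply_mask(uv, mask)       # second pixel zeroed out
```

Concatenation keeps the original texture values and lets the network see the mask as a separate input channel, while multiplication removes the masked values before the network sees them.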

The reflection estimation unit 312 is implemented as an artificial neural network in which a pattern estimation method has been learned in advance. It estimates the pattern of the UV map (T) supplied in combination with the mask (M) by the first mask combining unit 311, and estimates and fills in the reflection coefficients of the hole region of the UV map (T), thereby obtaining an albedo map (A).

As shown in FIG. 3, the reflection estimation unit 312 may have an auto-encoder structure combining an encoder (en1) and a decoder (de1), each including a plurality of layers, with mutually symmetric structures. The encoder (en1) extracts features from the region other than the hole region designated by the mask (M) to obtain a feature map, and the decoder (de1) upsamples the feature map obtained by the encoder (en1) and fills the hole region to obtain the albedo map (A). Here, the albedo map (A) is a 2D image in which the hole region is filled by estimating the reflectance at each position of the UV map in consideration of the 3D face model, and it may be extracted as an image from which the influence of illumination has been removed. The reflection estimation unit 312 may be trained to estimate the hole region of the UV map (T) according to the horizontal symmetry of the human face. That is, because a human face is basically horizontally symmetric, the hole region, which is the unscanned part of the face, can be estimated symmetrically from the scanned part of the face and filled in.

The layers of the encoder (en1) may be gated convolution layers, and the decoder (de1) may have a structure symmetric to the encoder (en1).

Meanwhile, the illumination estimation unit 313 receives feature maps from pre-designated, mutually corresponding middle layers among the layers of the encoder (en1) and the decoder (de1) of the reflection estimation unit 312, and extracts the illumination features of the UV map based on the pattern difference between the received feature maps, thereby obtaining an illumination map (S).

Because the illumination estimation unit 313 generates the illumination map (S) by extracting illumination-induced features from the UV map (T) separately from the reflection estimation unit 312, the illumination map (S) allows the hole region of the UV map (T) to be filled with greater emphasis on feature differences caused by illumination, and the reflection estimation unit 312 can fill the hole region of the UV map (T) by extracting features focused on albedo.

Here, the albedo map (A) can be regarded as an image in which the color of the hole region of the UV map (T) is filled in, and the illumination map (S) as an image in which the shading of the hole region of the UV map (T) is filled in.

The compensated UV map acquisition unit 314 combines the albedo map (A) estimated by the reflection estimation unit 312 with the illumination map (S) estimated by the illumination estimation unit 313 to obtain the compensated UV map (T_c). For example, the compensated UV map acquisition unit 314 may obtain the compensated UV map (T_c) as the pixel-wise element-wise product of the albedo map (A) and the illumination map (S), as in Equation 2: T_c = A ⊙ S.

Here, ⊙ denotes the element-wise product operator.
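As a small illustration of the element-wise product of the albedo map (A) and the illumination map (S), the following sketch multiplies two maps of identical shape. The function name `compose_compensated_uv` and the nested-list layout are assumptions made for this example only.

```python
def compose_compensated_uv(albedo, illumination):
    """Element-wise product of two maps with identical [H][W][C] nested-list shape."""
    return [
        [[a * s for a, s in zip(a_px, s_px)] for a_px, s_px in zip(a_row, s_row)]
        for a_row, s_row in zip(albedo, illumination)
    ]


# One pixel with two channels: each albedo value is scaled by its shading value.
t_c = compose_compensated_uv([[[0.5, 1.0]]], [[[0.5, 0.25]]])
```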

In general, the texture of a human face can be expressed by albedo and by the shading produced by illumination, and the shading is determined by the illumination and the surface normals of the face shape. Therefore, applying albedo and shading at each pixel position of the hole region in the UV map (T) yields a compensated UV map (T_c) in which the texture of the hole region is coarsely filled.

The UV map coarse estimation unit 310, which includes the reflection estimation unit 312 and the illumination estimation unit 313 implemented as artificial neural networks and obtains the compensated UV map (T_c) in which the texture of the hole region is coarsely filled, may be referred to as a coarse network.

The UV map fine estimation unit 320 receives the coarsely filled compensated UV map (T_c) from the UV map coarse estimation unit 310, extracts fine features of the compensated UV map (T_c) according to a pre-learned pattern estimation method, and restores the extracted fine features, thereby obtaining a completed UV map (T_f) with a more precise texture than the compensated UV map (T_c).

As shown in FIG. 3, the UV map fine estimation unit 320 may include a second mask combining unit 321 and a fine estimation unit 322.

The second mask combining unit 321 receives and combines the compensated UV map (T_c) and the mask (M). In doing so, the second mask combining unit 321 may simply concatenate the compensated UV map (T_c) and the mask (M), or may combine the UV map (T) and the mask (M) by an element-wise product.

Like the reflection estimation unit 312 of the UV map coarse estimation unit 310, the fine estimation unit 322 may be configured as an auto-encoder combining an encoder (en2) and a decoder (de2) that include a plurality of layers with mutually symmetric structures, but it may include more layers than the UV map coarse estimation unit 310 so that finer features can be extracted and restored. The layers of the encoder (en2) may also be gated convolution layers, and the decoder (de2) may have a structure symmetric to the encoder (en2). However, in the UV map fine estimation unit 320, a predetermined number of layers located in the middle of the symmetric encoder (en2) and decoder (de2) are dilated gated convolution layers, so that features with various receptive fields can be extracted to improve the precision of the facial representation.
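The source does not give a formula for the gated or dilated gated convolution layers. As a rough one-dimensional illustration, the sketch below uses the commonly used gated-convolution form, in which a feature response is multiplied by a sigmoid gate computed from the same input, and adds a `dilation` parameter that spaces the kernel taps apart. The function names, the tanh feature activation, and the 1D setting are all assumptions for this example, not the patented layer design.

```python
import math


def conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with kernel taps spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def gated_conv1d(signal, feature_kernel, gate_kernel, dilation=1):
    """Feature response (tanh, assumed) scaled by a sigmoid gate from the same input."""
    features = conv1d(signal, feature_kernel, dilation)
    gates = conv1d(signal, gate_kernel, dilation)
    return [math.tanh(f) / (1.0 + math.exp(-g)) for f, g in zip(features, gates)]


signal = [1, 2, 3, 4, 5]
dense = gated_conv1d(signal, [1, 0, -1], [0, 0, 0], dilation=1)    # window of 3 samples
dilated = gated_conv1d(signal, [1, 0, -1], [0, 0, 0], dilation=2)  # window of 5 samples
```

With the same three-tap kernel, a dilation of 2 covers five input samples instead of three, which is how dilated layers widen the receptive field without adding weights.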

As described above, because the UV map completion unit 300 separately includes the UV map coarse estimation unit 310, which coarsely fills in the texture of the hole region of the UV map, and the UV map fine estimation unit 320, which refines it into a detailed texture, the 3D face image synthesizing apparatus according to this embodiment can obtain a high-quality completed UV map.

Meanwhile, the energy determination unit 330 is a component for training the artificial neural networks of the 3D face image synthesizing apparatus: the UV encoder 210 of the face model alignment unit 200, and the reflection estimation unit 312, the illumination estimation unit 313, and the fine estimation unit 322 of the UV map completion unit 300.

The UV encoder 210, the reflection estimation unit 312, the illumination estimation unit 313, and the fine estimation unit 322, which are artificial neural networks, must learn a pattern estimation method in advance. For the UV encoder 210, a large amount of training data can be obtained using existing 3D face alignment techniques that align a 3D face model to a 2D face image by mathematical computation. Although such techniques rely on complex computation, so that obtaining training data is not easy, the amount of training data needed to train the UV encoder 210 can still be obtained.

When the UV encoder 210 is trained with the training data, the loss (L_uv) of the UV map (T) produced by the UV encoder 210 may be calculated as in Equation 3.

Here, T_ref denotes the training UV map obtained for the training data by mathematical computation, T_e denotes the UV map output by the face model alignment unit 200 for the training data, and M_ref denotes the mask indicating the hole region in the training data. ⊙ denotes the element-wise product operator.

According to Equation 3, the UV encoder 210 may be trained to minimize the loss (L_uv), which is computed as the difference between the training UV map (T_ref) and the UV map (T_e) over the region other than the hole region designated by the mask (M_ref).
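A masked loss of this kind can be sketched as follows. Equation 3 itself is not reproduced in this text, so the norm is unknown: the example assumes a mean absolute (L1) difference and assumes that a mask value of 1 marks a hole pixel. The function name `masked_uv_loss` is hypothetical.

```python
def masked_uv_loss(t_ref, t_est, hole_mask):
    """Mean absolute difference between two UV maps over non-hole pixels only.

    Assumptions (not from the source): L1 norm, mean reduction, and
    hole_mask[h][w] == 1 marking a hole pixel.
    """
    total, count = 0.0, 0
    for ref_row, est_row, m_row in zip(t_ref, t_est, hole_mask):
        for ref_px, est_px, m in zip(ref_row, est_row, m_row):
            if m == 1:  # skip hole pixels: no texture was captured there
                continue
            total += sum(abs(r - e) for r, e in zip(ref_px, est_px))
            count += len(ref_px)
    return total / count if count else 0.0


# The second pixel is a hole, so its large error does not contribute.
loss = masked_uv_loss([[[1.0], [0.0]]], [[[0.5], [9.0]]], [[0, 1]])
```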

However, the hole region of the UV map (T) is a region that is not captured in the 2D face image (I), so no texture information is available for it. Training the reflection estimation unit 312, the illumination estimation unit 313, and the fine estimation unit 322 of the UV map completion unit 300 would therefore require, as training data, 2D face images in which every region of the subject's face is captured, and such data is not easy to obtain. In other words, it is difficult to obtain a large amount of training data for the UV map completion unit 300.

Accordingly, this embodiment includes the energy determination unit 330 and treats the face model alignment unit 200 and the UV map completion unit 300 as a UV-energy-based generative adversarial network (GAN) structure, so that competitive learning is performed between the face model alignment unit 200 and the UV map completion unit 300.

This relies on the idea that the more accurately the face model alignment unit 200 aligns the 3D face model with the 2D face image (I), the higher the quality of the UV map completed by the UV map completion unit 300.

To this end, the loss of the UV map completion unit 300 must first be defined. The losses of the reflection estimation unit 312, the illumination estimation unit 313, and the fine estimation unit 322 in the UV map completion unit 300 are computed as energies. First, a pixel-wise image loss (E_uv(T,M)) can be considered over the valid region of the input UV map (T), that is, the region excluding the hole region designated by the mask (M). Because the region of the UV map (T) outside the hole region is texture obtained from the 2D face image, and the texture of the 2D face image should be preserved as much as possible, the UV map image energy (E_uv(T,M)) may be defined according to Equation 4.

In addition, a UV map symmetry energy (E_f(T,M)), which indicates whether the reflection estimation unit 312 has properly filled the hole region according to the horizontal symmetry of the human face to obtain the albedo map (A), may be defined according to Equation 5.

Here, flip() is a left-right flip function, and the remaining operator in Equation 5 is the binary OR operator.

Meanwhile, a UV map perceptual energy (E_p(T,M)), which indicates whether the fine estimation unit 322 has completed the UV map in fine detail, may be defined according to Equation 6.

Here, F denotes the pre-trained VGG (Visual Geometry Group) face identifier at the n-th selected layer.

Accordingly, the total UV energy of the UV map completion unit 300 may be obtained as in Equation 7 by combining the energies calculated in Equations 4 to 6.

Here, λ_uv, λ_p, and λ_f are energy weights.
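Equation 7 is not reproduced in this text. Given that it combines three energies using three energy weights, one plausible reading is a weighted sum, sketched below. The weighted-sum form, the function name `total_uv_energy`, and the default weights of 1.0 are assumptions for illustration.

```python
def total_uv_energy(e_uv, e_p, e_f, lam_uv=1.0, lam_p=1.0, lam_f=1.0):
    """Weighted sum of image, perceptual, and symmetry energies (assumed form)."""
    return lam_uv * e_uv + lam_p * e_p + lam_f * e_f


# Example: the symmetry term is weighted most heavily, the perceptual term least.
energy = total_uv_energy(1.0, 2.0, 3.0, lam_uv=1.0, lam_p=0.5, lam_f=2.0)
```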

When the face model alignment unit 200 and the UV map completion unit 300 are treated as a GAN structure so that competitive learning is performed, the face model alignment unit 200 is regarded as the generator and the UV map completion unit 300, including the energy determination unit 330, as the discriminator. Based on Equation 7, the generation loss (L_G) and the discrimination loss (L_D) may then be obtained as in Equations 8 and 9.

Here, [·]₊ = max(0, ·) is a function that selects the larger of 0 and its argument, and m denotes a positive margin. The initial generation loss (L_G') of the face model alignment unit 200 may be defined as in Equation 10 from the loss (L_uv) of the UV map (T) in Equation 3 and the discrimination loss (L_D) in Equation 9.

Here, λ_D is the discrimination loss weight.
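Equations 8 to 10 are not reproduced in this text. The sketch below shows one margin-based form that is consistent with the surrounding description: the generation loss is the total UV energy of the estimated map, the discrimination loss combines the reference energy with a hinge on the margin, and the initial generation loss adds the weighted discrimination loss to L_uv. The exact forms follow the energy-based GAN pattern and are assumptions, as are all function names.

```python
def hinge(x):
    """[x]+ = max(0, x)."""
    return max(0.0, x)


def generator_loss(e_est):
    """L_G: total UV energy of the completed map from the estimated alignment."""
    return e_est


def discriminator_loss(e_ref, e_est, margin):
    """L_D (assumed form): low energy for the reference map, and energy of at
    least `margin` for the estimated map."""
    return e_ref + hinge(margin - e_est)


def initial_generator_loss(l_uv, l_d, lam_d):
    """L_G' (assumed form): L_uv plus the weighted discrimination loss."""
    return l_uv + lam_d * l_d


l_d = discriminator_loss(e_ref=0.25, e_est=0.5, margin=1.0)
l_g_init = initial_generator_loss(l_uv=0.5, l_d=l_d, lam_d=2.0)
```

Once the estimated map's energy exceeds the margin, the hinge term vanishes and only the reference energy contributes to the discrimination loss.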

The energy determination unit 330 calculates the losses according to Equations 8 to 10 and backpropagates them to train the face model alignment unit 200 and the UV map completion unit 300 competitively. The energy determination unit 330 may repeat the loss calculation and backpropagation a predetermined number of times, or until the calculated loss falls to or below a predetermined reference value.

In the 3D face image synthesizing apparatus according to this embodiment, the face model alignment unit 200 can be regarded as a generator that produces a UV map according to the 3D face model, and the UV map completion unit 300 as a discriminator that fills the hole region of the UV map and checks the loss in order to determine whether the 3D face model was accurately aligned with the 2D face image and the UV map was generated properly. This structure corresponds to a GAN in which the generator and the discriminator perform competitive learning.

The energy determination unit 330 may be removed after training is complete.

The 3D image synthesis unit 400 synthesizes a 3D face image by mapping the completed UV map (T_f) output by the UV map completion unit 300 onto the 3D face model whose shape has been deformed by the face model alignment unit 200 according to the shape parameter (f_s). Because the completed UV map (T_f) contains an image of the entire face and has coordinate values in the UV coordinate system that correspond to the 3D coordinates of the 3D face model, mapping it onto the 3D face model yields a complete 3D face image without hole regions. As shown on the right side of FIG. 2, the orientation of the synthesized 3D face image can therefore be adjusted freely at the user's request, regardless of the orientation of the subject in the 2D face image.

In general, the information required to synthesize a 3D face image is the completed UV map (T_f), and obtaining an actual synthesized 3D face image from the completed UV map (T_f) is a matter for the particular application, so the 3D image synthesis unit 400 may be omitted from the 3D face image synthesizing apparatus.

FIG. 4 compares performance with and without competitive learning of the face model alignment unit and the UV map completion unit of FIG. 1.

In FIG. 4, (a) shows the input 2D face image, (b) shows the UV map (T) output by the face model alignment unit 200, (c) shows the result of completing the UV map (T) with an auto-encoder without competitive learning, and (d) shows the completed UV map (T_f) produced from the UV map (T) by the UV map completion unit 300 trained with competitive learning according to this embodiment.

Comparing (c) and (d) in FIG. 4, the hole region in (c) is filled with a texture unrelated to the subject's face, whereas in the completed UV map (T_f) output by the UV map completion unit 300 trained with competitive learning according to this embodiment, the hole region is filled with a natural texture that matches the subject's face shape.

FIG. 5 shows an example of how a 2D face image input to the 3D face image synthesizing apparatus of FIG. 1 changes as it is synthesized into a 3D face image.

In FIG. 5, (a) is the input 2D face image, (b) shows the UV map (T_e) output by the face model alignment unit 200, (c) shows the illumination map (S) estimated by the illumination estimation unit 313, and (d) shows the albedo map (A) estimated by the reflection estimation unit 312. Further, (e) shows the compensated UV map (T_c) obtained by the compensated UV map acquisition unit 314, and (f) shows the completed UV map (T_f) estimated by the UV map fine estimation unit 320.

(g) and (h) show the 3D face images obtained by mapping the illumination map (S) and the completed UV map (T_f), respectively, onto the 3D face model.

As shown in (h), this embodiment can synthesize high-quality 3D face images for differing 2D face images, such as those of men and women, and can also synthesize 3D face images from black-and-white images.

FIG. 6 shows a 3D face image synthesizing method according to an embodiment of the present invention.

Referring to FIGS. 1 to 3, the 3D face image synthesizing method of FIG. 6 first performs a training step (S10) to train the artificial neural networks included in the 3D face image synthesizing apparatus. In the training step, training data including a training 2D face image and a reference UV map (T_ref) corresponding to the training 2D face image is first obtained (S11). Here, the reference UV map (T_ref) is a UV map obtained by mapping the training 2D face image onto a 3D face model that has been aligned with the 2D face image by mathematical computation, and unwrapping the result onto two-dimensional UV space coordinates.

Once the training data is obtained, the training 2D face image is received and its pattern is estimated according to the pattern estimation method being learned, to extract matching parameters (f_s, f_c). The shape, position, and orientation of a predetermined 3D face model are adjusted according to the extracted matching parameters (f_s, f_c), and the training 2D face image is mapped onto the adjusted 3D face model to obtain a training UV map (T_e) (S12). At this time, the face visibility of the training 2D face image may be sampled to also obtain a training mask (M_e) indicating the unscanned hole region of the subject's face.

Then, using the training UV map (T_e) and the training mask (M_e), and according to the pattern estimation method being learned, an albedo map (A), in which the hole region is estimated from reflection coefficients based on facial symmetry, and an illumination map (S), estimated from lighting coefficients, are estimated to obtain a compensated UV map (T_ec) that coarsely completes the training UV map (T_e). The compensated UV map (T_ec) is then refined to have a precise texture, again according to the pattern estimation method being learned, to obtain a training completed UV map (T_ef) (S13).

Thereafter, the total UV energy (E(T_e,M_e)) of the training completed UV map (T_ef) is obtained as the generation loss (L_G), based on the UV map image energy (E_uv(T_e,M_e)), the UV map symmetry energy (E_f(T_e,M_e)), and the UV map perceptual energy (E_p(T_e,M_e)) of the training completed UV map (T_ef). A discrimination loss (L_D) including the margin (m) is calculated from the relationship between the total UV energy (E(T_e,M_e)) of the training completed UV map (T_ef) and the total UV energy (E(T_ref,M_ref)) for the reference UV map (T_ref), and the losses are backpropagated to train the artificial neural networks (S14). Here, the generation loss (L_G) can be regarded as reflecting the loss due to the alignment error of the 3D face model, as aligned by the learned method with the training 2D face image when the training UV map (T_e) was generated. The discrimination loss (L_D) can be regarded as the error arising in the process of completing the training UV map (T_e) into the training completed UV map (T_ef) and judging it in order to determine the alignment error of the training UV map (T_e). The method can therefore be regarded as competitive learning between the generation of the training UV map (T_e) and the discrimination step that completes and judges the training UV map (T_e).

Then, whether training has finished is determined (S15). Training may be set to repeat until the generation loss (L_G) and the discrimination loss (L_D) individually, or their sum, fall to or below a predetermined reference loss. Alternatively, training may be set to end after a predetermined number of iterations.
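The termination check of step S15 can be sketched as a loop that stops either when the summed losses reach a reference value or when an iteration budget is used up. The function `train` and its `step` callable, which is assumed to run one training iteration and return the pair (L_G, L_D), are hypothetical names for this illustration; the sketch uses the summed-loss variant of the criterion.

```python
def train(step, max_iters, loss_threshold):
    """Repeat `step()` until L_G + L_D <= loss_threshold or max_iters is reached.

    `step` is a hypothetical callable that performs one training iteration
    (loss calculation and backpropagation) and returns (l_g, l_d).
    Returns (iterations_run, converged).
    """
    for iteration in range(1, max_iters + 1):
        l_g, l_d = step()
        if l_g + l_d <= loss_threshold:
            return iteration, True
    return max_iters, False


# Stand-in for real training: losses that shrink on each call.
fake_losses = iter([(1.0, 1.0), (0.5, 0.5), (0.1, 0.1)])
iterations, converged = train(lambda: next(fake_losses), max_iters=10, loss_threshold=0.25)
```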

If training is determined to have ended because the set condition is satisfied, a 2D face image (I) to be synthesized into a 3D face image is obtained (S20).

Then, the pattern of the 2D face image (I) is estimated according to the pre-learned pattern estimation method to extract the matching parameters (f_s, f_c), which adjust the shape, position, and orientation of the predetermined 3D face model so that it is aligned with the 2D face image (S21). The 3D face model is deformed and moved according to the extracted matching parameters (f_s, f_c) to align it with the 2D face image (I), and the 3D face image obtained by mapping the 2D face image onto the aligned 3D face model is unwrapped onto a two-dimensional UV coordinate system to obtain the UV map (T) (S22). At this time, the mask (M) may also be obtained from the 2D face image (I). The mask (M) may be obtained by analyzing the unscanned hole region of the subject's face in the 2D face image (I).

Once the UV map (T) and the mask (M) corresponding to the 2D face image (I) are obtained, they are combined in a predetermined manner, and the pattern of the mask-combined UV map (T) is estimated according to the pre-learned pattern estimation method to extract the albedo map (A) and the illumination map (S), in which the hole region designated by the mask (M) is filled (S41). Here, the albedo map (A) expresses the color of the UV map (T) with the hole region filled, and is obtained by estimating reflection coefficients based on the horizontal symmetry of the human face using an artificial neural network configured as an auto-encoder including an encoder and a decoder. The illumination map (S) is obtained by receiving the feature maps output by corresponding layers of the encoder and the decoder of the auto-encoder and extracting the illumination features of the UV map based on the pattern difference between the received feature maps.

When the albedo map (A) and the illumination map (S) are obtained, they are multiplied element-wise to obtain a compensated UV map (T_c) in which the hole area is roughly compensated (S42).
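The element-wise product of step S42 can be sketched as follows. The sketch assumes single-channel maps stored as equally sized nested lists; the function name `compensate_uv_map` and the sample values are hypothetical and are not taken from the specification.

```python
from typing import List

Map2D = List[List[float]]


def compensate_uv_map(albedo: Map2D, illumination: Map2D) -> Map2D:
    """Return the element-wise (Hadamard) product of two equally sized maps."""
    if len(albedo) != len(illumination) or any(
        len(a_row) != len(s_row) for a_row, s_row in zip(albedo, illumination)
    ):
        raise ValueError("albedo and illumination maps must have the same shape")
    return [
        [a * s for a, s in zip(a_row, s_row)]
        for a_row, s_row in zip(albedo, illumination)
    ]


# Hypothetical 2 x 2 albedo map (A) and illumination map (S).
albedo = [[0.5, 1.0], [0.25, 0.0]]
illumination = [[2.0, 0.5], [4.0, 1.0]]
compensated = compensate_uv_map(albedo, illumination)
```

Each output pixel depends only on the pixel at the same position in the two input maps, so the operation preserves the UV layout.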

When the compensated UV map (T_c) is obtained, the compensated UV map (T_c) and the mask (M) are combined in a predetermined manner, the pattern of the mask-combined compensated UV map (T_c) is estimated according to a pre-learned pattern estimation method to extract fine features, and the detailed regions of the compensated UV map (T_c) are refined based on the extracted fine features to obtain a completed UV map (T_f) having a precise texture (S50).
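The specification states only that the map and the mask are combined "in a predetermined manner". One common choice, assumed here purely for illustration, is to append the mask as an extra channel of every pixel before the result is passed to the estimation network. The function name `combine_with_mask` and the RGB tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, ...]


def combine_with_mask(
    uv_map: Sequence[Sequence[Pixel]], mask: Sequence[Sequence[float]]
) -> List[List[Pixel]]:
    """Append the mask value to each pixel as one additional channel."""
    if len(uv_map) != len(mask) or any(
        len(px_row) != len(m_row) for px_row, m_row in zip(uv_map, mask)
    ):
        raise ValueError("UV map and mask must have the same height and width")
    return [
        [tuple(pixel) + (float(m),) for pixel, m in zip(px_row, m_row)]
        for px_row, m_row in zip(uv_map, mask)
    ]


# Hypothetical 1 x 2 RGB compensated UV map; the second pixel lies in a hole.
t_c = [[(0.1, 0.2, 0.3), (0.0, 0.0, 0.0)]]
m = [[1, 0]]
combined = combine_with_mask(t_c, m)
```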

Additionally, in response to a user command, a 3D face image may be synthesized by mapping the obtained completed UV map (T_f) onto the corresponding 3D face model (S60).
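Mapping a completed UV map onto a 3D face model amounts to looking up, for each point of the model, the texel addressed by its UV coordinates. The following minimal sketch assumes per-vertex UV coordinates in the range [0, 1], nearest-texel lookup, and v = 0 at the top row of the texture; these conventions and the name `sample_vertex_colors` are assumptions for illustration, not details given in the specification.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def sample_vertex_colors(
    texture: Sequence[Sequence[Color]], uv_coords: Sequence[Tuple[float, float]]
) -> List[Color]:
    """Return the nearest texel color for each (u, v) pair in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    colors = []
    for u, v in uv_coords:
        # Clamp so that u == 1.0 or v == 1.0 maps to the last column or row.
        x = min(int(u * width), width - 1)
        y = min(int(v * height), height - 1)
        colors.append(texture[y][x])
    return colors


# Hypothetical 2 x 2 completed UV map and three model vertices.
red, green, blue, white = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)
texture = [[red, green],
           [blue, white]]
vertex_colors = sample_vertex_colors(texture, [(0.0, 0.0), (0.9, 0.1), (1.0, 1.0)])
```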

The method according to the present invention may be implemented as a computer program stored in a medium for execution on a computer. Here, the computer-readable medium may be any available medium that can be accessed by a computer, and may include all computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data, and may include ROM (read-only memory), RAM (random access memory), CD (compact disc)-ROM, DVD (digital video disc)-ROM, magnetic tape, floppy disks, optical data storage devices, and the like.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom.

Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

100: face image acquisition unit 200: face model alignment unit
210: UV encoder 220: UV mapping unit
230: mask acquisition unit 300: UV map completion unit
310: UV map rough estimation unit 311: first mask combining unit
312: reflection estimation unit 313: illumination estimation unit
314: compensated UV map acquisition unit 320: UV map detail estimation unit
321: second mask combining unit 322: fine estimation unit
330: energy determination unit 400: 3D image synthesis unit

Claims

A 3D face image synthesizing apparatus comprising:
a face model alignment unit that receives a 2D face image to be synthesized into a 3D face image, extracts matching parameters from the pattern of the 2D face image according to a pre-learned pattern estimation method, and aligns a predetermined 3D face model with the 2D face image to obtain a UV map; and
a UV map completion unit that, according to a pre-learned pattern estimation method, estimates and fills a hole area generated in the UV map by an unscanned area of the 2D face image, based on the horizontal symmetry of the face, to obtain a compensated UV map, extracts fine features of the compensated UV map, and restores a UV map reflecting the fine features to obtain a completed UV map.

The apparatus of claim 1, wherein the face model alignment unit comprises:
a UV encoder, implemented as an artificial neural network whose pattern estimation method has been learned in advance, that obtains, from the pattern of the 2D face image, the matching parameters indicating the shape change, position, and orientation required to align the 3D face model with the 2D face image;
a UV mapping unit that aligns the 3D face model with the 2D face image according to the matching parameters, maps the 2D face image onto the 3D face model, and unwraps the mapped 3D face model onto a predetermined two-dimensional UV space to obtain the UV map; and
a mask acquisition unit that detects the hole area generated in the UV map by the unscanned area of the 2D face image and generates a mask corresponding to the hole area.

The apparatus of claim 2, wherein the UV map completion unit comprises:
a UV map rough estimation unit that combines the UV map and the mask, estimates, according to a pre-learned pattern estimation method, reflection coefficients based on the horizontal symmetry of the face from the pattern of the mask-combined UV map to fill in the color of the hole area generated in the UV map, and extracts the shading caused by illumination of the UV map including the hole area from the pattern of the mask-combined UV map, thereby obtaining the compensated UV map; and
a UV map detail estimation unit that combines the compensated UV map and the mask, and extracts and restores fine features of the mask-combined compensated UV map according to a pre-learned pattern estimation method to obtain the completed UV map.

The apparatus of claim 3, wherein the UV map completion unit further comprises an energy determination unit that:
using training data in which a training 2D face image is matched with a reference UV map, which is a UV map generated in advance for the training 2D face image, obtains, as a generation loss, whether the face model alignment unit has accurately aligned the 3D face model with the training 2D face image, from an energy calculated in a predetermined manner for a training completed UV map, which is a completed UV map obtained from the training 2D face image;
obtains, as a discrimination loss, whether the UV map completion unit has properly filled the hole area of the training completed UV map, by calculating the difference, with a margin taken into account, between the completed UV map and the training completed UV map; and
trains the face model alignment unit and the UV map completion unit by a competitive learning method so that the generation loss and the discrimination loss are reduced.

The apparatus of claim 4, wherein the UV map rough estimation unit comprises:
a first mask combining unit that combines the UV map and the mask in a predetermined manner and outputs the result;
a reflection estimation unit, implemented as an autoencoder including an encoder and a decoder each implemented as an artificial neural network, that estimates, according to a pre-learned pattern estimation method, reflection coefficients based on the horizontal symmetry of the face from the pattern of the mask-combined UV map to obtain an albedo map, which is a UV map in which the color of the hole area is estimated;
an illumination estimation unit that receives the feature maps output from corresponding layers of the encoder and the decoder of the reflection estimation unit, and estimates illumination coefficients of the UV map according to a pre-learned pattern estimation method based on the feature maps received from the encoder and the decoder, to obtain an illumination map, which is a UV map in which the color of the entire area including the hole area is estimated; and
a compensated UV map acquisition unit that obtains the compensated UV map by element-wise multiplication of the albedo map and the illumination map.

The apparatus of claim 5, wherein the UV map detail estimation unit comprises:
a second mask combining unit that combines the compensated UV map and the mask in a predetermined manner and outputs the result; and
a fine estimation unit, implemented as an autoencoder including an encoder and a decoder each implemented as an artificial neural network, that extracts and restores fine features from the pattern of the mask-combined compensated UV map according to a pre-learned pattern estimation method to obtain the completed UV map.

The apparatus of claim 6, wherein the energy determination unit
calculates a UV map image energy (E_uv(T,M)), which represents the pixel-wise image loss in the valid area of the UV map (T) excluding the hole area designated by the mask (M), by the equation

(where T_c and T_f denote the compensated UV map and the completed UV map, respectively, and ⊙ is the element-wise product operator),
calculates a UV map symmetric energy (E_f(T,M)), which indicates whether the albedo map (A) has properly filled the hole area according to the horizontal symmetry of the face, by the equation

(where flip() is a left-right flip function, and

is calculated according to the binary OR operator),
calculates a UV map perceptual energy (E_p(T,M)), which represents the level of detail of the completed UV map, by the equation

(where F is the n-th selected layer of a pre-trained Visual Geometry Group (VGG) face identifier), and
calculates a total UV energy by the equation

(where λ_uv, λ_p, and λ_f are energy weights).
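As a non-limiting illustration of how the energy terms recited above might be computed, the following sketch uses assumed forms, because the equations themselves are not reproduced in this text. The image energy is assumed to be a mask-weighted L1 error of T_c and T_f against T, the symmetric energy an L1 gap between A and flip(A) over the pixels where M OR flip(M) is set, and the total energy a weighted sum. The perceptual energy requires a VGG network and is passed in as a precomputed number. All function names are hypothetical, and the mask is assumed to be 1 in the valid area and 0 in the hole area.

```python
from typing import List

Grid = List[List[float]]


def flip(grid: Grid) -> Grid:
    """Left-right flip of a 2D grid."""
    return [row[::-1] for row in grid]


def masked_l1(a: Grid, b: Grid, weight: Grid) -> float:
    """Sum of weight * |a - b| over all pixels."""
    return sum(
        w * abs(x - y)
        for a_row, b_row, w_row in zip(a, b, weight)
        for x, y, w in zip(a_row, b_row, w_row)
    )


def image_energy(t: Grid, t_c: Grid, t_f: Grid, mask: Grid) -> float:
    """Assumed form: masked L1 error of both T_c and T_f against T."""
    return masked_l1(t, t_c, mask) + masked_l1(t, t_f, mask)


def symmetric_energy(albedo: Grid, mask: Grid) -> float:
    """Assumed form: L1 gap between A and flip(A) where M OR flip(M) is set."""
    region = [
        [float(bool(m) or bool(fm)) for m, fm in zip(m_row, fm_row)]
        for m_row, fm_row in zip(mask, flip(mask))
    ]
    return masked_l1(albedo, flip(albedo), region)


def total_energy(e_uv: float, e_f: float, e_p: float,
                 lam_uv: float, lam_f: float, lam_p: float) -> float:
    """Assumed form: weighted sum of the three energies."""
    return lam_uv * e_uv + lam_f * e_f + lam_p * e_p


# Hypothetical 1 x 2 maps; the right pixel is a hole (mask value 0).
t, t_c, t_f = [[0.5, 0.0]], [[0.25, 0.75]], [[0.5, 0.5]]
mask, albedo = [[1.0, 0.0]], [[0.25, 0.75]]
e_uv = image_energy(t, t_c, t_f, mask)
e_f = symmetric_energy(albedo, mask)
e_total = total_energy(e_uv, e_f, e_p=0.125, lam_uv=1.0, lam_f=0.5, lam_p=2.0)
```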

The apparatus of claim 7, wherein the energy determination unit
obtains the generation loss (L_G) by the equation

from the total training UV energy E(T_e, M_e) calculated for the training UV map (T_e) and the training mask (M_e), which are the UV map and the mask obtained by the face model alignment unit aligning the 3D face model with the training 2D face image, and obtains the discrimination loss (L_D) by the equation

using the total reference UV energy E(T_ref, M_ref) calculated for the reference UV map (T_ref) and the corresponding reference mask (M_ref), the total training UV energy E(T_e, M_e), and a margin.
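The two losses can be illustrated with a short sketch. Taking the generation loss equal to the total training UV energy follows the wording of the claim. The hinge form used for the discrimination loss is an assumption modeled on margin-based energy losses, since the equation itself is not reproduced in this text. Function names and sample values are hypothetical.

```python
def generation_loss(energy_train: float) -> float:
    """L_G taken as the total training UV energy E(T_e, M_e)."""
    return energy_train


def discrimination_loss(energy_ref: float, energy_train: float, margin: float) -> float:
    """Assumed hinge form: E(T_ref, M_ref) + max(0, margin - E(T_e, M_e))."""
    return energy_ref + max(0.0, margin - energy_train)


# Hypothetical energies: the margin term is active only while the training
# energy is still below the margin.
loss_g = generation_loss(0.25)
loss_d_active = discrimination_loss(energy_ref=0.5, energy_train=0.25, margin=1.0)
loss_d_inactive = discrimination_loss(energy_ref=0.5, energy_train=2.0, margin=1.0)
```

Under this assumed form the two objectives pull the training energy in opposite directions, which is one way to read the competitive learning between alignment and completion.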

The apparatus of claim 8, wherein the energy determination unit
performs training by repeatedly backpropagating the generation loss (L_G) and the discrimination loss (L_D) until each of them, or their sum, is less than or equal to a predetermined reference loss, or by repeating the backpropagation a predetermined number of times.
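The stopping rule recited above, namely to stop once the loss falls to or below a reference loss or after a fixed number of repetitions, can be sketched independently of any particular network. Here `step` is a hypothetical callable that performs one backpropagation update and returns the current loss; the simulated loss that halves on every call is for illustration only.

```python
from typing import Callable, Tuple


def train_until(
    step: Callable[[], float], reference_loss: float, max_iterations: int
) -> Tuple[int, float]:
    """Repeat `step` until its loss <= reference_loss or the budget is spent.

    Returns the number of iterations performed and the last loss value.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = step()
        if loss <= reference_loss:
            break
    return iteration, loss


# Simulated training: the loss halves on every update (illustrative only).
state = {"loss": 1.0}


def fake_step() -> float:
    state["loss"] *= 0.5
    return state["loss"]


iterations, final_loss = train_until(fake_step, reference_loss=0.1, max_iterations=100)
```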

A 3D face image synthesis method comprising:
receiving a 2D face image to be synthesized into a 3D face image, extracting matching parameters from the pattern of the 2D face image according to a pre-learned pattern estimation method, and aligning a predetermined 3D face model with the 2D face image to obtain a UV map; and
according to a pre-learned pattern estimation method, estimating and filling a hole area generated in the UV map by an unscanned area of the 2D face image, based on the horizontal symmetry of the face, to obtain a compensated UV map, extracting fine features of the compensated UV map, and restoring a UV map reflecting the fine features to obtain a completed UV map.

The method of claim 10, wherein the obtaining of the UV map comprises:
obtaining, from the pattern of the 2D face image, by means of an artificial neural network whose pattern estimation method has been learned in advance, the matching parameters indicating the shape change, position, and orientation required to align the 3D face model with the 2D face image;
aligning the 3D face model with the 2D face image according to the matching parameters, mapping the 2D face image onto the 3D face model, and unwrapping the mapped 3D face model onto a predetermined two-dimensional UV space to obtain the UV map; and
detecting the hole area generated in the UV map by the unscanned area of the 2D face image and generating a mask corresponding to the hole area.

12. The method of claim 11, wherein the obtaining of the completed UV map comprises:
combining the UV map and the mask, estimating, according to a pre-learned pattern estimation method, reflection coefficients based on the horizontal symmetry of the face from the pattern of the mask-combined UV map to fill in the color of the hole area generated in the UV map, and extracting the shading caused by illumination of the UV map including the hole area from the pattern of the mask-combined UV map, thereby obtaining the compensated UV map; and
combining the compensated UV map and the mask, and extracting and restoring fine features of the mask-combined compensated UV map according to a pre-learned pattern estimation method to output the completed UV map.

The method of claim 12, wherein the obtaining of the completed UV map further comprises a training step comprising:
using training data in which a training 2D face image is matched with a reference UV map, which is a UV map generated in advance for the training 2D face image, obtaining, as a generation loss, whether the 3D face model has been accurately aligned with the training 2D face image, from an energy calculated in a predetermined manner for a training completed UV map, which is a completed UV map obtained from the training 2D face image;
obtaining, as a discrimination loss, whether the hole area of the training completed UV map has been properly filled, by calculating the difference, with a margin taken into account, between the completed UV map and the training completed UV map; and
performing training by a competitive learning method so that the generation loss and the discrimination loss are reduced.

14. The method of claim 13, wherein the obtaining of the compensated UV map comprises:
combining the UV map and the mask in a predetermined manner and outputting the result;
estimating, by an autoencoder including an encoder and a decoder each implemented as an artificial neural network and according to a pre-learned pattern estimation method, reflection coefficients based on the horizontal symmetry of the face from the pattern of the mask-combined UV map to obtain an albedo map, which is a UV map in which the color of the hole area is estimated;
receiving the feature maps output from corresponding layers of the encoder and the decoder of the autoencoder, and estimating illumination coefficients of the UV map according to a pre-learned pattern estimation method based on the feature maps received from the encoder and the decoder, to obtain an illumination map, which is a UV map in which the color of the entire area including the hole area is estimated; and
obtaining the compensated UV map by element-wise multiplication of the albedo map and the illumination map.

The method of claim 14, wherein the outputting of the completed UV map comprises:
combining the compensated UV map and the mask in a predetermined manner and outputting the result; and
extracting and restoring, by an autoencoder including an encoder and a decoder each implemented as an artificial neural network and according to a pre-learned pattern estimation method, fine features from the pattern of the mask-combined compensated UV map to obtain the completed UV map.

16. The method of claim 15, wherein the training step comprises
calculating a UV map image energy (E_uv(T,M)), which represents the pixel-wise image loss in the valid area of the UV map (T) excluding the hole area designated by the mask (M), by the equation

(where flip() is a left-right flip function, and

is calculated according to the binary OR operator), and
calculating a UV map perceptual energy (E_p(T,M)), which represents the level of detail of the completed UV map, by the equation

(where λ_uv, λ_p, and λ_f are energy weights).

The method of claim 16, wherein the training step comprises:
obtaining the generation loss (L_G) by the equation

from the total training UV energy E(T_e, M_e) calculated for the training UV map (T_e) and the training mask (M_e), which are the UV map and the mask obtained by aligning the 3D face model with the training 2D face image; and obtaining the discrimination loss (L_D) by the equation

using the total reference UV energy E(T_ref, M_ref) calculated for the reference UV map (T_ref) and the corresponding reference mask (M_ref), the total training UV energy E(T_e, M_e), and a margin.

18. The method of claim 17, wherein the training step comprises
performing training by repeatedly backpropagating the generation loss (L_G) and the discrimination loss (L_D) until each of them, or their sum, is less than or equal to a predetermined reference loss, or by repeating the backpropagation a predetermined number of times.