KR20220050504A

KR20220050504A - Apparatus and method for converting facial images

Info

Publication number: KR20220050504A
Application number: KR1020200134256A
Authority: KR
Inventors: 송병훈; 문영필; 최호준
Original assignee: KT Corporation
Priority date: 2020-10-16
Filing date: 2020-10-16
Publication date: 2022-04-25

Abstract

The present invention relates to a GAN-based facial image conversion method. The method comprises: a first step of warping and scaling a single source image to generate a plurality of facial image training data having various angles and distances; a second step of generating a facial image conversion module by deep learning the warped or scaled facial image training data with a generative adversarial network (GAN); and a third step of inputting an input image to be converted into the facial image conversion module to generate an output image similar to the trained image.

Description

FACIAL IMAGE CONVERSION DEVICE AND FACIAL IMAGE CONVERSION METHOD

The present invention relates to digital image processing technology and, more particularly, to an apparatus and method for converting a facial image so that it resembles a desired image, using a generative adversarial network (GAN).

With the development of mobile technologies, mobile devices that implement various functions such as a camera, Internet access, and a display have been developed and have become everyday necessities. These devices are used in many ways in daily life.

Most typically, mobile devices are used for photo-related functions. For example, a mobile device may be used to easily take a photo and upload it to one's own social network system (SNS) account. As social media has become deeply embedded in people's daily lives, the time spent using photo-related functions on mobile devices has been steadily increasing.

Portrait images make up a significant share of the photos uploaded to social media. However, conventional portrait editing on mobile devices is limited to applying filters to the entire image. Some portrait editing technologies allow the faces in portrait images to be transformed, but they are difficult to operate and therefore limited in terms of user convenience.

For example, Patent Document 1 (Korean Patent Application Laid-Open No. 2020-0080577) provides an image editing application with an interface that allows a user to easily change facial expression type, impression type, pose, rotation direction, and the like.

Recently, various image conversion and synthesis methods using deep learning have emerged.

For example, Patent Document 2 (Japanese Patent Application Laid-Open No. 2019-148980) proposes a technique that uses a generative adversarial network (GAN), a deep-learning-based generative model, to convert an input image with non-camera gaze (the gaze is not directed at the camera) into an output image with camera gaze.

To this end, Patent Document 2 requires a large number of sample facial images registered in advance for training. Moreover, generating so many sample facial images requires many cameras with known properties placed at multiple positions under controlled settings. Such systems are expensive and delicate, difficult to handle, and unavailable to ordinary users. Face identification accuracy and conversion accuracy can certainly benefit from a large and varied set of registered sample facial images, but supplying these images greatly increases the burden on the user.

Patent Document 1: KR 2020-0080577 A
Patent Document 2: JP 2019-148980 A

Thus, in the prior art, converting a user's face into a learned face with a generative adversarial network (GAN) required collecting many training images taken at different angles and distances to produce the desired facial image. That is, unless varied training images of the face from above, below, and the side are collected, an input image at an angle or distance that the training data does not cover inevitably yields a severely broken output image.

Accordingly, a technical object of the present invention is to provide a method of generating a plurality of image training data having various angles and scales from a single training image.

Another technical object of the present invention is to provide an artificial intelligence deep learning model that, based on a single training image, can generate converted images without breakage even at various angles and distances.

Yet another technical object of the present invention is to provide a method of converting a user's face into a learned face using a generative adversarial network (GAN).

To achieve the above objects, a GAN-based facial image conversion method according to a first aspect of the present invention comprises: a first step of warping and scaling a single source image to generate a plurality of facial image training data having various angles and distances; a second step of generating a facial image conversion module by deep learning, based on a generative adversarial network (GAN), the facial image training data warped or scaled in the first step; and a third step of inputting an input image to be converted into the facial image conversion module to generate an output image similar to the learned image.

A facial image conversion apparatus according to a second aspect of the present invention, which converts an input image into an output image similar to a learned image, comprises: a memory storing one or more instructions; and a processor executing the one or more instructions stored in the memory. The processor executes a process comprising: a first step of warping and scaling a single source image to generate a plurality of facial image training data having various angles and distances; a second step of generating a facial image conversion module by deep learning the facial image training data based on a generative adversarial network (GAN); and a third step of inputting an input image to be converted into the facial image conversion module to generate an output image similar to the learned source image.

The first step includes: a warping step of generating, from the single source image, a plurality of warped images rotated up, down, left, and right by given angles; and a scale conversion step of adjusting the scale of the face in the warped images.

The warped images are obtained by rotating the source image within ±20 degrees up and down and within ±30 degrees left and right, and the scale is adjusted from 0.25x to 1.5x.
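The stated ranges can be enumerated as a grid of augmentation settings, one (yaw, pitch, scale) tuple per training image to be generated from the single source image. This is a minimal sketch: the step sizes (10 degrees, 0.25x) and the helper names `frange` and `augmentation_grid` are hypothetical, since the description gives only the ranges.

```python
from itertools import product

# Ranges stated in the description; the step sizes below are hypothetical.
YAW_RANGE = (-30, 30)      # left/right rotation, degrees
PITCH_RANGE = (-20, 20)    # up/down rotation, degrees
SCALE_RANGE = (0.25, 1.5)  # face-size scale factor


def frange(start, stop, step):
    """Inclusive range with a float step, computed by index to avoid drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 6) for i in range(n + 1)]


def augmentation_grid(yaw_step=10, pitch_step=10, scale_step=0.25):
    """All (yaw, pitch, scale) settings to apply to the single source image."""
    yaws = frange(*YAW_RANGE, yaw_step)
    pitches = frange(*PITCH_RANGE, pitch_step)
    scales = frange(*SCALE_RANGE, scale_step)
    return list(product(yaws, pitches, scales))


grid = augmentation_grid()
print(len(grid), grid[0], grid[-1])
```

With these assumed steps the grid has 7 × 5 × 6 = 210 settings; finer steps yield more training images at a higher preprocessing cost.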

In the warping step, a plurality of heatmaps is generated from the source image; the heatmaps are passed through a fully connected (FC) layer to obtain the coefficients of an affine transformation; images are learned sequentially based on the affine transformation to obtain an affine transformation network; and the warped images are generated by this affine transformation network.
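To illustrate the heatmap-to-coefficient step, the sketch below flattens a stack of heatmaps and maps it through a single fully connected layer to the six coefficients of a 2×3 affine matrix. The count of 20 heatmaps comes from the description of FIG. 3; the 16×16 resolution, the random heatmaps standing in for the convolutional network's output, and the identity initialisation of the bias are assumptions (identity initialisation is a common choice in spatial-transformer-style networks, not something this patent specifies).

```python
import numpy as np

NUM_HEATMAPS = 20  # number of heatmaps in the description of FIG. 3
H = W = 16         # hypothetical heatmap resolution


def heatmaps_to_affine(heatmaps, weight, bias):
    """Flatten K heatmaps and apply one fully connected layer, producing the
    six coefficients of a 2x3 affine matrix [[a, b, tx], [c, d, ty]]."""
    features = heatmaps.reshape(-1)       # (K*H*W,)
    coeffs = weight @ features + bias     # (6,)
    return coeffs.reshape(2, 3)


rng = np.random.default_rng(0)
heatmaps = rng.random((NUM_HEATMAPS, H, W))       # stand-in for CNN output
weight = np.zeros((6, NUM_HEATMAPS * H * W))      # untrained FC weights
bias = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])   # identity initialisation
A = heatmaps_to_affine(heatmaps, weight, bias)
print(A)
```

With zero weights the layer outputs the identity transform, so an untrained network initially leaves the image unchanged; training then moves the coefficients away from the identity.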

The second step includes: converting the warped and scale-converted facial image training data into landmark training data; computing an element-wise product from the style-encoded facial image training data; and generating the facial image conversion module through deep learning in which a generative model and a discriminative model compete, by using the landmark training data as the initial input of the generator of a GAN module, feeding the product to each network layer of the generator, and having the discriminator of the GAN module discriminate the result.

In the step of computing the product, an element-wise multiplication is performed between the sigmoid of the feature obtained through the style encoding and the feature obtained by passing the facial image training data through a layer of the network.
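The gating operation itself is a one-liner in NumPy. This sketch assumes F and F' share a shape (for example (channels, height, width)); the function name `style_gate` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def style_gate(F, F_prime):
    """Element-wise product of sigmoid(F), the style-encoded feature,
    and F_prime, the feature produced by a network layer."""
    return sigmoid(F) * F_prime


F = np.array([[-10.0, 0.0], [0.0, 10.0]])
F_prime = np.array([[2.0, 2.0], [4.0, 4.0]])
out = style_gate(F, F_prime)
print(out)
```

Because the sigmoid lies in (0, 1), the style feature acts as a soft per-element mask: a value near 0 suppresses the corresponding element of F', and a value near 1 passes it through almost unchanged.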

The third step includes: generating regressed landmarks for the input image to be converted; and generating, by the facial image conversion module, an output image similar to the learned image based on the regressed landmarks.

The regressed landmarks are obtained by computing the mean and variance of the landmark positions of the learned image and adjusting the mean and variance of the landmark positions of the input image accordingly.
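One plausible reading of this step is per-axis standardisation: shift and scale the input landmarks so that their mean and standard deviation (the square root of the variance) match those of the learned image's landmarks. The sketch assumes landmarks are (N, 2) arrays of (x, y) positions and that x and y are adjusted independently; the description does not specify this detail, and `regress_landmarks` and the coordinates are hypothetical.

```python
import numpy as np


def regress_landmarks(input_lm, target_mean, target_std, eps=1e-8):
    """Re-centre and re-scale (N, 2) input landmarks per axis so that their
    mean and standard deviation match the learned image's landmarks."""
    mean = input_lm.mean(axis=0)
    std = input_lm.std(axis=0)
    return (input_lm - mean) / (std + eps) * target_std + target_mean


# Hypothetical landmark sets: (x, y) of two eyes, nose tip, mouth centre.
source_lm = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 60.0], [50.0, 80.0]])
input_lm = np.array([[10.0, 10.0], [30.0, 10.0], [20.0, 20.0], [20.0, 30.0]])

regressed = regress_landmarks(input_lm, source_lm.mean(axis=0), source_lm.std(axis=0))
print(regressed)
```

In this toy example the input landmarks are a per-axis rescaled copy of the source landmarks, so the regressed landmarks land exactly on the source positions; with real faces they keep the input's relative layout but take on the learned face's position and spread.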

An apparatus for training a facial image conversion operation according to a third aspect of the present invention comprises: one or more processors; and a memory coupled to the one or more processors and containing one or more program modules executable by the one or more processors to convert an input image into an output image similar to a learned image. The program modules include: an image preprocessing module that rotates a source image up, down, left, and right by given angles and adjusts the image scale, thereby generating from a single source image a plurality of facial image training data with different angles and distances; and a GAN module, based on a generative adversarial network (GAN), that deep-learns the plurality of facial image training data while a generative model and a discriminative model compete.

The image preprocessing module includes: a warping unit that generates, from the single source image, a plurality of warped images rotated up, down, left, and right by given angles; and a scale converter that adjusts the scale of the face in the warped images.

The warping unit is an affine transformation network obtained by generating a plurality of heatmaps from the source image, passing the heatmaps through an FC layer to obtain the coefficients of an affine transformation, and sequentially training on images based on the affine transformation.

The GAN module includes a generator and a discriminator. It is a generative adversarial network in which a generative model and a discriminative model compete during deep learning: landmark training data generated from the warped and scale-converted facial image training data serve as the initial input of the generator, the element-wise product computed from the style-encoded facial image training data is fed to each network layer of the generator, and the discriminator discriminates the result.

The product is an element-wise multiplication between the sigmoid of the feature obtained through the style encoding and the feature obtained by passing the facial image training data through a layer of the network.

According to the present invention, multiple training images with different angles and distances can be generated from a single source image, and a GAN-based facial image conversion model can be created by deep learning on these training images. Because there is no need to collect many training images at different angles and distances, the invention provides an AI-based image conversion technique that is inexpensive yet fine-grained and that reduces the burden on the user. In addition, because an output image similar to the learned image can be generated without breakage or distortion from input images at various angles and distances, the accuracy of facial image conversion is improved.

Therefore, when the facial image conversion apparatus of the present invention is used for character conversion in video calls or applied to 3D or VR games, it can make video calls more engaging and games more immersive.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description below, serve to further explain the technical spirit of the present invention; the present invention should therefore not be construed as limited to what is shown in the drawings.
FIG. 1 is a block diagram of a facial image conversion apparatus 100 according to a preferred embodiment of the present invention.
FIG. 2 is a functional block diagram for explaining a GAN-based facial image conversion operation according to a preferred embodiment of the present invention.
FIG. 3 is a flowchart of a process of training a face rotation generation model.
FIG. 4 shows facial images rotated within ±30 degrees left and right by the face rotation generation model.
FIG. 5 shows facial images scaled from 0.25x to 1.5x by the scale converter.
FIG. 6 is a diagram illustrating the training structure of a generative adversarial network (GAN) according to the present invention.
FIG. 7 is a flowchart of an image conversion process using regressed landmarks according to the present invention.

Hereinafter, embodiments of the present invention will be described in more detail with reference to the accompanying drawings. The embodiments may be modified in various forms, and the scope of the present invention should not be construed as limited to the embodiments below. These embodiments are provided to explain the present invention more fully to those of ordinary skill in the art. Although specific terms are used in the drawings and the specification, they are used only to describe the present invention, not to limit its meaning or the scope of the invention set forth in the claims. Those skilled in the art will therefore understand that various modifications and equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. The drawings are not to scale, and like reference numbers refer to like elements throughout.

FIG. 1 is a block diagram of a facial image conversion apparatus 100 according to a preferred embodiment of the present invention.

The facial image conversion apparatus 100 of the present invention may be part of a multifunction device such as a mobile phone, a tablet computer, a personal digital assistant, a portable music/video player, a wearable device, or another electronic device that includes a camera system. The facial image conversion apparatus 100 may also be a standalone system connected over a network to other electronic devices, such as mobile devices, tablet devices, desktop devices, and network storage devices such as servers. The facial image conversion apparatus 100 may be connected to other electronic devices through a wireless or wired connection.

The facial image conversion apparatus 100 may include a processor 110. The processor 110 may be a system-on-a-chip such as those found in mobile devices, and may include one or more central processing units (CPUs), dedicated graphics processing units (GPUs), or both. The processor 110 may also include multiple processors of the same or different types.

The facial image conversion apparatus 100 may include a memory 150. The memory 150 may include one or more different types of memory that can be used together with the processor 110 to perform device functions; for example, the memory 150 may include any type of non-transitory storage. During execution, the memory 150 may store various programming modules, including an image preprocessing module 160 and a GAN module 170. The image preprocessing module 160 and the GAN module 170 may also be stored in memory other than the memory 150, including the memory of another electronic device. Although in some embodiments the image preprocessing module 160 and the GAN module 170 are separate executable programming modules, their functions may be combined into a single programming module.

As shown in FIG. 2, the image preprocessing module 160 includes a warping unit 162, a scale converter 164, a landmark unit 166, and a regressed landmark unit 168, and the GAN module 170 is a generative adversarial network (GAN) including a generator 172 and a discriminator 174. The image preprocessing module 160 warps a single source image by rotating it up, down, left, and right, or adjusts its scale, to generate a plurality of facial image training data with perspective from various angles. The GAN module 170 deep-learns the facial image training data to produce the facial image conversion module 170', i.e., the fully trained GAN module.

The facial image conversion apparatus 100 may also include one or more cameras 120. The camera 120 may include an image sensor, a lens stack, and other components that can be used to capture images; for example, the camera 120 may be configured to capture an image of a particular face.

In one or more embodiments, the facial image conversion apparatus 100 may include an input/output (I/O) device 140. The I/O device 140 may be any kind of I/O device, such as a microphone for voice control input, a speaker for audio data, a camera for visual data input, or a display for visual data output (for example, any kind of display such as an LCD, LED, or organic LED (OLED) display).

Although the facial image conversion apparatus 100 according to the present invention is illustrated as including the components described above, in one or more embodiments the various components may be distributed across multiple devices as part of a distributed system. Additional components may also be used, and the functions of some components may be combined.

FIG. 2 is a functional flowchart for explaining a GAN-based facial image conversion operation according to a preferred embodiment of the present invention. The facial image conversion operation consists broadly of three processes: an image preprocessing process that generates training data for training the GAN module 170; a GAN training process that generates the facial image conversion module 170' through deep learning of the GAN module 170; and a conversion process that converts an input image 135, through the facial image conversion module 170', into an output image 180 resembling the learned source image.

First, the image preprocessing process, an operation programmed by the image preprocessing module 160, warps or scales a source image to generate a plurality of image training data with various angles and distances.

The warping unit 162 of the image preprocessing module 160 is a deep-learning-based face rotation generation model that, from a single source image 130, generates warped images at various angles, rotated within ±30 degrees left and right and within ±20 degrees up and down, as shown in FIG. 4.

FIG. 3 is a flowchart of a process of training the face rotation generation model. Referring to FIG. 3, the source image 130 is first split into frames at 24 fps. The image 162a at time T+1 is passed through a convolutional network 162b to obtain 20 heatmaps 162c. The 20 heatmaps 162c are passed through a fully connected (FC) layer 162d to obtain the coefficients 162e of the affine transformation. Multiplying the affine transformation by each pixel (Pxi, Pyi; i = 1, ..., N, where N is the number of pixels) of the image at time T yields the predicted transformation 162f at time T+1. In FIG. 3, 162g denotes the image frame at time T, and 162h denotes the predicted image frame obtained by multiplying the image frame 162g at time T by the affine transformation. Training on the images sequentially in this way yields the face rotation generation model, a trained affine transformation network.
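The coordinate arithmetic in this paragraph, multiplying each pixel (Pxi, Pyi) of frame T by the affine transformation to predict its position at T+1, can be sketched with homogeneous coordinates. The 2×3 matrix layout and the helper names `transform_pixels` and `frame_pairs` are assumptions; the description gives neither the matrix form nor the training loss.

```python
import numpy as np


def transform_pixels(A, xs, ys):
    """Map N pixel coordinates (Pxi, Pyi) of frame T through a 2x3 affine
    matrix, giving their predicted coordinates at frame T+1."""
    homog = np.stack([xs, ys, np.ones_like(xs)])  # (3, N) homogeneous coords
    return A @ homog                              # (2, N)


def frame_pairs(frames):
    """Consecutive (T, T+1) training pairs from frames sampled at 24 fps."""
    return list(zip(frames[:-1], frames[1:]))


A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])  # translation by (+2, -1)
xs = np.array([0.0, 5.0, 10.0])
ys = np.array([0.0, 5.0, 10.0])
pred = transform_pixels(A, xs, ys)
print(pred)
```

During training, the predicted frame would be compared with the actual frame at T+1 to update the convolutional network and the FC layer; the choice of loss is left open in the description.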

The trained face rotation generation model generates, from the single source image 130, a plurality of facial image training data rotated within 30 degrees to the left and 30 degrees to the right, as shown in FIG. 4. On the same principle, the trained model can also generate a plurality of facial images rotated within 20 degrees upward and 20 degrees downward.

If the rotation angle of the facial image training data exceeds ±30 degrees left and right or ±20 degrees up and down, the image may be severely distorted.

The facial image training data rotated to various angles by the warping unit 162 are then rescaled by the scale converter 164. That is, the scale converter 164 can adjust the scale of the rotated facial image training data from 0.25x to 1.5x, as shown in FIG. 5. Here, as shown in FIG. 5, the scale converter 164 keeps the image size fixed and adjusts only the size of the face, giving the facial image a sense of perspective.
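Keeping the canvas fixed while resizing only the face can be sketched as scaling the image content about its centre with inverse nearest-neighbour mapping: each output pixel p samples the source pixel at (p - c)/s + c, and pixels that map outside the source are filled with a constant. Scaling about the image centre, the fill value, and the name `rescale_fixed_canvas` are assumptions; the description says only that the image size stays fixed while the face size changes.

```python
import numpy as np


def rescale_fixed_canvas(img, s, fill=0):
    """Scale the image content about its centre by factor s while keeping the
    output the same size (inverse nearest-neighbour mapping).
    s < 1 shrinks the face (appears farther away); s > 1 enlarges it."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # For each output pixel, find the source pixel it samples from.
    src_y = np.rint((yy - cy) / s + cy).astype(int)
    src_x = np.rint((xx - cx) / s + cx).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.full_like(img, fill)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


face = np.ones((8, 8))
for s in (0.25, 0.5, 1.0, 1.5):
    print(s, rescale_fixed_canvas(face, s).sum())
```

Nearest-neighbour sampling keeps the sketch dependency-free; a production pipeline would more likely use bilinear interpolation from an image library.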

The facial image training data (I_o), whose rotation angle and face scale have been adjusted by the warping unit 162 and the scale converter 164, are converted into landmark training data (I_L) by the landmark unit 166. Here, a landmark is a feature point representing a feature of an object within the face detection area, for example a point corresponding to the eyes, nose, mouth, or eyebrows, which are representative features of a human face. At least one landmark may be set for each feature of the object, and a plurality of landmarks may be set for a single feature. The number of features set for the object in the face detection area and the number of landmarks per feature may be specified as needed.
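The description does not say how the landmark training data I_L are encoded. One common option is to rasterise the landmark points into an image of the same size as I_o, which is what the hypothetical `landmarks_to_image` helper below does; the point coordinates are invented for illustration, and points outside the canvas are skipped.

```python
import numpy as np


def landmarks_to_image(points, height, width):
    """Rasterise (x, y) landmark points into a single-channel map: 255 at each
    landmark, 0 elsewhere. Points outside the canvas are ignored."""
    canvas = np.zeros((height, width), dtype=np.uint8)
    for x, y in points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            canvas[yi, xi] = 255
    return canvas


# Hypothetical points: left eye, right eye, nose tip, mouth centre, off-canvas.
pts = [(20.2, 25.0), (44.0, 25.4), (32.0, 38.0), (32.0, 50.0), (100.0, 5.0)]
I_L = landmarks_to_image(pts, 64, 64)
print(int((I_L > 0).sum()))
```

Real systems often draw Gaussian blobs or connect landmarks into contour lines instead of single pixels, which gives the generator a denser input signal.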

Next, the GAN training process generates the facial image conversion module 170' by deep learning, through the GAN module 170, the facial image training data (I_L, I_o) warped or scaled by the image preprocessing module 160.

The GAN module 170 is a generative adversarial network consisting of a generator 172 and a discriminator 174.

FIG. 6 is a diagram illustrating the training structure of the generative adversarial network (GAN) according to the present invention.

Referring to FIG. 6, the GAN module 170 uses the landmark training data (I_L) as the initial input of the generator 172.

The facial image training data (I_o), whose rotation angle and face scale have been adjusted, is style-encoded. An element-wise product is computed from this style-encoded data and fed into each network layer of the generator's deep neural network, and the generator produces a generated image.

The discriminator 174 then judges the generated image. The generative model and the discriminative model compete with each other during deep learning. This yields the trained GAN module, namely the facial image conversion module 170'.
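The generator data flow described above can be traced with a toy forward pass. This is a minimal sketch under several assumptions:

- Layers are plain Python callables on flat feature lists, not real convolutional layers.
- The per-layer product is injected into the generator activation by addition. The text only says the product is "input" to each layer.
- `generator_forward` is a hypothetical name.
- The discriminator and the adversarial loss are omitted.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]
Layer = Callable[[Vec], Vec]  # toy stand-in for one network layer


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def generator_forward(i_l: Vec, i_o: Vec,
                      gen_layers: Sequence[Layer],
                      style_layers: Sequence[Layer],
                      feat_layers: Sequence[Layer]) -> Vec:
    """Toy forward pass of the generator.

    i_l: landmark training data I_L, the generator's initial input
    i_o: facial image training data I_o, the source of per-layer conditioning
    At each depth the style branch yields F, the feature branch yields F', and
    the element-wise product sigmoid(F) * F' is added to the generator
    activation before that depth's generator layer (addition is assumed)."""
    if not (len(gen_layers) == len(style_layers) == len(feat_layers)):
        raise ValueError("one style layer and one feature layer are needed per generator layer")
    h, f, f_prime = list(i_l), list(i_o), list(i_o)
    for gen, style, feat in zip(gen_layers, style_layers, feat_layers):
        f = style(f)              # F : style encoding of I_o at this depth
        f_prime = feat(f_prime)   # F': I_o passed through this depth's layer
        gate = [sigmoid(a) * b for a, b in zip(f, f_prime)]  # element-wise product
        h = gen([x + g for x, g in zip(h, gate)])             # inject, then apply layer
    return h
```

In a real model, each callable would be a trainable layer. The output `h` would be the generated image that the discriminator 174 scores against real training images.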

In more detail, at each layer of the deep neural network:

1. Style encoding yields a feature F, and the sigmoid function is applied to it, giving M(F).
2. The facial image training data (I_o) is passed through that layer of the network, giving a feature F'.
3. M(F) and F' are multiplied element-wise.
4. The result, together with the landmark training data (I_L), is passed through the layers of the deep neural network of the generator 172.

This produces the image data that the discriminator 174 judges.
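The gating step above can be written compactly. The notation follows the text: F is the style-encoded feature, F' is the layer feature of I_o, and $\odot$ denotes element-wise multiplication. The symbol $\tilde{F}$ for the result is introduced here only for convenience.

```latex
M(F) = \sigma(F) = \frac{1}{1 + e^{-F}}, \qquad
\tilde{F} = M(F) \odot F'
```

Each entry of $\sigma(F)$ lies in $(0, 1)$, so the product acts as a soft mask on F':

- A neutral style entry ($F = 0$) gives $\sigma = 0.5$ and halves the corresponding entry of F'.
- A strongly positive entry ($F = 4$, $\sigma \approx 0.982$) passes F' almost unchanged.
- A strongly negative entry ($F = -4$, $\sigma \approx 0.018$) nearly suppresses it.

This is one way to read the statement that style encoding keeps only the distinctive parts of the face.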

If the facial image training data (I_o) were combined with the landmark training data (I_L) without style encoding, the deep learning could severely distort the image data. The generator 172 of the GAN module 170 therefore uses style encoding to take only the parts of the facial image with distinct features, which compensates for this distortion.

Finally, in the image conversion process, the input image 135 to be converted is fed into the facial image conversion module 170', which is the trained GAN module. The module converts it into an output image 180 similar to the learned source image 130.

The landmarks 136 of the input image 135 differ from those of the learned image. If they were fed into the facial image conversion module 170' as-is, the desired output image 180 could not be generated. Instead, the deformed-landmark unit 168 of the image pre-processing module 160 generates deformed landmarks 137 from the input image 135, and these deformed landmarks 137 are fed into the facial image conversion module 170'.

FIG. 7 is a flowchart of the image conversion process using the deformed landmarks 137 according to the present invention.

Referring to FIG. 7, the deformed landmarks 137 are obtained in two steps. First, the positional mean and variance of the landmarks (eyes, eyebrows, nose, mouth, and jawline) of the learned facial image 145 are computed. Second, the mean and variance of the positions of the landmarks 136 of the input image 135 are adjusted based on that positional mean and variance.
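A minimal sketch of the mean/variance adjustment follows. It makes several assumptions:

- The statistics are computed separately per feature group (eyes, nose, and so on) and per coordinate axis.
- The adjustment is a standardise-then-rescale mapping, x' = (x - μ_in) / σ_in · σ_learned + μ_learned. Matching the standard deviation is equivalent to matching the variance.
- If the input spread is zero, the points are simply moved to the learned mean.
- `deform_landmarks` is a hypothetical name.

The text does not say whether the learned statistics come from one image or from the whole training set.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) landmark position


def _align_axis(vals: List[float], target_mean: float, target_std: float) -> List[float]:
    """Map values so their mean/std match the target mean/std."""
    m, s = fmean(vals), pstdev(vals)
    if s == 0:
        # No spread to rescale (e.g. a single point): move onto the target mean.
        return [target_mean] * len(vals)
    return [(v - m) / s * target_std + target_mean for v in vals]


def deform_landmarks(input_lm: Dict[str, List[Point]],
                     learned_lm: Dict[str, List[Point]]) -> Dict[str, List[Point]]:
    """Shift and rescale each feature group of the input landmarks so that its
    positional mean and variance match those of the learned image."""
    out: Dict[str, List[Point]] = {}
    for feature, pts in input_lm.items():
        ref = learned_lm[feature]
        rx, ry = [p[0] for p in ref], [p[1] for p in ref]
        nx = _align_axis([p[0] for p in pts], fmean(rx), pstdev(rx))
        ny = _align_axis([p[1] for p in pts], fmean(ry), pstdev(ry))
        out[feature] = list(zip(nx, ny))
    return out
```

Because the mapping is affine per axis, the relative layout of points within a feature is preserved. Only the feature's position and spread are moved toward what the conversion module saw during training.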

By feeding the deformed landmarks 137 of the input image 135 into the facial image conversion module 170' in this way, the system can generate an output image 180 similar to the desired source image.

The operating method of the facial image conversion apparatus 100 according to an embodiment of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the medium may be specially designed and configured for the present invention, or they may be known and available to those skilled in computer software.

Examples of computer-readable recording media include:

- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks;
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that a computer can execute using an interpreter or the like.

The facial image conversion apparatus or facial image conversion operation according to the disclosed embodiments may also be provided as part of a computer program product. A computer program product may be traded between a seller and a buyer as a commodity.

The computer program product may include a software (S/W) program and a computer-readable storage medium on which the S/W program is stored. For example, the computer program product may include a product in the form of an S/W program, such as a downloadable app. Such a product may be distributed electronically by the manufacturer of an electronic device or through an electronic market (e.g., Google Play Store, App Store).

For electronic distribution, at least part of the S/W program may be stored on a storage medium or generated temporarily. In that case, the storage medium may be that of the manufacturer's server, the electronic market's server, or a relay server that temporarily stores the S/W program.

In a system consisting of a server and a client device, the computer program product may include a storage medium of the server or a storage medium of the client device. Alternatively, if a third device (e.g., a smartphone) is communicatively connected to the server or client device, the computer program product may include a storage medium of the third device. The computer program product may also include the S/W program itself, transmitted from the server to the client device or the third device, or from the third device to the client device.

In these cases, any one of the server, the client device, and the third device may execute the computer program product to perform the method according to the disclosed embodiments. Alternatively, two or more of them may execute the computer program product to perform the method in a distributed manner.

For example, a server (e.g., a cloud server or an artificial intelligence server) may execute a computer program product stored on it. In doing so, it may control a client device communicatively connected to it to perform the method according to the disclosed embodiments.

The present invention has been described above with reference to limited embodiments and drawings, but it is not limited to them. Those of ordinary skill in the art can make various modifications and variations within the technical idea of the invention and the scope of equivalents of the claims set out below.

100: facial image conversion apparatus, 110: processor, 120: camera, 130: source image, 135: input image, 136: landmark, 137: deformed landmark, 140: I/O device, 150: memory, 160: image pre-processing module, 162: warping unit, 164: scale conversion unit, 166: landmark unit, 168: deformed-landmark unit, 170: GAN module, 170': facial image conversion module, 172: generator, 174: discriminator, 180: output image

Claims

A GAN-based facial image conversion method, comprising:
a first step of generating a plurality of facial image training data having various angles and distances by warping and scaling one source image;
a second step of generating a facial image conversion module by deep learning the facial image training data based on a generative adversarial network (GAN); and
a third step of generating an output image similar to the learned image by inputting an input image to be converted into the facial image conversion module.

The method of claim 1, wherein the first step comprises:
a warping step of generating, from the one source image, a plurality of warped images rotated by constant angles up, down, left, and right; and
a scale conversion step of adjusting the scale of the face in the warped images.

The method of claim 2, wherein the warped images are obtained by rotating the source image within a range of 20 degrees up and down and within a range of 30 degrees left and right.

The method of claim 3, wherein the warping step generates the warped images through an affine transformation network, the affine transformation network being obtained by generating several heatmaps from the source image, passing the heatmaps through an FC layer to obtain the coefficients of an affine transformation equation, and sequentially deep learning the image based on the affine transformation equation.

The method of claim 2, wherein the scale is adjusted from 0.25 times to 1.5 times.

The method of claim 1, wherein the second step comprises:
converting the warped and scale-converted facial image training data into landmark training data;
calculating an element-wise product of style-encoded data of the facial image training data; and
generating the facial image conversion module through deep learning in which a generative model and a discriminative model compete with each other, by taking the landmark training data as the initial input of a generator of a GAN module, inputting the element-wise product into each network layer of the generator, and having a discriminator of the GAN module judge the result.

The method of claim 6, wherein calculating the element-wise product comprises multiplying, element-wise, the sigmoid of the feature output through the style encoding by the feature obtained by passing the facial image training data through the network layer.

The method of claim 1, wherein the third step comprises:
generating deformed landmarks of the input image to be converted; and
generating, by the facial image conversion module, an output image similar to the learned image based on the deformed landmarks.

The method of claim 8, wherein the deformed landmarks are obtained by computing the positional mean and variance of the landmarks of the learned image and adjusting the mean and variance of the positions of the landmarks of the input image based on that positional mean and variance.

An apparatus for training a facial image conversion operation, comprising:
one or more processors; and
a memory coupled to the one or more processors and comprising one or more program modules executable by the one or more processors to convert an input image into an output image similar to a learned image,
wherein the program modules comprise:
an image pre-processing module that generates, from one source image, a plurality of facial image training data with different angles and distances by rotating the source image by constant angles up, down, left, and right and adjusting the scale of the image; and
a generative adversarial network (GAN)-based GAN module that deep-learns the plurality of facial image training data while a generative model and a discriminative model compete.

The apparatus of claim 10, wherein the image pre-processing module comprises:
a warping unit that generates, from the one source image, a plurality of warped images rotated by predetermined angles up, down, left, and right; and
a scale conversion unit that adjusts the scale of the face in the warped images.

The apparatus of claim 11, wherein the warping unit is an affine transformation network generated by generating several heatmaps from the source image, obtaining the coefficients of an affine transformation equation through an FC layer, and sequentially deep learning the image based on the affine transformation equation.

The apparatus of claim 11, wherein the GAN module comprises a generator and a discriminator, and
is a generative adversarial network that deep-learns while the generative model and the discriminative model compete, by taking landmark training data generated from the warped and scale-converted facial image training data as the initial input of the generator, inputting an element-wise product of style-encoded data of the facial image training data into each network layer of the generator, and having the discriminator judge the result.

The apparatus of claim 13, wherein the element-wise product is obtained by element-wise multiplication of the sigmoid of the feature obtained through the style encoding and the feature obtained by passing the facial image training data through the network layer.

A facial image conversion device that converts an input image into an output image similar to a learned image, comprising:
a memory storing one or more instructions; and
a processor executing the one or more instructions stored in the memory,
wherein the processor executes a process comprising:
a first step of generating a plurality of facial image training data having various angles and distances by warping and scaling one source image;
a second step of generating a facial image conversion module by deep learning the facial image training data based on a generative adversarial network (GAN); and
a third step of generating an output image similar to the learned source image by inputting an input image to be converted into the facial image conversion module.

The facial image conversion device of claim 15, wherein the first step comprises:
a warping step of generating, from the one source image, a plurality of warped images rotated by constant angles up, down, left, and right; and
a scale conversion step of adjusting the scale of the face in the warped images.

The facial image conversion device of claim 16, wherein the warping step generates the warped images through an affine transformation network, the affine transformation network being obtained by generating several heatmaps from the source image, passing the heatmaps through an FC layer to obtain the coefficients of an affine transformation equation, and sequentially deep learning the image based on the affine transformation equation.

The facial image conversion device of claim 15, wherein the second step comprises:
converting the warped and scale-converted facial image training data into landmark training data;
calculating an element-wise product of style-encoded data of the facial image training data; and
generating the facial image conversion module through deep learning in which a generative model and a discriminative model compete with each other, by taking the landmark training data as the initial input of a generator of a GAN module, inputting the element-wise product into each network layer of the generator, and having a discriminator of the GAN module judge the result.

The facial image conversion device of claim 18, wherein calculating the element-wise product comprises multiplying, element-wise, the sigmoid of the feature output through the style encoding by the feature obtained by passing the facial image training data through the network layer.

The facial image conversion device of claim 15, wherein the third step comprises:
generating deformed landmarks of the input image to be converted; and
generating, by the facial image conversion module, an output image similar to the learned image based on the deformed landmarks.

The facial image conversion device of claim 20, wherein the deformed landmarks are obtained by computing the positional mean and variance of the landmarks of the learned image and adjusting the mean and variance of the positions of the landmarks of the input image based on that positional mean and variance.