KR102525181B1

KR102525181B1 - System for correcting image and image correcting method thereof

Info

Publication number: KR102525181B1
Application number: KR1020190130282A
Authority: KR
Inventors: 조영주; 박종열; 배유석
Original assignee: 한국전자통신연구원
Priority date: 2019-03-06
Filing date: 2019-10-18
Publication date: 2023-04-25
Also published as: KR20200107742A

Abstract

The image correction method of the present invention includes: performing pre-processing on an original image to generate a mask image containing only the area erased from the original image; predicting, using a generative adversarial network (GAN), the image to be synthesized into the erased area in the mask image; and generating a new image by synthesizing the predicted image into the erased area of the original image.

Description

Image correction system and image correction method thereof {SYSTEM FOR CORRECTING IMAGE AND IMAGE CORRECTING METHOD THEREOF}

The present invention relates to techniques for correcting images, and more particularly to techniques for correcting face images.

A growing number of people share images and other information through social networking services (SNS), and interest in image editing programs and applications has grown accordingly.

Conventional image editing programs (or image editing tools) and related applications modify an image mainly by adjusting its pixel values. How realistic the modified image looks depends on the skill of the user operating the program.

Therefore, when a general user with little expert knowledge of or experience with image editing programs modifies an image, the result is likely to look unrealistic and awkward.

Another approach to image editing is to build a 3D model from the image and edit that model with an engine (hereinafter, a 3D model engine). However, this approach requires a 3D model engine that implements the 3D model. Many such engines exist, and both the quality of the result and the knowledge needed to produce it vary widely from one engine to another.

A further limitation is that the user must be thoroughly familiar with how to use each 3D model engine. In short, with existing methods it is very difficult to modify an image realistically from simple image input alone.

An object of the present invention is to provide a face image correction system and method that use a neural network trained in advance, together with a face image and user input, to produce a realistic composite image more easily and quickly than conventional methods.

That is, the present invention proposes a system and method that let a user modify a face image easily and intuitively and obtain a realistic result. The object of the present invention is to allow anyone to modify a face image easily, without the expert knowledge or experience otherwise needed to obtain a realistic result.

The foregoing and other objects, advantages, and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below with reference to the accompanying drawings.

To achieve the above object, an image correction method according to one aspect of the present invention includes: performing pre-processing on an original image to generate a mask image containing only the area erased from the original image; predicting, using a generative adversarial network (GAN), the image to be synthesized into the erased area in the mask image; and generating a new image by synthesizing the predicted image into the erased area of the original image.

An image correction system according to another aspect of the present invention includes: a pre-processing unit that performs pre-processing on an original image to generate a mask image containing only the area erased from the original image by user input, a sketch image containing only the shape sketched in the erased area by the user input, and a color image containing only the color painted in the erased area by the user input;

an image generating unit that uses a generative adversarial network (GAN) to predict, from the mask image, the sketch image, and the color image, the image to be synthesized into the erased area, and that synthesizes the predicted image into the erased area to generate a new image from the original image; and a display unit that displays the new image.

An image correction method according to yet another aspect of the present invention includes: training a generative adversarial network (GAN) composed of a generative neural network and a discriminative neural network that are in an adversarial relationship with each other; storing the trained generative neural network in a storage unit; generating from an original image, through pre-processing, a mask image containing only the area erased from the original image by user input, a sketch image containing only the shape sketched in the erased area by the user input, and a color image containing only the color painted in the erased area by the user input; predicting, using the generative neural network stored in the storage unit, the image to be synthesized into the erased area from the mask image, the sketch image, and the color image; and generating a new image from the original image by synthesizing the predicted image into the erased area.

According to the present invention, a face image is modified using an adversarial neural network, so a user can modify the face image quickly and easily in the desired way without expert knowledge of or experience with a separate image tool.

Existing image correction programs provide a separate tool for each operation, such as slimming the face or enlarging the eyes, and using these tools well requires considerable experience.

The system provided by the present invention, by contrast, requires no separate image tool: it uses a mask, a sketch, or a color as input information to modify the face image in the direction the user wants.

FIG. 1 is a block diagram of a face image correction system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the plurality of input images input from the pre-processing unit shown in FIG. 1 to the image generating unit.
FIGS. 3 to 7 are diagrams showing the screen configuration of a user interface according to an embodiment of the present invention.
FIG. 8 is a diagram showing the overall network structure of the generative adversarial network (GAN) applied to the present invention.
FIG. 9 is a flowchart showing an image correction method according to an embodiment of the present invention.
FIGS. 10 to 12 are diagrams showing example result images obtained by correcting an original image according to the image correction method of the present invention.

The specific structural and functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only to explain those embodiments. Embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described herein.

Because embodiments according to the concept of the present invention can be changed in various ways and can take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. This is not intended to limit the embodiments to the specific forms disclosed; they include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms serve only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify that the stated features, numbers, steps, operations, components, parts, or combinations thereof are present, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. The scope of the patent application is not restricted or limited by these embodiments. The same reference numerals in the drawings denote the same elements.

FIG. 1 is a block diagram of an image correction system according to an embodiment of the present invention.

Referring to FIG. 1, the image correction system 100 according to an embodiment of the present invention allows a user without expert knowledge of, or sufficient experience with, tools such as image correction programs to modify an image quickly and easily in the desired way.

For this embodiment, the description is limited to an image correction system that corrects face images. Accordingly, the image correction system is hereinafter referred to as the "face image correction system". The present invention is not, however, restricted to face image correction; it can be applied to any kind of image correction, including images related to vehicle design, architectural design, and home appliance design.

To let a user correct a face image simply and quickly, the face image correction system 100 includes an input unit 110, a pre-processing unit 120, an image generating unit 130, a display unit 140, a storage unit 150, and a learning unit 160.

Input unit 110

The input unit 110 passes user input to the pre-processing unit 120. It includes hardware such as a keyboard, mouse, touch pad, or touch panel, and software programmed to work with that hardware (hereinafter, the "user interface").

Through the input unit 110, the user can perform the following operations.

- Erasing the area of the original face image 10 that the user wants to correct.

- Sketching, inside the erased area of the original image 10, the shape (or form) the user wants. The sketch only needs to be good enough for the shape to be recognizable; an expert-level sketch is not required.

- Painting the sketched shape in the color the user wants.

According to an embodiment, the areas the user may want to correct in the original face image include the eye area, ear area, mouth area, nose area, and hair area.

According to an embodiment, the shapes (or forms) the user may want to apply include glasses worn over the eye area, earrings worn in the ear area, lip shape, and hairstyle.

According to an embodiment, the colors the user may want to apply include pupil color, earring color, lip color, and hair color.

In response to user input corresponding to these operations, the input unit 110 generates a first input value representing the area erased from the original image, a second input value representing the shape (or form) sketched in the erased area, and a third input value representing the color the user wants to apply, and passes them to the pre-processing unit 120.

Pre-processing unit 120

The pre-processing unit 120 pre-processes the original face image according to the first to third input values received from the input unit 110 and generates a plurality of pre-processed input images. These input images serve as the input to the image generating unit 130.

FIG. 2 is a diagram for explaining the plurality of input images input from the pre-processing unit to the image generating unit according to an embodiment of the present invention.

Referring to FIG. 2, the plurality of input images includes: an original face image 11 containing the area erased by the user according to the first input value; a mask image 13 containing only the erased area according to the first input value; a sketch image 15 showing only the shape sketched in the erased area by the user according to the second input value; a color image 17 containing only the color that appears in the erased area according to the third input value; and a noise image 19 showing only Gaussian noise in the erased area.
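The five input images can be illustrated with a minimal sketch. It is not the implementation of the pre-processing unit 120: the function name `build_inputs`, the representation of images as nested lists, the encoding of user input as pixel sets and a pixel-to-color dictionary, and the use of black for blank pixels are all assumptions made for this example.

```python
import random

def build_inputs(original, erased, sketch_strokes, color_strokes, seed=0):
    """Build the five pre-processed inputs from an RGB image and user input.

    original:       H x W nested list of (r, g, b) tuples
    erased:         set of (row, col) pixels erased by the user (first input value)
    sketch_strokes: set of (row, col) pixels sketched by the user (second input value)
    color_strokes:  dict (row, col) -> (r, g, b) painted by the user (third input value)
    """
    rng = random.Random(seed)
    height, width = len(original), len(original[0])
    black = (0, 0, 0)

    def grid(value_at):
        return [[value_at(r, c) for c in range(width)] for r in range(height)]

    # Original image with the erased area blanked out (image 11).
    incomplete = grid(lambda r, c: black if (r, c) in erased else original[r][c])
    # Mask image (13): 1 inside the erased area, 0 elsewhere.
    mask = grid(lambda r, c: 1 if (r, c) in erased else 0)
    # Sketch image (15): strokes are kept only inside the erased area.
    sketch = grid(lambda r, c: 1 if (r, c) in erased and (r, c) in sketch_strokes else 0)
    # Color image (17): painted colors are kept only inside the erased area.
    color = grid(lambda r, c: color_strokes.get((r, c), black) if (r, c) in erased else black)
    # Noise image (19): Gaussian noise inside the erased area, zero elsewhere.
    noise = grid(lambda r, c: rng.gauss(0.0, 1.0) if (r, c) in erased else 0.0)
    return incomplete, mask, sketch, color, noise
```

Restricting every user-derived image to the erased area mirrors the wording above, in which each image contains "only" what lies in that area.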

Image generating unit 130

Referring again to FIG. 1, the image generating unit 130 analyzes the plurality of input images received from the pre-processing unit 120 using a neural network trained in advance. Based on the analysis, it predicts the realistic image the user wants to synthesize into the erased area of the original face image 10 and automatically synthesizes the predicted image into that erased area.
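The final synthesis step can be illustrated with a short sketch. It assumes that the predicted image has the same size as the original and that the mask is 1 inside the erased area. The function name `composite` and the nested-list image representation are hypothetical, and the patent does not specify how the blending is actually carried out.

```python
def composite(original, predicted, mask):
    """Paste predicted pixels into the erased area; keep original pixels elsewhere."""
    height, width = len(original), len(original[0])
    return [[predicted[r][c] if mask[r][c] else original[r][c]
             for c in range(width)]
            for r in range(height)]
```

Because pixels outside the mask are copied from the original unchanged, only the area the user erased can differ in the new image.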

According to an embodiment, the neural network trained in advance may be a generative adversarial network (GAN). The image generating unit 130 therefore corrects the original face image using a GAN.

The generative adversarial network (GAN) is a deep neural network technique introduced in a 2014 paper by researchers at the University of Montreal, including Ian Goodfellow and Yoshua Bengio.

For a detailed description of GANs, see that paper. This specification describes only the general operating principle of GANs and the parts modified and changed for application to the present invention.

A GAN has a deep neural network structure composed of a generative neural network 162, which may be called the generator, and a discriminative neural network 164, which may be called the discriminator.

The generative neural network 162 generates a new image and passes it to the discriminative neural network 164, and the discriminative neural network 164 judges whether the new image received from the generative neural network 162 is real or fake.

The generative neural network 162 generates a new image; the discriminative neural network 164 judges the authenticity of the new image received from the generative neural network 162 and feeds the result back to the generative neural network 162 as input.

The generative neural network 162 learns its image generation process so that the discriminative neural network 164 judges the new images it generates to be real images. Conversely, the discriminative neural network 164 learns its image discrimination process so that it judges the new images received from the generative neural network 162 to be fake images.

The generative neural network 162 and the discriminative neural network 164 are thus trained with opposing objective functions, or loss functions, as in a zero-sum game. That is, the two networks are trained and evolve while maintaining an adversarial relationship with each other.
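For reference, the opposing objectives can be written as the minimax value function from the 2014 GAN paper. The patent does not state the exact loss it uses, so this is the standard formulation rather than the patent's own. Here $G$ corresponds to the generative neural network 162 and $D$ to the discriminative neural network 164, and $z$ denotes the generator's input, which in the original paper is random noise and in this system would instead be the set of input images.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes $V$ by assigning high probability to real images $x$ and low probability to generated images $G(z)$, while the generator minimizes the same quantity. This opposition is the zero-sum relationship described above.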

Of the generative neural network 162 and the discriminative neural network 164, the image generating unit 130 uses only the trained generative neural network 162 to generate the realistic image that is synthesized into the erased area of the original face image.

Storage unit 150

The storage unit 150 stores the generative neural network 162 used by the image generating unit 130. The generative neural network 162 may also be referred to as a generative algorithm, and the discriminative neural network 164 as a discriminative algorithm. The storage unit 150 may be implemented with volatile memory and non-volatile memory.

Learning unit 160

The learning unit 160 trains the GAN, that is, the generative neural network 162 and the discriminative neural network 164, using a large volume of training data collected in advance.

Here, the training data includes large sets of: training face images; training face images containing areas erased at different positions; training mask images showing only the erased areas; training sketch images in which various shapes are sketched in the erased areas; training color images in which the erased areas are filled with different colors; and noise images.

Because the image generating unit 130 uses only the generative neural network 162 out of the trained generative neural network 162 and discriminative neural network 164, the learning unit 160 stores the trained generative neural network 162 in the storage unit 150, and the image generating unit 130 uses the generative neural network 162 stored there.

Display unit 140

The display unit 140 displays the image generated by the image generating unit 130, namely the composite image in which the realistic image the user wants has been automatically synthesized into the area of the original face image 10 erased by user input. The display unit 140 may be an LCD, LED, or OLED display, or the like.

The display unit 140 also displays the screen configuration of the user interface.

FIGS. 3 to 7 are diagrams showing the screen configuration of a user interface according to an embodiment of the present invention.

The user interface gives the user an environment for producing the input needed to correct the original face image, that is, the input information fed to the image generating unit.

To this end, the screen configuration of the user interface includes an area 31 that displays the original face image and the editing in progress on it, and an area 32 that displays the corrected result (the composite image).

The screen configuration of the user interface also includes a plurality of icon-style buttons 33, 34, 35, 36, and 37 related to generating the input information.

First, referring to FIG. 3, the button 33 provides a function for loading the original image. When the user touches or clicks the button 33, the original image is displayed in the area 31.

Referring to FIG. 4, the button 34 provides a function for erasing the area of the original face image 10 that the user wants to correct. For example, when the user touches or clicks the button 34, an eraser-shaped icon tool appears, and the user erases the part of the original image 10 to be corrected with it. FIG. 4 shows a case in which the areas the user wants to correct in the original face image 10 are the eye, nose, and lip areas.

Referring to FIG. 5, the button 35 provides a function for sketching the desired shape in the area erased from the original face image 10. For example, when the user clicks or touches the button 35, a pen-shaped icon tool appears, and the user sketches the desired shape in the erased area with it. The sketch only needs to be good enough for the shape to be recognizable; an expert-level sketch is not required.

Referring to FIG. 6, the button 36 provides a function for painting the sketched shape in the color the user wants. For example, when the user clicks or touches the button 36, a brush-shaped icon tool appears, and the user paints the chosen color on or around the sketched shape with it. FIG. 6 shows a case in which the pupils are painted in a color chosen by the user.

Referring to FIG. 7, the button 37 provides a function for showing the corrected result. When the user clicks or touches the button 37, the area 32 displays the composite image in which the realistic image the user wants has been automatically synthesized into the area of the original face image 10 erased by user input. FIG. 7 shows an example in which a nose shape image, a lip shape image, and a pupil color image, corrected during image generation by the generative neural network 162, are synthesized into the original face image.

FIG. 7 shows an example in which the generative neural network 162 generates the composite image using all of the input information produced by the mask operation (erasing the area of the original face image that the user wants to correct), the sketch operation, and the color operation. Not all of this input information is required, however, to generate a composite image.

For example, the generative neural network 162 can also generate a composite image from a single piece of input information produced by the mask operation, that is, from the mask image alone. This is possible because the generative neural network 162 was trained on all of the training data: the mask images, the sketch images, and the color images.

It also means that the mask image, which indicates the position of the area of the original face image 10 that the user wants to correct, must always be supplied as input to the generative neural network 162. In other words, the mask image can be regarded as the minimum information needed to tell the generative neural network 162 what the user intends to correct.

Naturally, the more information is input to the generative neural network 162, the higher the probability that the generated face image matches what the user intended.
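The distinction between required and optional inputs can be sketched as follows. The function name `assemble_generator_input`, the nine-value per-pixel layout (three for the erased image, one mask, one sketch, three color, one noise), and the use of zeros for missing inputs are assumptions for illustration. The patent states only that the mask image is mandatory and that the sketch and color images may be omitted.

```python
def assemble_generator_input(incomplete, mask, sketch=None, color=None, noise=None):
    """Stack the input images per pixel; only the erased image and mask are required."""
    height, width = len(mask), len(mask[0])
    # Optional inputs default to all-zero images (an assumed convention).
    if sketch is None:
        sketch = [[0] * width for _ in range(height)]
    if color is None:
        color = [[(0, 0, 0)] * width for _ in range(height)]
    if noise is None:
        noise = [[0.0] * width for _ in range(height)]
    # Per pixel: (r, g, b, mask, sketch, color_r, color_g, color_b, noise).
    return [[(*incomplete[r][c], mask[r][c], sketch[r][c], *color[r][c], noise[r][c])
             for c in range(width)]
            for r in range(height)]
```

With this layout, a mask-only request and a full mask, sketch, and color request have the same input shape, so a single network can serve both.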

FIG. 8 is a diagram showing the overall network structure of the generative adversarial network (GAN) applied to the present invention.

As described above, to train the generative neural network 162, the entire network (GAN) composed of the generative neural network 162 and the discriminative neural network 164 must be trained together.

Once training is complete, only the generative neural network 162 is used to modify face images. Therefore, to improve the speed of the generative neural network 162 and the realism of the modified face image, the generative adversarial network (GAN) may be configured with the structure shown in FIG. 8.

As shown in FIG. 8, the generative neural network 162 has a U-Net structure and includes gated convolution layers, dilated gated convolution layers, and deconvolution layers. Each convolution layer is drawn as a cuboid whose size varies with its number of channels.

The generative neural network 162 is characterized by using gated convolution layers rather than ordinary convolution layers. An ordinary convolution layer outputs a new feature value for the feature value input from the preceding convolution layer. In contrast, a gated convolution layer outputs, for the feature input from the preceding gated convolution layer, both a new feature value and a feature value for the mask image. That is, an ordinary convolution layer outputs one piece of data, whereas a gated convolution layer outputs two. One of the two is the feature value of the mask image, and this feature value is input to another, non-adjacent gated convolution layer. In FIG. 8, the dotted arrows indicate that the mask-image feature value output by a given gated convolution layer is input to another, non-adjacent gated convolution layer.
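The two-output behavior of a gated convolution layer can be illustrated with a minimal single-channel sketch in plain Python. It rests on several assumptions that are not taken from the specification: the names `conv2d_same` and `gated_conv` are hypothetical, the gate is squashed with a sigmoid, and the feature output is multiplied by the gate, which is the commonly used gated convolution formulation rather than something the text above states.

```python
import math


def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output size = input size)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Pixels outside the image count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def gated_conv(image, feature_kernel, gate_kernel):
    """Return (features, gate, gated) for one hypothetical single-channel layer.

    features: ordinary convolution output
    gate:     second output, squashed to (0, 1) with a sigmoid (assumption)
    gated:    element-wise product of the two (assumption)
    """
    features = conv2d_same(image, feature_kernel)
    gate = [[1.0 / (1.0 + math.exp(-v)) for v in row]
            for row in conv2d_same(image, gate_kernel)]
    gated = [[f * g for f, g in zip(f_row, g_row)]
             for f_row, g_row in zip(features, gate)]
    return features, gate, gated
```

In a real network both kernels would be learned and would span many channels; the sketch only shows that one layer produces a feature map and a separate gate map from the same input.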

The discriminative neural network 164 has a PatchGAN form composed of gated convolution layers to which spectral normalization is applied. The process of training the discriminative neural network 164 is the same as the usual method of training a generative adversarial network (GAN) on a dataset.

However, the loss used to train the discriminative neural network 164 differs from a typical loss. Here, the loss refers to the difference between the original face image and the new face image generated by the generative neural network 162.

The parameter (L_G) used to train the generative neural network 162 according to an embodiment of the present invention is given by Equation 1 below, and the parameter (L_D) used to train the discriminative neural network 164 is given by Equation 2 below.

The parameter L_G is the loss for training the layers of the generative neural network 162. As shown in FIG. 8, these layers include a plurality of gated convolution layers, a plurality of dilated gated convolution layers, and a plurality of deconvolution layers.

The parameter L_D is the loss for training the layers of the discriminative neural network 164. These layers include a plurality of spectral normalization (SN) convolution layers.

In Equation 1, L_per-pixel is given by Equation 3 below.

Here, M is a one-channel pixel value representing the erased region in the mask image and may be, for example, '1'. Each pixel in the rest of the mask image, outside the erased region, has the value '0'. I_gen is the new face image that the generative neural network 162 generates from the original face image having the erased region, the mask image containing only the erased region, the sketch image, the color image, and a noise image; that is, the image in which the erased region of the original face image is filled with other image content. I_gen may be a three-dimensional vector value representing this new face image in a three-dimensional vector space. I_gt is the original face image without any erased region, and may likewise be a three-dimensional vector value representing the original face image in a three-dimensional vector space. L_per-pixel is the parameter that computes the L1 distance between the new face image generated by the generative neural network and the original face image. The L1 distance is the distance between a pixel of the new face image and the corresponding pixel of the original face image. α is a weight that strengthens the loss for the erased region of the original face image so that this loss is reduced. ⊙ denotes element-wise multiplication, and the normalizing term in Equation 3 is the total number of pixels for the image size, for example 786432 (= 512 x 512 x 3).
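The quantities described above can be combined into a masked L1 loss. The following plain-Python sketch is only one plausible reading of that description, not a reproduction of Equation 3: it assumes that α multiplies the term for the erased region (where M is 1), that the remaining pixels have weight 1, and that both terms are divided by the total number of pixels. The function name `per_pixel_loss` and the flat-list image layout are hypothetical.

```python
def per_pixel_loss(i_gen, i_gt, mask, alpha):
    """Masked L1 loss over flat pixel lists (assumed form, see lead-in).

    i_gen: pixel values of the generated image
    i_gt:  pixel values of the original (ground-truth) image
    mask:  1 for pixels in the erased region, 0 elsewhere
    alpha: weight applied to the erased-region term (assumption)
    """
    if not (len(i_gen) == len(i_gt) == len(mask)):
        raise ValueError("all inputs must have the same length")
    n = len(i_gt)  # total number of pixel values, e.g. 512 * 512 * 3
    erased = sum(m * abs(g - t) for g, t, m in zip(i_gen, i_gt, mask))
    kept = sum((1 - m) * abs(g - t) for g, t, m in zip(i_gen, i_gt, mask))
    return (alpha * erased + kept) / n
```

With `alpha` greater than 1, errors inside the erased region cost more than errors elsewhere, which matches the stated purpose of the weight.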

In Equation 1, L_percept is given by Equation 4 below.

L_percept is one of the losses used as a style loss. θ_q denotes the feature of the q-th layer of a network trained for image classification on a large training dataset. That is, L_percept is a loss on features computed pixel by pixel using an existing pre-trained network. In N_θ_q(I_gt), N is the total number of pixels, and θ_q is the q-th feature map of an artificial neural network, distinct from the generative neural network, that has been trained on a large number of images for image classification. That is, N_θ_q(I_gt) is the total number of pixels in the q-th feature map obtained by passing the original face image through the image classification network.

The L1 term in Equation 4 is the loss for the L1 distance, that is, the sum of the absolute values of all pixel differences. I_comp is a composite image obtained by combining the original face image having the erased region with only the part of the new face image (I_gen), generated by the generative neural network 162, that corresponds to the erased region. In other words, I_gen is an image produced by the prediction and/or inference process of the generative neural network 162, whereas I_comp is produced not by prediction or inference but by a synthesis process that combines the original image having the erased region with the image extracted from I_gen (the image corresponding to the erased region). I_comp may be a three-dimensional vector value expressed in a three-dimensional vector space.
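The synthesis step that produces I_comp can be sketched as a per-pixel selection. The function name `composite` and the nested-list image layout are assumptions made for illustration; the mask convention (1 in the erased region, 0 elsewhere) follows the description of M.

```python
def composite(original, generated, mask):
    """Build I_comp: generated pixels inside the erased region, original pixels elsewhere.

    original:  H x W original image whose erased region holds placeholder values
    generated: H x W image produced by the generative network (I_gen)
    mask:      H x W mask, 1 in the erased region and 0 elsewhere
    """
    return [[g if m == 1 else o for o, g, m in zip(o_row, g_row, m_row)]
            for o_row, g_row, m_row in zip(original, generated, mask)]
```

Because pixels outside the erased region are copied unchanged, any difference between I_comp and the original image is confined to the region the user erased.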

In Equation 1, L_style(I_gen) is given by Equation 5 below.

L_style is the loss on the features of the q-th layer of a network trained for image classification on a large face dataset. A Gram matrix may be used to compute L_style. C_q is the number of channels of the q-th layer, N_q is the total number of pixels of the q-th layer, and G_q denotes the Gram matrix of the q-th layer.
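As an illustration of how a Gram matrix enters a style loss, the sketch below computes the Gram matrix of one feature map and an L1 distance between two Gram matrices. The normalization by C_q and N_q, the averaging over the Gram entries, and the names `gram_matrix` and `style_loss` are assumptions; Equation 5 may scale the terms differently.

```python
def gram_matrix(features):
    """Gram matrix of one layer's feature map.

    features: C rows (one per channel), each a flat list of N pixel activations.
    Entry (i, j) is the dot product of channels i and j, divided by C * N (assumed scaling).
    """
    c = len(features)
    n = len(features[0])
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (c * n)
             for j in range(c)]
            for i in range(c)]


def style_loss(features_gen, features_gt):
    """Mean absolute difference between the Gram matrices of two feature maps."""
    g_gen = gram_matrix(features_gen)
    g_gt = gram_matrix(features_gt)
    c = len(g_gen)
    total = sum(abs(a - b)
                for row_gen, row_gt in zip(g_gen, g_gt)
                for a, b in zip(row_gen, row_gt))
    return total / (c * c)
```

The Gram matrix discards spatial positions and keeps only channel correlations, which is why it is used to compare style rather than exact pixel layout.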

In addition, a variance loss is used to train the generative neural network 162. The variance loss represents the difference between the original face image and the new face image obtained when the discriminative neural network 164 forcibly shifts all pixels of the new face image received from the generative neural network 162 by one pixel. The variance loss is given by Equation 6 below.

The variance loss comprises L_tv-col and L_tv-row. In Equation 6 above, the subscript comp denotes the image in which the erased region of the original face image is filled with image content according to the user input. N_comp is the total number of pixels of the new face image generated by the generative neural network 162.
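A one-pixel-shift loss of this kind can be sketched as the mean absolute difference between neighboring pixels. This is an assumed reading of Equation 6, not a reproduction of it: the sketch works on a single-channel image, treats L_tv-col as the difference between horizontally adjacent pixels and L_tv-row as the difference between vertically adjacent pixels, and divides by the total pixel count. The function name `variance_loss` is hypothetical.

```python
def variance_loss(image):
    """Return (tv_col, tv_row) for an H x W single-channel image (assumed form).

    tv_col: mean absolute difference between each pixel and its right neighbor
    tv_row: mean absolute difference between each pixel and the pixel below it
    Both sums are divided by the total number of pixels H * W.
    """
    h, w = len(image), len(image[0])
    n = h * w
    tv_col = sum(abs(image[y][x + 1] - image[y][x])
                 for y in range(h) for x in range(w - 1)) / n
    tv_row = sum(abs(image[y + 1][x] - image[y][x])
                 for y in range(h - 1) for x in range(w)) / n
    return tv_col, tv_row
```

Both terms are zero for a perfectly flat image and grow as neighboring pixels disagree.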

Using this variance loss, the generative neural network 162 is trained to be robust against blurring.

Finally, a gradient penalty term (L_GP) is added as follows so that the discriminative neural network 164 does not converge to the training data. Here, preventing the discriminative neural network 164 from converging to the training data means that the discriminative neural network 164 must not be trained to a higher level than the generative neural network 162. That is, the generative neural network 162 and the discriminative neural network 164 must be trained to the same level so that they maintain an adversarial relationship.

As described above, once the generative neural network 162 and the discriminative neural network 164 have each been trained in the direction that reduces the parameter L_G and the parameter L_D, respectively, the image generating unit 130 thereafter generates new face images using only the generative neural network 162.

Because the generative neural network 162 is lightweight, generating a new face image takes less than 2 seconds on a typical CPU.

FIG. 9 is a flowchart illustrating an image correction method according to an embodiment of the present invention.

Referring to FIG. 9, in S910, a training process for the generative adversarial network 160 is performed. In this process, the generative neural network and the discriminative neural network are trained so that they maintain an adversarial relationship with each other. As described below, once the generative neural network 162 and the discriminative neural network 164 have been trained, the image to be synthesized into the original image is predicted using only the generative neural network, and the predicted image is synthesized with the original image. A large amount of previously collected training data is used to train the generative neural network 162. The training data includes original training images, training mask images containing only the erased region, training sketch images in which various shapes are sketched in the erased region, and training color images in which the erased region is filled with various colors. A noise image representing noise in the erased region may also be included. To train the generative neural network 162 so that it maintains an adversarial relationship with the discriminative neural network 164, the loss (L_G) output from the discriminative neural network 164 may be used. Training the generative neural network 162 means training it in the direction that reduces the loss (L_G). Here, the loss (L_G) includes the pixel value (M) of the erased region in a training mask image included in the training data and the pixel difference (I_gen - I_gt) between the original image and the new image. For details, see the description of Equations 1 and 3 above. Training the generative neural network 162 may also be a process of training the gated convolution layers included in the generative neural network using the pixel value (M) of the erased region in the training mask image and the pixel difference (I_gen - I_gt) between the original image and the new image.

Next, in S920, when the training process for the generative adversarial network 160 is complete, that is, when the generative neural network 162 and the discriminative neural network 164 have been trained, the trained generative neural network 162 is stored in the storage unit 150. This allows the image generating unit 130 to access the generative neural network 162.

Next, in S930, a pre-processing process is performed on the original image 10, and this process generates, from the original image 10, the information to be input to the trained generative neural network 162. The information input to the generative neural network 162 includes a mask image containing only the region of the original image erased by a user input, a sketch image containing only the shape sketched in the erased region by the user input, and a color image containing only the color painted in the erased region by the user input. Alternatively, the information input to the generative neural network 162 may consist of the mask image alone. Thus, according to the present invention, part of the original image can be modified to match the user's intention simply by inputting to the generative neural network 162 a mask image that contains only the location of the region erased by the user.
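The pre-processing output of S930, including the mask-only case, can be sketched as a small helper that bundles the generator inputs. The helper name `build_generator_input`, the dictionary layout, and the choice to represent a missing sketch or color image as an all-zero map are assumptions made for illustration; the specification does not say how absent inputs are encoded.

```python
def build_generator_input(erased_image, mask, sketch=None, color=None):
    """Bundle pre-processing outputs for the generative network (hypothetical layout).

    erased_image: original image with the user-erased region removed
    mask:         H x W mask, 1 in the erased region and 0 elsewhere (required)
    sketch:       optional H x W sketch image; all zeros if the user drew nothing
    color:        optional H x W color image; all zeros if the user painted nothing
    """
    if mask is None:
        raise ValueError("the mask image is required")
    height, width = len(mask), len(mask[0])

    def blank():
        # Assumption: an absent optional input is encoded as an all-zero map.
        return [[0] * width for _ in range(height)]

    return {
        "image": erased_image,
        "mask": mask,
        "sketch": sketch if sketch is not None else blank(),
        "color": color if color is not None else blank(),
    }
```

The mask is the only mandatory argument, mirroring the statement that the mask image is the minimum information the generative neural network needs.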

Next, in S940, the image generating unit 130 uses the trained generative adversarial network, that is, the trained generative neural network 162, to analyze the mask image, the sketch image, and the color image, and predicts from the analysis result the image to be synthesized into the erased region of the original image.

Next, in S950, the image generating unit 130 generates a new image by synthesizing the predicted image into the erased region of the original image, and the display unit 140 displays the generated new image to the user.

FIGS. 10 to 12 are diagrams illustrating composite images generated by the image correction method of the present invention.

As shown in (A) and (B) of FIG. 10, the present invention can automatically generate a new image by modifying the shape of the mouth or eyes in the original face image. As shown in (C) of FIG. 10, it can also automatically generate a new image by removing glasses from the original face image. The important point is that the shape sketched in the erased region to create the input information (free-form input) for the generative neural network does not need to be an expert-level sketch. Because the present invention uses a generative neural network trained in advance on a large amount of training data, the image to be synthesized into the original face image can be inferred and predicted from only the location of the erased region, taken from the mask image, and a shape sketched at the level of an ordinary user. This means that the original face image can be modified realistically using only simple image inputs such as a mask image, a sketch image, or a color image.

According to the present invention, part of a hairstyle can be modified as shown in (A) of FIG. 11, or the entire hairstyle can be modified as shown in (B) of FIG. 11.

Also according to the present invention, as shown in (A) of FIG. 12, when part of the ear region is erased from the original face image and an earring shape simply sketched by the user in the erased region is provided as input information to the generative neural network, an image in which an earring is realistically synthesized into the original face image can be generated automatically.

As shown in (B) of FIG. 12, an earring included in the original face image can also be changed to an earring of a different shape.

The image correction system described above may be implemented as hardware components, software components, and/or a combination of hardware and software components.

For example, components such as the pre-processing unit 120, the image generating unit 130, and the learning unit 160 described in the embodiments may be implemented as a processor, a controller, an arithmetic logic unit (ALU), a graphics processing unit (GPU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), or a microprocessor.

The image correction system may also run an operating system (OS) and one or more software applications executed on the operating system. In response to execution of the software, the image correction system may also access, store, manipulate, process, and generate data.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described system, structure, device, or circuit are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

An image correction method comprising:
generating a mask image including only an erased region of an original image by performing a pre-processing process on the original image;
predicting an image to be synthesized into the erased region in the mask image using a generative adversarial network including a generative neural network and a discriminative neural network; and
generating a new image by synthesizing the predicted image into the erased region of the original image,
wherein a loss function is used to train the generative neural network, and
wherein, when a first feature map is defined as a feature map obtained, by an artificial neural network different from the generative neural network, from the image generated by the generative neural network, a second feature map is defined as a feature map obtained, by the artificial neural network, from an image synthesized from the original image having the erased region and an image corresponding to the erased region extracted from the new image generated by the generative neural network, and a third feature map is defined as a feature map obtained, by the artificial neural network, from the original image, the loss function is calculated based on a loss for the distance between the first feature map and the third feature map and a loss for the distance between the second feature map and the third feature map.

The method of claim 1,
wherein the predicting comprises
predicting the image to be synthesized using the generative neural network.

The method of claim 1,
further comprising, before the generating of the mask image, training the generative adversarial network using a large amount of training data,
wherein the large amount of training data includes
original training images, training mask images including only the erased region, training sketch images in which various shapes are sketched in the erased region, and training color images in which the erased region is filled with various colors.

The method of claim 1,
further comprising, before the generating of the mask image, training the generative adversarial network using a large amount of training data,
wherein the training of the generative adversarial network comprises:
training the generative neural network using the large amount of training data; and
training the discriminative neural network to maintain an adversarial relationship with the generative neural network, and
wherein the predicting comprises
predicting the image to be synthesized using the trained generative neural network.

The method of claim 4,
wherein the training of the generative neural network comprises
training the generative neural network using a loss (L_G) output from the discriminative neural network, and
wherein the loss (L_G)
includes a pixel value (M) of the erased region in a training mask image included in the large amount of training data and a pixel difference value (I_gen - I_gt) between the original image and the new image.

The method of claim 4,
wherein the training of the generative neural network comprises
training a gated convolution layer included in the generative neural network using a pixel value (M) of the erased region in a training mask image included in the large amount of training data and a pixel difference value (I_gen - I_gt) between the original image and the new image.

An image correction system comprising:
a pre-processing unit configured to perform a pre-processing process on an original image to generate a mask image including only a region of the original image erased by a user input, a sketch image including only a shape sketched in the erased region by the user input, and a color image including only a color painted in the erased region by the user input;
an image generating unit configured to predict, from the mask image, the sketch image, and the color image, an image to be synthesized into the erased region using a generative adversarial network including a generative neural network and a discriminative neural network, and to generate a new image from the original image by synthesizing the predicted image into the erased region; and
a display unit configured to display the new image,
wherein a loss function is used to train the generative neural network, and
wherein, when a first feature map is defined as a feature map obtained, by an artificial neural network different from the generative neural network, from the image generated by the generative neural network, a second feature map is defined as a feature map obtained, by the artificial neural network, from an image synthesized from the original image having the erased region and an image corresponding to the erased region extracted from the new image generated by the generative neural network, and a third feature map is defined as a feature map obtained, by the artificial neural network, from the original image, the loss function is calculated based on a loss for the distance between the first feature map and the third feature map and a loss for the distance between the second feature map and the third feature map.

The system of claim 7,
wherein the image generating unit
predicts the image to be synthesized using the generative neural network.

The system of claim 7, further comprising:
a learning unit configured to train the generative adversarial network using a large amount of training data; and
a storage unit configured to store the generative adversarial network trained by the learning unit,
wherein the large amount of training data includes
original training images, training mask images including only the erased region, training sketch images in which various shapes are sketched in the erased region, and training color images in which the erased region is filled with various colors.

According to claim 9,
The learning unit,
Trains the generative adversarial network so that the generative neural network and the discriminative neural network maintain an adversarial relationship with each other,
The storage unit,
An image correction system that stores only the generative neural network for which training has been completed.

According to claim 10,
The learning unit,
Trains the generative neural network in a direction that reduces a loss (L_G) output from the discriminative neural network,
The loss (L_G),
An image correction system in which the loss includes a pixel value (M) of the erased area in the training mask image included in the large amount of training data and a pixel difference value (I_gen - I_gt) between the new image and the original image.
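
How the mask value M and the pixel difference (I_gen - I_gt) might enter such a loss can be sketched as follows. The claim only states that the loss includes these two quantities; combining them as an element-wise product, taking the absolute value, and averaging over all pixels are assumptions made for illustration, and `masked_pixel_loss` is a hypothetical name.

```python
import numpy as np

def masked_pixel_loss(i_gen, i_gt, mask):
    """Mean absolute pixel difference restricted to the erased area.

    i_gen: new (generated) image, shape (H, W)
    i_gt:  original image, shape (H, W)
    mask:  M, 1.0 inside the erased area and 0.0 elsewhere, shape (H, W)

    The L1 form and the normalisation by the total pixel count are
    assumptions; the claim does not specify them.
    """
    return np.mean(np.abs(mask * (i_gen - i_gt)))
```

With this form, differences outside the erased area contribute nothing, so this term only penalises what the generative neural network paints into the erased area.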

According to claim 10,
The learning unit,
An image correction system that trains a gated convolution layer included in the generative neural network using a pixel value (M) of the erased area in the training mask image included in the large amount of training data and a pixel difference value (I_gen - I_gt) between the new image and the original image.
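
For readers unfamiliar with gated convolution, the following minimal sketch shows what such a layer computes in its commonly published form: one convolution produces features, a second produces a soft gate between 0 and 1, and the two are multiplied element-wise. The claim does not specify the layer's internals, so the single-channel input, the `tanh` feature activation, and the names `conv2d_valid` and `gated_conv2d` are assumptions for illustration only.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Plain 2D cross-correlation without padding (as in deep learning libraries)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def gated_conv2d(x, feature_kernel, gate_kernel):
    """Gated convolution: activation(features) * sigmoid(gate).

    Both kernels are learnable in a real layer. The tanh activation is an
    assumption; other activations are also used in practice.
    """
    feature = np.tanh(conv2d_valid(x, feature_kernel))
    gate = 1.0 / (1.0 + np.exp(-conv2d_valid(x, gate_kernel)))
    return feature * gate
```

Because the gate is learned per position, the layer can pass features where the input is valid or sketched and suppress them where it is not, which is why this kind of layer is often used when the input contains an erased area.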

Training a generative adversarial network composed of a generative neural network and a discriminative neural network that are in an adversarial relationship with each other;
storing the generative neural network for which training has been completed in a storage unit;
Generating, from an original image through a pre-processing process, a mask image including only an area erased from the original image by a user input, a sketch image including only a shape sketched in the erased area by the user input, and a color image including only a color painted on the erased area by the user input;
predicting an image to be synthesized into the erased area from the mask image, the sketch image, and the color image using the generative neural network stored in the storage unit; and
generating a new image from the original image by synthesizing the predicted image into the erased area;
Wherein a loss function is used for training the generative neural network,
The loss function is defined in terms of three feature maps, each obtained based on an artificial neural network different from the generative neural network: a first feature map obtained from the image generated by the generative neural network; a second feature map obtained from an image in which an image corresponding to the erased area, extracted from the new image generated by the generative neural network, is synthesized into the original image having the erased area; and a third feature map obtained from the original image,
An image correction method in which the loss function is calculated based on a loss for a distance between the first feature map and the third feature map and a loss for a distance between the second feature map and the third feature map.
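
The pre-processing, prediction, and synthesis steps of the method can be sketched end to end as below. This is a sketch under stated assumptions, not the patented implementation: the user input is assumed to arrive as three arrays (`erased`, `sketch`, `color`), the network input is assumed to be a channel-wise stack of the incomplete image, mask, sketch, and color images, and `generator` is any callable standing in for the stored generative neural network. The names `preprocess` and `correct_image` are hypothetical.

```python
import numpy as np

def preprocess(erased, sketch, color):
    """Build mask, sketch, and color images limited to the erased area.

    erased: (H, W) bool, True where the user erased the original image
    sketch: (H, W) float, 1.0 on sketched strokes
    color:  (H, W, 3) float, painted colors, 0.0 where nothing was painted
    """
    mask = erased.astype(float)
    sketch_image = sketch * mask           # keep only strokes inside the erased area
    color_image = color * mask[..., None]  # keep only colors inside the erased area
    return mask, sketch_image, color_image

def correct_image(original, erased, sketch, color, generator):
    """Predict content for the erased area and synthesize it into the original."""
    mask, sketch_image, color_image = preprocess(erased, sketch, color)
    m = mask[..., None]
    incomplete = original * (1.0 - m)  # original image with the erased area removed

    # Assumed input layout: 3 image + 1 mask + 1 sketch + 3 color = 8 channels.
    net_input = np.concatenate(
        [incomplete, m, sketch_image[..., None], color_image], axis=-1
    )
    predicted = generator(net_input)   # expected shape (H, W, 3)

    # Only the erased area is replaced; all other pixels come from the original.
    return incomplete + predicted * m
```

A call such as `correct_image(original, erased, sketch, color, generator)` returns an image that is identical to the original outside the erased area, whatever the generator predicts there.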

According to claim 13,
The training step,
Is a step of training the generative neural network and the discriminative neural network so that they maintain an adversarial relationship with each other,
The training of the generative neural network,
An image correction method that is a step of training a gated convolution layer included in the generative neural network using the pixel value (M) of the erased area in the training mask image included in the large amount of training data and the pixel difference value (I_gen - I_gt) between the new image and the original image.

According to claim 13,
In the step of predicting an image to be synthesized into the erased area,
The image to be synthesized,
An image correction method in which the image to be synthesized is a realistic image predicted according to the shape sketched in the erased area.

According to claim 13,
An image correction method wherein the original image is a face image.