KR102240885B1 - Method and Apparatus for Conversing Image Based on Generative Adversarial Network

Info

Publication number: KR102240885B1
Application number: KR1020190144426A
Authority: KR
Inventors: 변혜란; 황선희
Original assignee: 연세대학교 산학협력단
Priority date: 2019-11-12
Filing date: 2019-11-12
Publication date: 2021-04-14

Abstract

Disclosed are an image conversion method based on generative adversarial network learning and an apparatus therefor. An image conversion method according to an embodiment of the present invention may comprise: a data input step of receiving source data including a source image and attribute information for conversion of the source image; a generation processing step of generating at least one feature vector based on the source data, and generating and outputting a target image based on the at least one feature vector; and a discrimination processing step of processing classification of the target image and of each of a first feature vector and a second feature vector among the at least one feature vector so that image conversion is performed. The present invention thereby makes it possible to perform image conversion that takes fairness into account.

Description

Method and Apparatus for Conversing Image Based on Generative Adversarial Network

The present invention relates to a method for converting an image based on generative adversarial network learning, and to an apparatus therefor.

The content described in this section merely provides background information on embodiments of the present invention and does not constitute prior art.

As the development of artificial intelligence (AI) systems continues to accelerate, research aimed at reducing the demand for human labor is underway in most fields, such as education, law, media, and agriculture. In particular, in computer vision tasks such as face recognition and object detection, AI algorithms have surpassed human performance, and many researchers expect AI systems to exceed human ability in speed and accuracy across a variety of research fields. An important factor that enables AI systems to operate at high performance is the growth of big data, and performance tends to increase in proportion to the amount of data.

On the other hand, AI systems are based on data collected by humans, and their algorithms and applications are also developed by humans, so discrimination or prejudice held by individual researchers with respect to gender, race, or counterculture can be expressed, directly or indirectly, through an AI system.

Because such biases can be contained in data sets collected by humans, AI systems, unlike conventional machine learning methods that focus only on convenience, productivity, and efficiency, should be developed with ethical aspects in mind, namely the fairness, accountability, and transparency of their algorithms.

Recent research on machine learning and AI systems shows that ethically unfair outcomes can arise in decision making, and research on fair algorithms is underway to mitigate bias with respect to various factors such as gender, race, age, or status.

Although algorithms learned from biased data can produce intentional or unintentional discrimination, decision-making models are widely used in a variety of applications. In particular, computer vision algorithms are widely applied, without consideration of their fairness, to various systems encountered in everyday life, such as mobile applications, video surveillance systems, and driver assistance tools. Researchers are also developing computer vision algorithms that generate face images from voice, automatically infer criminality from face images, or recognize sexual orientation from face images. Despite their high recognition accuracy, such AI algorithms inevitably run into ethical issues that need to be avoided, and they raise fears about the development of future AI systems. For this reason, there is growing interest in trustworthy AI technology that guarantees the fairness, accountability, and transparency of algorithms.

Among these research fields, work is being conducted on how computer vision can learn or operate fairly. For example, image-to-image conversion is the task of mapping a source-domain image to a target domain, such as mapping a sketch to a color image or a real image to another image, and various studies using supervised and unsupervised learning are being conducted to improve image mapping performance, with successful results.

However, as shown in FIGS. 1A and 1B, when the data set is biased, conventional methods can easily produce unfair mapping results. Conventional methods are designed to edit facial attributes, but because many attributes appear mostly in images of one gender, they map unwanted information, such as applying makeup or adding a mustache, instead of editing only the target attribute.

In other words, conventional general-purpose data conversion techniques (e.g., StarGAN) perform unfair style conversion because they do not consider protected attributes. Referring to FIG. 1C, when the style of an input image is converted to bald, an existing technique generates baldness and at the same time alters the face to appear male (e.g., by changing the beard or eye shape). Conversely, when baldness is removed, the result is a female-looking face (e.g., with makeup). Accordingly, a data conversion technique that takes fairness into account is needed.

The main object of the present invention is to provide an image conversion method based on generative adversarial network learning, and an apparatus therefor, that convert an image while taking fairness with respect to a protected attribute into account by using a generator and at least three discriminators that operate in conjunction with the generator.

According to one aspect of the present invention, an image conversion method for achieving the above object may include: a data input step of receiving source data including a source image and attribute information for conversion of the source image; a generation processing step of generating at least one feature vector based on the source data, and generating and outputting a target image based on the at least one feature vector; and a discrimination processing step of processing classification of the target image and of each of a first feature vector and a second feature vector among the at least one feature vector so that image conversion is performed.

According to another aspect of the present invention, an image conversion apparatus for achieving the above object includes: one or more processors; and a memory storing one or more programs executed by the processors, wherein the programs, when executed by the one or more processors, cause the one or more processors to perform operations that may include: a data input step of receiving source data including a source image and attribute information for conversion of the source image; a generation processing step of generating at least one feature vector based on the source data, and generating and outputting a target image based on the at least one feature vector; and a discrimination processing step of processing classification of the target image and of each of a first feature vector and a second feature vector among the at least one feature vector so that image conversion is performed.

According to yet another aspect of the present invention, an image conversion method for achieving the above object may receive source data including a source image and attribute information for conversion of the source image, and may convert and output the source image by applying a first learning result learned based on a first feature vector of the source data, a second learning result learned based on a second feature vector of the source data, and a third learning result learned based on a combined feature vector obtained by combining the first feature vector and the second feature vector.

As described above, the present invention has the effect of enabling image conversion that takes fairness into account.

In addition, the present invention has the effect of accurately performing style conversion of a face image through a mobile application or the like.

In addition, the present invention has the effect of compensating for a shortcoming of machine learning, namely that it can produce discriminatory results with respect to protected attributes such as gender.

In addition, the present invention has the effect of improving the accuracy of the learning results of machine learning techniques for which data on a particular gender, race, or the like is scarce.

FIGS. 1A to 1C are diagrams showing conventional image conversion results.
FIG. 2 is a block diagram schematically showing an image conversion apparatus based on generative adversarial network learning according to an embodiment of the present invention.
FIG. 3 is a block diagram schematically showing the configuration of a learning operation of a processor according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram for explaining a learning operation of the image conversion apparatus according to an embodiment of the present invention.
FIG. 5 is a block diagram schematically showing the configuration of an image conversion operation of a processor according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram for explaining an image conversion operation of the image conversion apparatus according to an embodiment of the present invention.
FIG. 7 is an exemplary diagram showing the results of each learning step of the image conversion apparatus according to an embodiment of the present invention.
FIG. 8 is a diagram showing the operational configuration of an encoder according to an embodiment of the present invention.
FIG. 9 is a diagram showing the operational configuration of a decoder according to an embodiment of the present invention.
FIG. 10 is a diagram showing the operational configuration of a first discriminator according to an embodiment of the present invention.
FIG. 11 is a diagram showing the operational configuration of a second discriminator according to an embodiment of the present invention.
FIG. 12 is a diagram showing the operational configuration of a third discriminator according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating an image conversion method according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing the present invention, a detailed description of a related known configuration or function is omitted where it is judged that it would obscure the subject matter of the present invention. In addition, although preferred embodiments of the present invention are described below, the technical idea of the present invention is not limited thereto and may of course be modified and implemented in various ways by a person skilled in the art. The image conversion method based on generative adversarial network learning proposed by the present invention, and the apparatus therefor, are described in detail below with reference to the drawings.

FIG. 2 is a block diagram schematically showing an image conversion apparatus based on generative adversarial network learning according to an embodiment of the present invention.

The image conversion apparatus 100 according to the present embodiment includes an input unit 110, an output unit 120, a processor 130, a memory 140, and a database 150. The image conversion apparatus 100 of FIG. 2 is according to one embodiment; not all blocks shown in FIG. 2 are essential components, and in other embodiments some blocks included in the image conversion apparatus 100 may be added, changed, or deleted. Meanwhile, the image conversion apparatus 100 may be implemented as a computing device, and each component included in the image conversion apparatus 100 may be implemented as a separate software device or as a separate hardware device combined with software.

The image conversion apparatus 100 receives source data including a source image and attribute information for conversion of the source image, learns from the source data using a generator and at least three discriminators that operate in conjunction with the generator while taking fairness with respect to a protected attribute into account, and converts the image.

The input unit 110 is a means for inputting or acquiring signals or data for performing the image conversion operation of the image conversion apparatus 100. The input unit 110 may operate in conjunction with the processor 130 to input various types of signals or data, or may operate in conjunction with an external device to acquire data directly and transfer it to the processor 130. Here, the input unit 110 may be implemented as a module for inputting a source image, attribute information, and the like, but is not necessarily limited thereto.

The output unit 120 may operate in conjunction with the processor 130 to display various information such as the learning result for a feature vector, the learning result for a target image, and the result of image conversion. The output unit 120 preferably displays this information through a display (not shown) provided in the image conversion apparatus 100, but is not necessarily limited thereto.

The processor 130 executes at least one instruction or program contained in the memory 140.

The processor 130 according to the present embodiment performs machine learning based on source data obtained from the input unit 110 or the database 150, and converts the source image based on the machine learning result.

The processor 130 receives source data including a source image and attribute information, and generates a first feature vector, a second feature vector, and a combined feature vector based on the source data. The processor 130 also generates, based on the combined feature vector, a target image for determining the authenticity of the source image. The processor 130 processes classification of each of the first feature vector, the second feature vector, and the target image through a different discriminator so that conversion of the source image is performed.

The detailed operation of the processor 130 according to the present embodiment is described with reference to FIGS. 3 to 6.

The memory 140 contains at least one instruction or program executable by the processor 130. The memory 140 may contain instructions or programs for an operation of generating an image, an operation of classifying a feature vector or an image, an operation of converting an image, and the like. The memory 140 may also contain instructions or programs for an operation of applying a learning result, an operation of separating a speaker, and the like.

The database 150 refers to a general data structure implemented in the storage space (hard disk or memory) of a computer system using a database management system (DBMS), that is, a data storage format in which data can be freely searched (extracted), deleted, edited, and added. It may be implemented to suit the purpose of an embodiment of the present invention using a relational database management system (RDBMS) such as Oracle, Informix, Sybase, or DB2; an object-oriented database management system (OODBMS) such as GemStone, Orion, or O2; or an XML native database such as Excelon, Tamino, or Sekaiju. The database 150 has appropriate fields or elements for achieving its function.

The database 150 according to the present embodiment may store data related to image conversion and may provide previously stored data related to image conversion.

The data stored in the database 150 may be data on source images, attribute information, feature vectors, learning results, and the like. The database 150 is described as being implemented within the image conversion apparatus 100, but is not necessarily limited thereto and may be implemented as a separate data storage device.

FIG. 3 is a block diagram schematically showing the configuration of a learning operation of a processor according to an embodiment of the present invention.

The processor 130 included in the image conversion apparatus 100 according to the present embodiment processes image conversion based on machine learning. Here, the machine learning is preferably learning using a generative adversarial network (GAN), but is not necessarily limited thereto.

The processor 130 included in the image conversion apparatus 100 causes the image conversion operation to be performed based on a model that receives source data including a source image and attribute information and generates and outputs a first feature vector, a second feature vector, and a target image, and on a model that processes classification of each of the first feature vector, the second feature vector, and the target image through a different discriminator. The processor 130 may be installed in any device or software that performs image conversion.

The processor 130 according to the present embodiment includes a generation processing unit 200 and a discrimination processing unit 202. Here, the generation processing unit 200 includes an encoder 210, a feature vector processing unit 220, and a decoder 230, and the discrimination processing unit 202 may include a first discriminator 240, a second discriminator 250, and a third discriminator 260. The processor 130 of FIG. 3 is according to one embodiment; not all blocks shown in FIG. 3 are essential components, and in other embodiments some blocks included in the processor 130 may be added, changed, or deleted. Meanwhile, each component included in the processor 130 may be implemented as a separate software device or as a separate hardware device combined with software.

The generation processing unit 200 receives source data including a source image and attribute information for conversion of the source image, generates at least one feature vector based on the source data, and generates and outputs a target image based on the at least one feature vector. Each component included in the generation processing unit 200 is described below.

The encoder 210 generates a first feature vector and a second feature vector based on the initial image and the attribute information, and outputs a combined feature vector in which the first feature vector and the second feature vector are combined.

The feature vector processing unit 220 receives the first feature vector and the second feature vector, and outputs each of them to a different discriminator.

The feature vector processing unit 220 may receive the first feature vector and the second feature vector from the encoder 210, but is not necessarily limited thereto. For example, the feature vector processing unit 220 may receive the combined feature vector from the encoder 210 and extract the first feature vector and the second feature vector contained in it.

The feature vector processing unit 220 transfers the first feature vector to the first discriminator 240 and the second feature vector to the second discriminator 250. The feature vector processing unit 220 also transfers the combined feature vector, obtained by combining the first feature vector and the second feature vector, to the decoder 230.
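The routing performed by the feature vector processing unit 220 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the combined feature vector is a simple concatenation of the two feature vectors and that the length of the first feature vector is known; this section does not specify how the vectors are combined, and the function names `combine`, `split`, and `route` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def combine(first: Vector, second: Vector) -> Vector:
    """Combine two feature vectors (assumption: plain concatenation)."""
    return first + second


def split(combined: Vector, first_len: int) -> Tuple[Vector, Vector]:
    """Recover the two feature vectors from a concatenated vector."""
    if not 0 <= first_len <= len(combined):
        raise ValueError("first_len must lie within the combined vector")
    return combined[:first_len], combined[first_len:]


def route(combined: Vector, first_len: int) -> Dict[str, Vector]:
    """Send each vector to its consumer, mirroring units 240, 250 and 230."""
    first, second = split(combined, first_len)
    return {
        "first_discriminator": first,    # first discriminator 240
        "second_discriminator": second,  # second discriminator 250
        "decoder": combine(first, second),  # decoder 230
    }


routed = route([0.1, 0.2, 0.3, 0.4, 0.5], first_len=2)
```

In this sketch the decoder receives the same values that the encoder emitted, while each discriminator sees only its own part of the representation.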

Meanwhile, the feature vector processing unit 220 is described as a component separate from the encoder 210, but is not necessarily limited thereto and may be implemented as a feature vector processing module or function included in the encoder 210.

The decoder 230 receives the combined feature vector and, based on it, generates and outputs a target image in which the attribute information has been converted. The decoder 230 transfers the generated target image to the third discriminator 260.

The generation processing unit 200 may basically be implemented with the structure of an autoencoder (AE). For example, in the generation processing unit 200, the encoder 210 may perform an operation corresponding to the encoder of an autoencoder, and the decoder 230 may perform an operation corresponding to the decoder of an autoencoder.
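The encoder-decoder structure of the generation processing unit 200 can be sketched as follows. This is only a structural illustration: the class name `GenerationProcessor`, the toy encoder and decoder, and the use of concatenation to form the combined feature vector are all assumptions made here, and real implementations would use trained networks rather than the arithmetic stand-ins shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class GenerationProcessor:
    """Autoencoder-style generator: encoder -> combined vector -> decoder."""
    encoder: Callable[[Vector, Vector], Tuple[Vector, Vector]]
    decoder: Callable[[Vector], Vector]

    def forward(self, source_image: Vector, attributes: Vector):
        # The encoder produces the first and second feature vectors.
        first, second = self.encoder(source_image, attributes)
        # Assumption: the combined feature vector is a concatenation.
        combined = first + second
        # The decoder produces the target image from the combined vector.
        target_image = self.decoder(combined)
        return first, second, target_image


def toy_encoder(image: Vector, attributes: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in: mean pixel value plus a copy of the attributes."""
    return [sum(image) / len(image)], list(attributes)


def toy_decoder(combined: Vector) -> Vector:
    """Hypothetical stand-in: four pixels shifted by the attribute sum."""
    base, *attrs = combined
    return [base + sum(attrs)] * 4


generator = GenerationProcessor(toy_encoder, toy_decoder)
first, second, target = generator.forward([0.0, 0.5, 1.0, 0.5], [1.0, 0.0])
```

The three return values correspond to the inputs of the first, second, and third discriminators, respectively.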

The discrimination processing unit 202 processes classification of the target image and of each of the first feature vector and the second feature vector among the at least one feature vector so that image conversion is performed. Each component included in the discrimination processing unit 202 is described below.

The first discriminator 240 receives the first feature vector, among the at least one feature vector, from the feature vector processing unit 220 and processes classification of the first feature vector. Specifically, the first discriminator 240 receives the first feature vector, classifies the first feature vector of the source image as a true signal, and classifies the second feature vector of the source image, or any feature vector other than the first feature vector, as a false signal.

The first discriminator 240 according to the present embodiment may operate in conjunction with the generation processing unit 200 and perform learning based on a generative adversarial network (GAN) in order to classify the first feature vector as a true signal.

The second discriminator 250 receives the second feature vector, among the at least one feature vector, from the feature vector processing unit 220 and processes classification of the second feature vector. Specifically, the second discriminator 250 receives the second feature vector, classifies the second feature vector of the source image as a true signal, and classifies the first feature vector of the source image, or any feature vector other than the second feature vector, as a false signal.

The second discriminator 250 according to the present embodiment may operate in conjunction with the generation processing unit 200 and perform learning based on a generative adversarial network (GAN) in order to classify the second feature vector as a true signal.
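The complementary true/false assignments of the first and second discriminators can be written out as labeled examples. The sketch below is illustrative only: the label encoding (1 for a true signal, 0 for a false signal) and the function names are assumptions, and it covers only the case of exactly two feature vectors.

```python
from typing import List, Tuple

Vector = List[float]
Labeled = Tuple[Vector, int]

TRUE_SIGNAL, FALSE_SIGNAL = 1, 0  # assumed label encoding


def first_discriminator_examples(first: Vector, second: Vector) -> List[Labeled]:
    """First discriminator 240: first vector is true, second vector is false."""
    return [(first, TRUE_SIGNAL), (second, FALSE_SIGNAL)]


def second_discriminator_examples(first: Vector, second: Vector) -> List[Labeled]:
    """Second discriminator 250: second vector is true, first vector is false."""
    return [(second, TRUE_SIGNAL), (first, FALSE_SIGNAL)]


f1, f2 = [0.2, 0.7], [0.9, 0.1]
d1_examples = first_discriminator_examples(f1, f2)
d2_examples = second_discriminator_examples(f1, f2)
```

Each discriminator therefore treats the other discriminator's true example as its own false example.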

제3 감별자(260)는 디코더(230)로부터 대상 이미지를 입력 받고, 대상 이미지에 대한 분류를 처리한다. 구체적으로, 제3 감별자(260)는 대상 이미지에 대한 제1 특징 벡터 및 제2 특징 벡터를 참 신호에 해당하도록 분류하고, 소스 이미지와 대상 이미지를 비교하여 대상 이미지의 진위 여부(Real or Fake)에 대한 분류를 처리한다. The third discriminator 260 receives the target image from the decoder 230 and processes the classification of the target image. Specifically, the third discriminator 260 classifies the first feature vector and the second feature vector for the target image to correspond to a true signal, and compares the source image and the target image to determine whether the target image is authentic or not (Real or Fake). ) To handle classification.

본 실시예에 따른 제3 감별자(260)는 생성 처리부(200)와 연동하여 제1 특징 벡터 및 제2 특징 벡터가 참 신호에 해당하도록 분류하기 위하여 생성적 적대 신경망(GAN: Generative Adversarial Network)을 기반으로 학습을 수행할 수 있다. The third discriminator 260 according to the present embodiment is a generative adversarial network (GAN) in order to classify the first feature vector and the second feature vector to correspond to the true signal in connection with the generation processing unit 200. You can perform learning based on.

FIG. 4 is an exemplary diagram for explaining a learning operation of the image conversion apparatus according to an embodiment of the present invention.

Referring to FIG. 4, the encoder 210 included in the image conversion apparatus 100 performs an operation for producing a fair representation of the source data (input data). That is, the encoder 210 receives source data including a source image of a face and attribute information to be converted in the source image, and outputs two feature vectors (a first feature vector and a second feature vector) from the source data. Here, the encoder 210 may consist of a plurality of convolutions. The feature vector processing unit 220 according to the present embodiment may be implemented as part of the encoder 210.

The decoder 230 included in the image conversion apparatus 100 performs style conversion from the fairness-aware feature vectors. The decoder 230 receives the two feature vectors (the first feature vector and the second feature vector) and, based on them, outputs a target image, that is, a face image whose attribute information has been converted.

The first discriminator 240 included in the image conversion apparatus 100 is a gender-information-preserving discriminator that preserves the gender information of the feature vector output from the encoder 210. The first discriminator 240 receives the first feature vector (φ1) output from the encoder 210, correctly classifies the first feature (gender information) of the face in the source image input to the encoder 210, and misclassifies the attribute information for conversion (the second feature, or the remaining features other than the first feature).

The second discriminator 250 included in the image conversion apparatus 100 is an attribute-information-preserving discriminator that preserves the conversion style information of the feature vector output from the encoder 210. The second discriminator 250 receives the second feature vector (φ2) output from the encoder 210, correctly classifies the second feature (conversion attribute information) of the face in the source image input to the encoder 210, and misclassifies the gender information (the first feature, or the remaining features other than the second feature).

The third discriminator 260 included in the image conversion apparatus 100 is a discriminator for generative adversarial network training that determines whether an image is real or fake. The third discriminator 260 is trained so that the converted attribute information and the gender information are output correctly for the face in the target image produced by the decoder 230, and it adversarially learns whether that face image is real or fake, thereby improving image quality.

Hereinafter, an operation in which the image conversion apparatus 100 according to the present embodiment converts an image without gender discrimination is described by way of example. The image conversion apparatus 100 according to the present embodiment uses unsupervised learning to convert image data with fairness taken into account.

Face images can be used to train an image-to-image conversion model. Most face images are collected on the basis of actual gender and may therefore carry bias arising from data imbalance by gender, race, and the like. For example, the data set of source data may contain 202,599 face images with 40 binary labels related to facial attributes.

For facial attribute generation, nine target attributes of the data set that are not discriminatory but are imbalanced with respect to gender are selected manually. Because it is difficult to learn a representation from only a few images, attributes for which most images belong to a single gender, such as 5o'ClockShadow, WearingLipstick, or WearingEarrings, are excluded.

Because the goal of the image conversion apparatus 100 is to learn, in an unsupervised manner, an image conversion model that is fair with respect to gender, gender is excluded from the target attributes. The nine selected biased attributes and the corresponding numbers of images are shown in [Table 1].
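
The attribute selection described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each image is a dictionary of 0/1 labels with a hypothetical gender key `Male`; the attribute names, the toy rows, and the thresholds `low`, `high`, and `margin` are assumptions and are not taken from the data set or from [Table 1].

```python
from collections import Counter


def gender_ratio_per_attribute(rows, gender_key="Male"):
    """For each binary attribute, return the share of positive images that are male."""
    positives = Counter()
    male_positives = Counter()
    for row in rows:
        for name, value in row.items():
            if name == gender_key or value != 1:
                continue
            positives[name] += 1
            if row[gender_key] == 1:
                male_positives[name] += 1
    return {name: male_positives[name] / positives[name] for name in positives}


def select_biased_attributes(ratios, low=0.05, high=0.95, margin=0.2):
    """Keep attributes skewed toward one gender but not (almost) exclusive to it.

    The thresholds are hypothetical; the text states that the selection is manual.
    """
    selected = []
    for name, male_share in ratios.items():
        exclusive = male_share < low or male_share > high
        skewed = abs(male_share - 0.5) >= margin
        if skewed and not exclusive:
            selected.append(name)
    return sorted(selected)


# Toy label rows (hypothetical data, 1 = attribute present).
rows = [
    {"Male": 1, "Bald": 1, "Smiling": 0, "WearingLipstick": 0},
    {"Male": 1, "Bald": 1, "Smiling": 1, "WearingLipstick": 0},
    {"Male": 1, "Bald": 1, "Smiling": 0, "WearingLipstick": 0},
    {"Male": 0, "Bald": 1, "Smiling": 1, "WearingLipstick": 1},
    {"Male": 0, "Bald": 0, "Smiling": 1, "WearingLipstick": 1},
    {"Male": 0, "Bald": 0, "Smiling": 0, "WearingLipstick": 1},
]
ratios = gender_ratio_per_attribute(rows)
selected = select_biased_attributes(ratios)
```

In this toy data the attribute that appears for only one gender is dropped, the roughly balanced attribute is not treated as biased, and only the skewed but non-exclusive attribute is kept.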

The present invention proposes a fair representation learning method for unsupervised multi-domain image-to-image conversion. Basically, facial attributes of the source image are mapped, for example, from black hair to blond hair, from a young face to an old face, or from an unhappy expression to a happy expression.

A typical image conversion apparatus consists of one generator (G) and one discriminator (D), and may be trained with a bias toward certain features. For example, given an input image x and a target attribute a_t, the generator attempts to generate a target image that looks like a real image, and the discriminator attempts to distinguish real images from fake images. Here, a_t is a binary vector in which an element is set for one of the target attributes. Specifically, the adversarial loss in the generative adversarial training of a typical image conversion apparatus may be defined as in [Equation 1].

Here, the operator in [Equation 1] denotes concatenation, c_t is a vector expanded to size W × H × T, W and H denote the width and height of x, and T denotes the number of target attributes.
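
The expansion of the attribute vector and its concatenation with the image can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, nested lists stand in for tensors, and the arrays are indexed as [row][column][channel], so the W × H × T vector is stored as H rows of W pixels with T channels.

```python
def expand_attributes(a_t, width, height):
    """Tile a length-T binary attribute vector over every pixel position."""
    return [[list(a_t) for _ in range(width)] for _ in range(height)]


def concat_channels(x, c_t):
    """Concatenate an H x W x C image with an H x W x T array along the channel axis."""
    return [
        [pixel + attrs for pixel, attrs in zip(row_x, row_c)]
        for row_x, row_c in zip(x, c_t)
    ]


W, H = 4, 2
x = [[[0.5, 0.5, 0.5] for _ in range(W)] for _ in range(H)]  # toy RGB image
a_t = [0, 1, 0, 0, 0, 0, 0, 0, 0]  # T = 9 target attributes, one of them set
c_t = expand_attributes(a_t, W, H)
source = concat_channels(x, c_t)  # each pixel now has 3 + 9 = 12 channels
```

Every pixel of the result carries the three color values followed by the same nine attribute values, which is what allows a convolutional network to see the requested attributes at every spatial position.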

While the auxiliary classifier of the discriminator is used in the image conversion apparatus, the target attributes of the generated image and the attributes a_x of the input image are classified based on [Equation 2].

To train the unsupervised model, the image conversion apparatus reconstructs the original image x using the unified model. Given the generated image x_t together with the original attribute c_x, a conditional vector of the same form as c_t, the apparatus learns the cycle consistency loss defined in [Equation 3].

By minimizing the loss function of the generative adversarial network, the image conversion apparatus can obtain improved face image conversion results. However, if the model of the image conversion apparatus is trained on a biased data set, unfair results may occur.

Accordingly, the image conversion apparatus 100 according to the present embodiment performs unsupervised image-to-image conversion by learning the mapping between source data (an input face image and target attribute information) and target data (an edited face image). The image conversion apparatus 100 of the present invention aims to perform image conversion with fairness with respect to gender differences taken into account.

In the image conversion apparatus 100, x denotes the source image, c_t denotes the attribute information to be converted (the modified conditional vector), which serves as the baseline of the target attributes, φ1 and φ2 denote the encoded feature vectors, x_t denotes the target image, and g and y denote binary labels for the protected attribute and for authenticity, respectively.

The image conversion apparatus 100 consists of one encoder, one decoder, and three discriminators. The encoder maps the concatenation of x and c to φ1 and φ2, and the decoder maps the concatenation of φ1 and φ2 to x_t. Among the three discriminators, the first discriminator (D1) performs the classification φ1 → a_wrong, g_correct, the second discriminator (D2) performs the classification φ2 → a_correct, g_wrong, and the third discriminator (D3) performs the classification x_t → y, a_t, g.

When the source image x and the attribute information c_t to be converted are input, the image conversion apparatus 100 concatenates the two inputs to generate the source data. The source data is then fed to the encoder, and gender information is retained during the encoder's mapping so that a fairness-aware representation is learned.

The two outputs of the encoder (φ1 and φ2) are fed to two discriminators with different purposes (D1 and D2, respectively), and the concatenated output of φ1 and φ2 is fed to the decoder. From this concatenated output, the decoder generates the target image (x_t), which carries the attribute information (a_t) specified for the source image. In addition, the discriminator (D3) working with the decoder classifies, through adversarial training, whether a target image resembling a real source image is real or fake.
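
The data flow between the five components can be sketched as below. The function `forward_pass` and the toy stand-ins are hypothetical and contain no neural network; they only show which component receives which signal, with list concatenation standing in for the concatenation of φ1 and φ2.

```python
def forward_pass(x, c_t, encoder, decoder, d1, d2, d3):
    """Route one sample through encoder, decoder, and the three discriminators."""
    phi1, phi2 = encoder(x, c_t)   # two feature vectors from the source data
    out1 = d1(phi1)                # D1 receives only phi1
    out2 = d2(phi2)                # D2 receives only phi2
    x_t = decoder(phi1 + phi2)     # decoder receives the concatenated features
    out3 = d3(x_t)                 # D3 receives the generated target image
    return {"phi1": phi1, "phi2": phi2, "x_t": x_t,
            "d1": out1, "d2": out2, "d3": out3}


# Toy stand-ins (hypothetical): they only make the routing observable.
def toy_encoder(x, c_t):
    return x[:2], list(c_t)


def toy_decoder(phi):
    return [sum(phi)]


def toy_discriminator(v):
    return len(v)


result = forward_pass([1.0, 2.0, 3.0], [0, 1], toy_encoder, toy_decoder,
                      toy_discriminator, toy_discriminator, toy_discriminator)
```

Keeping the routing in one function makes it explicit that D1 and D2 never see the generated image and that D3 never sees the intermediate feature vectors.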

The configuration of the learning model of the image conversion apparatus 100 is shown in FIG. 4, and the neural network structure of each component may be implemented as shown in the table of FIG. 7. FIG. 4 shows the overall structure of the image conversion method using the example of generating a bald image from a female face image. The image conversion apparatus 100 according to the present embodiment includes five components: an encoder (ENC), a decoder (DEC), an attribute confusion discriminator (D1), a gender confusion discriminator (D2), and a discriminator (D3).

The image conversion apparatus 100 according to the present embodiment performs an image conversion operation for learning a fair representation.

In a conventional image conversion model, bias in the training data set can cause unwanted gender conversion, so that training is performed unfairly.

In contrast, the image conversion apparatus 100 of the present invention aims to learn a mapping that converts only the target attributes and preserves the gender information of the input image. For example, for conversion and fair representation of both male and female images, the image conversion apparatus 100 includes, in the encoded information of the source data, a gender-preserving feature that is independent of the attribute mapping. To use fair representation learning, the image conversion apparatus 100 makes the encoder operate according to two different objectives, as follows.

The encoder 210 included in the image conversion apparatus 100 may produce an attribute confusion feature (the first feature vector (φ1)), which retains the gender information of the input image while being unrelated to the target attribute mapping, and a gender confusion feature (the second feature vector (φ2)), which is related to the target attributes but excludes gender information.

The first feature vector and the second feature vector output by the encoder 210 are passed to the first discriminator 240 and the second discriminator 250, respectively, and the representation is learned using a loss function such as [Equation 4].

The image conversion apparatus 100 according to the present embodiment performs adversarial training in order to learn a fair representation.

The network included in the encoder 210 extracts the attribute confusion feature and the gender confusion feature through multi-task learning. The encoder 210 provides both features to the decoder 230, which generates a gender-preserving face image with the target attributes. Here, a fair image can be generated based on [Equation 5], to which the Wasserstein GAN loss is applied.
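
For reference, the general form of a Wasserstein GAN loss can be sketched as follows. This is the textbook form and is not a reproduction of [Equation 5]; the function names are hypothetical, and the Lipschitz constraint on the critic (for example weight clipping or a gradient penalty) is omitted.

```python
def mean(values):
    return sum(values) / len(values)


def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss: lower when real images score above generated ones."""
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores):
    """Wasserstein generator loss: lower when the critic scores generated images highly."""
    return -mean(fake_scores)


# Toy critic scores (hypothetical values).
real_scores = [1.0, 3.0]
fake_scores = [-1.0, 0.0]
loss_d = critic_loss(real_scores, fake_scores)
loss_g = generator_loss(fake_scores)
```

Unlike a cross-entropy GAN loss, the critic outputs unbounded scores rather than probabilities, so the two losses are plain differences of means.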

The image conversion apparatus 100 according to the present embodiment includes an auxiliary classifier for learning a fair representation.

The third discriminator 260 included in the image conversion apparatus 100 additionally includes two auxiliary classifiers, one for attribute labels and one for gender labels, based on ACGAN (Auxiliary Classifier GAN). Accordingly, the third discriminator 260 performs three tasks: distinguishing real images from fake images, classifying gender, and classifying attributes. Here, a loss function such as [Equation 6] may be used for the auxiliary classifiers.
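
A minimal sketch of an auxiliary classification loss is given below. It assumes, as one common choice and not as a reproduction of [Equation 6], binary cross-entropy on a single gender logit and on one logit per attribute; all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logits(logits, labels):
    """Mean binary cross-entropy between sigmoid(logits) and 0/1 labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def auxiliary_loss(gender_logit, attr_logits, gender, attrs):
    """Sum of the gender-head loss and the attribute-head loss (assumed form)."""
    return (bce_with_logits([gender_logit], [gender])
            + bce_with_logits(attr_logits, attrs))


# Uninformative heads (all logits zero) versus confident, correct heads.
uncertain = auxiliary_loss(0.0, [0.0, 0.0], 1, [1, 0])
confident = auxiliary_loss(4.0, [4.0, -4.0], 1, [1, 0])
```

The real-or-fake output of the third discriminator is not part of this term; under the assumption above it would be trained with the adversarial loss, and the two classification heads with this one.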

The image conversion apparatus 100 according to the present embodiment operates under a cycle consistency constraint in order to learn a fair representation.

The image conversion apparatus 100 according to the present embodiment learns the mapping by unsupervised learning. Accordingly, the present invention applies the cycle consistency constraint, which has proven effective for training unpaired data-to-data conversion models. However, because the model operates with a single generator 200, the original image is reconstructed from the generated image, and the image conversion apparatus 100 calculates and applies the cycle consistency loss based on [Equation 7].
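
The idea of reconstructing the original image with a single generator can be sketched as follows. The sketch assumes an L1 reconstruction distance, which is a common choice for cycle consistency but is not confirmed by the text as the form of [Equation 7]; the generator arguments and the toy generators are hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x, c_x, c_t, generator):
    x_t = generator(x, c_t)      # convert toward the target attributes
    x_rec = generator(x_t, c_x)  # convert back with the original attributes
    return l1_distance(x, x_rec)


# Toy generators (hypothetical): one is exactly invertible, one loses information.
def shift_generator(x, c):
    shift = 1.0 if c[0] else -1.0
    return [v + shift for v in x]


def lossy_generator(x, c):
    return [v * 0.5 for v in x]


x = [0.0, 2.0]
loss_invertible = cycle_consistency_loss(x, [0], [1], shift_generator)
loss_lossy = cycle_consistency_loss(x, [0], [1], lossy_generator)
```

The same generator is called twice with different condition vectors, which is why no second, inverse generator is needed.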

The image conversion apparatus 100 according to the present embodiment operates with perceptual loss taken into account in order to learn a fair representation.

The image conversion apparatus 100 adopts perceptual loss, which combines content loss and style loss and has shown a strong ability to generate images of better quality. Here, the content loss is computed from the source image and the generated target image, whereas the style loss is normally given two input images (a content image and a target style image). The content loss is computed based on [Equation 8], and the style loss is computed based on [Equation 9].

However, this model does not use a target reference image and takes only one source image as input. The style loss is therefore computed between the source image x and the reconstructed image. Here, i and j denote the layers selected for feature extraction, and the same layers as in prior work are used.
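
The distinction between content loss and style loss can be illustrated with Gram matrices, the usual basis of style loss. The sketch below is an assumption-laden illustration, not [Equation 8] or [Equation 9]: feature maps are given as lists of channels of flattened activations, the normalization by the number of positions is an arbitrary choice, and the feature extractor itself is omitted.

```python
def gram_matrix(features):
    """Channel-by-channel inner products of a feature map.

    features: list of C channels, each a flat list of N activations.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def content_loss(feat_a, feat_b):
    """Mean squared difference between two feature maps of equal shape."""
    diffs = [(a - b) ** 2
             for ca, cb in zip(feat_a, feat_b)
             for a, b in zip(ca, cb)]
    return sum(diffs) / len(diffs)


def style_loss(feat_a, feat_b):
    """Mean squared difference between the Gram matrices of two feature maps."""
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    diffs = [(a - b) ** 2 for ra, rb in zip(ga, gb) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Toy features (hypothetical): the second map is the first with positions swapped.
feat_source = [[1.0, 0.0], [0.0, 1.0]]
feat_reconstructed = [[0.0, 1.0], [1.0, 0.0]]
loss_content = content_loss(feat_source, feat_reconstructed)
loss_style = style_loss(feat_source, feat_reconstructed)
```

In this toy case the content loss is large because activations moved, while the style loss is zero because the channel statistics are unchanged, which is the property that makes the two terms complementary.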

The image conversion apparatus 100 according to the present embodiment performs the learning operation through a learning network for learning a fair representation.

The image conversion apparatus 100 according to the present embodiment trains the proposed model on the basis of the Wasserstein distance by learning all of the loss functions described above (Equations 1 to 9). The final objective function of the model for learning a fair representation in the image conversion apparatus 100 is given by [Equation 10].
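
One plausible reading of a final objective is a weighted sum of the individual loss terms. The sketch below assumes that form; the term names, the toy values, and the weights are hypothetical and are not the coefficients of [Equation 10].

```python
def total_objective(losses, weights):
    """Weighted sum of named loss terms; every term must have a weight."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Toy loss values and weights (hypothetical).
losses = {
    "adversarial": 0.5,
    "representation": 0.25,
    "classification": 1.0,
    "cycle": 0.5,
    "perceptual": 2.0,
}
weights = {
    "adversarial": 1.0,
    "representation": 1.0,
    "classification": 1.0,
    "cycle": 10.0,
    "perceptual": 0.5,
}
total = total_objective(losses, weights)
```

Raising an error for an unweighted term avoids silently dropping a loss from the objective when a new term is added.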

FIG. 5 is a block diagram schematically illustrating the image conversion configuration of the processor according to an embodiment of the present invention.

The processor 130 included in the image conversion apparatus 100 according to the present embodiment converts an image by applying previously obtained learning results. The processor 130 may include an input image acquisition unit 510, a learning result application unit 520, and an image output processing unit 530.

The input image acquisition unit 510 receives source data including a source image and attribute information for converting the source image.

The learning result application unit 520 converts the source image by applying a first learning result learned from the first feature vector of the source data, a second learning result learned from the second feature vector of the source data, and a third learning result learned from the target image generated from the combined feature vector that concatenates the first feature vector and the second feature vector.

The image output processing unit 530 outputs the converted image produced by the learning result application unit 520.

FIG. 6 is an exemplary diagram for explaining an image conversion operation of the image conversion apparatus according to an embodiment of the present invention.

The image conversion apparatus 100 receives a source image (x_t) and attribute information (c_x) generated by a user. Here, it is assumed that the source image is an image of a bald woman and that the attribute information carries condition information for attractiveness and youth.

The image conversion apparatus 100 converts the image (x) by applying the conversion model of the present invention based on the previously learned results.

The image conversion apparatus 100 outputs the converted image (x). Here, the converted image (x) is an image converted from a bald head to ordinary hair while the female gender is maintained.

FIG. 7 is an exemplary diagram showing the results of each learning stage of the image conversion apparatus according to an embodiment of the present invention.

FIG. 7 shows the neural network structure of each component included in the image conversion apparatus 100.

(a) of FIG. 7 shows the neural network structure of the encoder 210, and (b) of FIG. 7 shows that of the decoder 230. In addition, (c) of FIG. 7 shows the neural network structure of the first discriminator 240 and the second discriminator 250, and (d) of FIG. 7 shows that of the third discriminator 260. The convolutional connection structure of each component included in the image conversion apparatus 100 is shown in FIGS. 8 to 12.

FIG. 8 is a diagram showing the operational configuration of the encoder according to an embodiment of the present invention.

Referring to FIG. 8, the encoder 210 of the image conversion apparatus 100 receives a source image (real image) composed of three RGB channels and attribute information for nine attributes.

The encoder 210 passes the input through three convolution filters, then through one residual filter repeated five times, and then outputs the first feature vector and the second feature vector through two separate convolution filters.

The encoder 210 passes the first feature vector to the first discriminator 240 and the second feature vector to the second discriminator 250. In addition, the encoder 210 passes the combined feature vector, obtained by concatenating the first feature vector and the second feature vector, to the decoder 230.

FIG. 9 is a diagram showing the operational configuration of the decoder according to an embodiment of the present invention.

Referring to FIG. 9, the decoder 230 of the image conversion apparatus 100 receives, from the encoder 210, the combined feature vector obtained by concatenating the first feature vector and the second feature vector.

The decoder 230 passes the combined feature vector through two different deconvolution filters and outputs the target image. Here, the target image is the source image with its attribute information changed.

FIG. 10 is a diagram showing the operational configuration of the first discriminator according to an embodiment of the present invention, and FIG. 11 is a diagram showing the operational configuration of the second discriminator according to an embodiment of the present invention.

Referring to FIG. 10, the first discriminator 240 of the image conversion apparatus 100 receives the first feature vector from the encoder 210 and passes it through three convolution filters. The result is then passed through two separate convolution filters, one trained so that the first feature vector corresponds to a true signal and one trained so that the remaining attribute information other than the first feature vector corresponds to a false signal, to output the first learning result.

Referring to FIG. 11, the second discriminator 250 of the image conversion apparatus 100 receives the second feature vector from the encoder 210 and passes it through three convolution filters. The result is then passed through two separate convolution filters, one trained so that the second feature vector corresponds to a true signal and one trained so that the remaining attribute information other than the second feature vector corresponds to a false signal, to output the second learning result.

FIG. 12 is a diagram showing the operational configuration of the third discriminator according to an embodiment of the present invention.

Referring to FIG. 12, the third discriminator 260 of the image conversion apparatus 100 receives the source image (real image) and the target image (generated image) produced by the decoder 230 and passes them through six convolution filters. The result is then passed through three separate convolution filters to output the third learning result: one for determining whether the image is real or fake, one trained so that the first feature information of the target image matches that of the source image, and one trained so that the second feature information of the target image matches that of the source image.

도 13은 본 발명의 실시예에 따른 이미지 변환 방법을 설명하기 위한 순서도이다. 13 is a flowchart illustrating an image conversion method according to an embodiment of the present invention.

이미지 변환 장치(100)는 소스 이미지 및 소스 이미지에서 변환을 위한 속성정보를 포함하는 소스 데이터를 입력 받는다(S1310).The image conversion apparatus 100 receives source data including a source image and attribute information for conversion from the source image (S1310).

이미지 변환 장치(100)는 소스 데이터를 기반으로 제1 특징 벡터 및 제2 특징 벡터를 생성한다(S1320). The image conversion apparatus 100 generates a first feature vector and a second feature vector based on the source data (S1320).

이미지 변환 장치(100)는 제1 특징 벡터 및 제2 특징 벡터를 기반으로 대상 이미지를 생성하여 출력한다(S1330). 이미지 변환 장치(100)는 제1 특징 벡터를 제1 감별자(240)로 전달하고, 제2 특징 벡터를 제2 감별자(250)로 전달한다. 또한, 이미지 변환 장치(100)는 대상 이미지를 제3 감별자(260)으로 전달한다. The image conversion apparatus 100 generates and outputs a target image based on the first feature vector and the second feature vector (S1330). The image conversion apparatus 100 transfers the first feature vector to the first discriminator 240 and transfers the second feature vector to the second discriminator 250. Also, the image conversion device 100 transmits the target image to the third discriminator 260.

이미지 변환 장치(100)는 제1 감별자(240)에서, 제1 특징 벡터에 대한 분류를 수행한다(S1340). 구체적으로, 제1 감별자(240)는 제1 특징 벡터를 입력 받고, 소스 이미지의 제1 특징 벡터가 참 신호에 해당하도록 분류를 처리하고, 소스 이미지의 제2 특징 벡터 또는 제1 특징 벡터를 제외한 나머지 특징벡터가 거짓 신호에 해당하도록 분류를 처리한다.The image conversion apparatus 100 classifies the first feature vector by the first discriminator 240 (S1340). Specifically, the first discriminator 240 receives the first feature vector, processes the classification so that the first feature vector of the source image corresponds to the true signal, and determines the second feature vector or the first feature vector of the source image. Classification is processed so that the other feature vectors, which are excluded, correspond to false signals.

The image conversion apparatus 100 classifies the second feature vector in the second discriminator 250 (S1350). Specifically, the second discriminator 250 receives the second feature vector and performs classification so that the second feature vector of the source image corresponds to a true signal, and so that the first feature vector of the source image, or any remaining feature vector other than the second feature vector, corresponds to a false signal.
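The true/false assignment in steps S1340 and S1350 is symmetric, which the following Python sketch makes explicit. The use of binary cross-entropy as the classification loss is an assumption (the description here does not name a loss), and `feature_labels` and `binary_cross_entropy` are hypothetical helper names.

```python
import math


def feature_labels(first_vec, second_vec, own="first"):
    """Label pairs for a feature discriminator: own vector 1 (true), other 0 (false)."""
    if own == "first":
        return [(first_vec, 1), (second_vec, 0)]
    return [(second_vec, 1), (first_vec, 0)]


def binary_cross_entropy(prob, label, eps=1e-12):
    """Loss for one predicted probability `prob` against a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


d1_batch = feature_labels("f1", "f2", own="first")    # first discriminator 240
d2_batch = feature_labels("f1", "f2", own="second")   # second discriminator 250
print(d1_batch)  # [('f1', 1), ('f2', 0)]
print(d2_batch)  # [('f2', 1), ('f1', 0)]
print(round(binary_cross_entropy(0.5, 1), 4))  # 0.6931
```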

The image conversion apparatus 100 classifies the target image in the third discriminator 260 (S1360). Specifically, the third discriminator 260 classifies the first feature vector and the second feature vector of the target image so that they correspond to a true signal, and compares the source image with the target image to classify whether the target image is real or fake.
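One way to picture how the three outputs of step S1360 might be combined during training is a weighted sum of per-head losses, sketched below in Python from the discriminator's point of view. The binary cross-entropy form, the weights `w_first` and `w_second`, and the function names are assumptions for illustration and are not stated in the description.

```python
import math


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one probability against a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def third_discriminator_loss(p_real_source, p_real_target, p_first, p_second,
                             w_first=1.0, w_second=1.0):
    """Combine the real/fake term with the two feature-classification terms."""
    # Real/fake head: source image labeled real (1), target image fake (0).
    adversarial = bce(p_real_source, 1) + bce(p_real_target, 0)
    # Feature heads: target-image features should correspond to a true signal.
    first_term = bce(p_first, 1)
    second_term = bce(p_second, 1)
    return adversarial + w_first * first_term + w_second * second_term


loss = third_discriminator_loss(0.5, 0.5, 0.5, 0.5)
print(round(loss, 4))  # 2.7726
```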

The image conversion apparatus 100 performs image conversion of the source image based on the plurality of learning results and outputs a result image (S1370).

Although FIG. 13 describes the steps as being executed sequentially, the present invention is not limited thereto. In other words, the steps illustrated in FIG. 13 may be executed in a different order, or one or more of the steps may be executed in parallel, so FIG. 13 is not limited to a time-series order.
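As a sketch of the parallel execution mentioned above, the three classification steps S1340 to S1360 could be submitted concurrently, since each consumes a different generator output. The stand-in functions below are hypothetical placeholders for the discriminators, and the use of a thread pool is only one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_first(first_vec):      # stand-in for S1340
    return ("S1340", len(first_vec))


def classify_second(second_vec):    # stand-in for S1350
    return ("S1350", len(second_vec))


def classify_target(target_image):  # stand-in for S1360
    return ("S1360", len(target_image))


def run_discrimination_in_parallel(first_vec, second_vec, target_image):
    """Submit the three independent classification steps concurrently."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(classify_first, first_vec),
            pool.submit(classify_second, second_vec),
            pool.submit(classify_target, target_image),
        ]
        # Results are collected in submission order, not completion order.
        return [f.result() for f in futures]


results = run_discrimination_in_parallel([0.2, 0.7], [0.1, 0.9, 0.4], [[0, 1], [1, 0]])
print(results)  # [('S1340', 2), ('S1350', 3), ('S1360', 2)]
```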

The image conversion method according to the present embodiment illustrated in FIG. 13 may be implemented as an application (or program) and recorded on a recording medium readable by a terminal device (or computer). The recording medium on which the application (or program) for implementing the image conversion method according to the present embodiment is recorded, and which is readable by a terminal device (or computer), includes any type of recording device or medium that stores data readable by a computing system.

The above description merely illustrates the technical idea of the embodiments of the present invention, and those of ordinary skill in the art to which the embodiments belong will be able to make various modifications and variations without departing from the essential characteristics of the embodiments. Accordingly, the embodiments are intended to explain, not to limit, the technical idea of the embodiments of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the embodiments of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the embodiments of the present invention.

100: image conversion device
110: input unit 120: output unit
130: processor 140: memory
150: database
200: generation processing unit 202: discrimination processing unit
210: encoder 220: feature vector processing unit
230: decoder 240: first discriminator
250: second discriminator 260: third discriminator

Claims

An image conversion method performed by a computing device including at least one processor and a memory storing at least one program executed by the processor,
wherein the computing device performs:
a data input step of receiving source data including a source image and attribute information for conversion from the source image;
a generation processing step of generating at least one feature vector based on the source data, and generating and outputting a target image based on the at least one feature vector; and
a discrimination processing step of performing image conversion by processing classification for each of the target image and a first feature vector and a second feature vector among the at least one feature vector,
wherein the generation processing step comprises: an encoding step of generating the first feature vector and the second feature vector based on the source image and the attribute information, and outputting a combined feature vector obtained by combining the first feature vector and the second feature vector; a feature vector processing step of receiving the combined feature vector, extracting the first feature vector and the second feature vector included in the combined feature vector, and outputting each of the first feature vector, the second feature vector, and the combined feature vector; and a decoding step of receiving the combined feature vector and generating and outputting, based on the combined feature vector, a target image in which the attribute information has been converted.

delete

The method of claim 1,
wherein the generation processing step comprises
outputting the first feature vector included in the combined feature vector to a first discriminator, outputting the second feature vector to a second discriminator, and outputting the target image to a third discriminator, and
wherein the first discriminator, the second discriminator, and the third discriminator are different discriminators.

The method of claim 1,
wherein the discrimination processing step comprises:
a first discrimination processing step of processing classification of the first feature vector among the at least one feature vector;
a second discrimination processing step of processing classification of the second feature vector among the at least one feature vector; and
a third discrimination processing step of processing classification of the target image.

The method of claim 4,
wherein the first discrimination processing step comprises
receiving the first feature vector and performing classification so that the first feature vector of the source image corresponds to a true signal.

The method of claim 5,
wherein the first discrimination processing step comprises
classifying the second feature vector of the source image so as to correspond to a false signal.

The method of claim 4,
wherein the second discrimination processing step comprises
receiving the second feature vector and performing classification so that the second feature vector of the source image corresponds to a true signal.

The method of claim 5,
wherein the second discrimination processing step comprises
classifying the first feature vector of the source image so as to correspond to a false signal.

The method of claim 4,
wherein the third discrimination processing step comprises
classifying the first feature vector and the second feature vector of the target image so as to correspond to a true signal, and classifying whether the target image is real or fake by comparing the source image and the target image.

The method of claim 4,
wherein the first discrimination processing step performs generative adversarial network (GAN) learning in conjunction with the generation processing step in order to classify the first feature vector as corresponding to a true signal,
the second discrimination processing step performs generative adversarial network (GAN) learning in conjunction with the generation processing step in order to classify the second feature vector as corresponding to a true signal, and
the third discrimination processing step performs generative adversarial network (GAN) learning in conjunction with the generation processing step in order to classify whether the target image is real or fake.

An image conversion device for converting an input source image, comprising:
one or more processors; and
a memory storing one or more programs executed by the one or more processors, wherein the programs, when executed by the one or more processors, cause the one or more processors to perform operations including:
a data input step of receiving source data including a source image and attribute information for conversion from the source image;
a generation processing step of generating at least one feature vector based on the source data, and generating and outputting a target image based on the at least one feature vector; and
a discrimination processing step of performing image conversion by processing classification for each of the target image and a first feature vector and a second feature vector among the at least one feature vector,
wherein the generation processing step comprises: an encoding step of generating the first feature vector and the second feature vector based on the source image and the attribute information, and outputting a combined feature vector obtained by combining the first feature vector and the second feature vector; a feature vector processing step of receiving the combined feature vector, extracting the first feature vector and the second feature vector included in the combined feature vector, and outputting each of the first feature vector, the second feature vector, and the combined feature vector; and a decoding step of receiving the combined feature vector and generating and outputting, based on the combined feature vector, a target image in which the attribute information has been converted.

delete

The device of claim 11,
wherein the generation processing step comprises
outputting the first feature vector included in the combined feature vector to a first discriminator, outputting the second feature vector to a second discriminator, and outputting the target image to a third discriminator, and
wherein the first discriminator, the second discriminator, and the third discriminator are different discriminators.

The device of claim 11,
wherein the discrimination processing step comprises:
a first discrimination processing step of processing classification of the first feature vector among the at least one feature vector;
a second discrimination processing step of processing classification of the second feature vector among the at least one feature vector; and
a third discrimination processing step of processing classification of the target image.

delete