KR20220129433A

KR20220129433A - Method and apparatus for generating and editing images using contrastive learning and generative adversarial network

Info

Publication number: KR20220129433A
Application number: KR1020210076556A
Authority: KR
Inventors: 박재식; 강민국
Original assignee: 포항공과대학교 산학협력단
Priority date: 2021-03-16
Filing date: 2021-06-14
Publication date: 2022-09-23
Also published as: KR102477700B1

Abstract

Disclosed are a method and an apparatus for generating and editing images using contrastive learning and a generative adversarial network. The method for generating and editing images using contrastive learning and a generative adversarial network comprises: a step of extracting common features from an input image input into a discriminator to generate common feature data; a step of projecting the common feature data to the dimension of class embedding of a real image; and a step of operating to minimize a conditional contrastive loss based on the common feature data and class embeddings, or to make a conditional contrastive loss converge. The operating step discriminates the veracity of the input image, generates a fake image, or changes a portion of the input image based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension using the conditional contrastive loss. Therefore, the present invention can effectively generate images with excellent performance or edit images using contrastive learning.

Description

METHOD AND APPARATUS FOR GENERATING AND EDITING IMAGES USING CONTRASTIVE LEARNING AND GENERATIVE ADVERSARIAL NETWORK

The present invention relates to conditional image generation technology and, more particularly, to a method and apparatus for generating and editing images using contrastive learning and a generative adversarial network.

A generative adversarial network (GAN) is a technique, first proposed in practical form by Ian Goodfellow, that trains a generator and a discriminator adversarially to learn the patterns in data and thereby generate new data.

Generative adversarial networks are widely used in computer vision for image synthesis and editing. They can also be trained on image datasets crawled from the internet to generate new images that were not seen during training.

A generative adversarial network can perform tasks such as face image synthesis, as shown in FIG. 1, or painting style transfer, as shown in FIG. 2. That is, FIG. 1 shows faces generated by a generative adversarial network, which are newly generated faces that do not exist in the training data. FIG. 2 shows pictures whose painting style has been transformed by a generative adversarial network.

Meanwhile, apart from its image generation performance, training a generative adversarial network requires an enormous amount of data, a generator and a discriminator with stable architectures, and carefully tuned initial settings. Many stabilization techniques have therefore been proposed to mitigate the training instability of generative adversarial networks.

For example, a method has been proposed that stabilizes image generation in a generative adversarial network by providing the ground-truth labels of the training images (for example, dog, cat, deer, or hedgehog) as hints. However, such existing conditional image generation still has problems to be improved: overfitting of the discriminator occurs frequently, and additional data must be provided for consistency regularization.

Accordingly, there is a need for a conditional image generation model based on a new generative adversarial network that can remedy these existing problems.

The present invention has been devised to solve the problems of existing conditional image generation models. An object of the present invention is to provide a method and apparatus that can generate or edit images more effectively by using a new model, a generative adversarial network trained through contrastive learning.

Another object of the present invention is to provide methods and apparatuses for generating and editing images based on various types of generative adversarial networks, by using a generative adversarial network trained through contrastive learning.

According to one aspect of the present invention for solving the above technical problem, a method for generating and editing images using contrastive learning and a generative adversarial network includes: extracting common features from an input image input to a discriminator to generate common feature data; projecting the common feature data to the dimension of the class embedding of a real image; and operating to minimize a conditional contrastive loss based on the common feature data and the class embedding, or to make the conditional contrastive loss converge. In the operating step, the conditional contrastive loss is used to determine the authenticity of the input image, to generate a fake image that is judged to be a real image, or to change a portion of the input image, based on the data-to-data relations and data-to-class relations among multiple image embeddings in the same dimension.

In one embodiment, the operating step works within the embedding space in which the data-to-data relations and data-to-class relations are expressed: embeddings with high similarity to a vertex embedding or target embedding are pulled closer to it, and embeddings with low similarity are pushed farther away.

In one embodiment, the image generation and editing method may further include generating a realistic fake image that fools the authenticity determination of the input image and has a low contrastive loss.

In one embodiment, the image generation and editing method may further include inputting a fake image to the discriminator as the input image, or inputting both a fake image and a real image to the discriminator as input images.

According to another aspect of the present invention for solving the above technical problem, an apparatus for generating and editing images using contrastive learning and a generative adversarial network includes: a display; one or more cameras; one or more processors; and a memory storing one or more programs configured to be executed by the one or more processors. The one or more programs include instructions for performing the method of any one of the above-described embodiments.

According to yet another aspect of the present invention for solving the above technical problem, an apparatus for generating and editing images using contrastive learning and a generative adversarial network includes a computer-readable storage medium on which one or more programs are recorded, the programs including instructions for performing the method of any one of the above-described embodiments. Here, the one or more programs may be executed by one or more processors of a computing device equipped with one or more cameras and a display.

According to yet another aspect of the present invention for solving the above technical problem, a method for generating and editing images using contrastive learning and a generative adversarial network includes: determining the authenticity of an input image, using a conditional contrastive loss, based on the data-to-data relations and data-to-class relations among multiple image embeddings in the same dimension; and generating a realistic fake image that fools the authenticity determination of the input image and has a low contrastive loss.

In one embodiment, the conditional contrastive loss may correspond to the projection of common feature data, obtained by extracting common features from the input data, to the dimension of the class embedding of real data.

In one embodiment, the conditional contrastive loss may be processed or learned such that, in the embedding space in which the data-to-data relations and data-to-class relations are expressed, embeddings with high similarity to a vertex embedding or target embedding are pulled closer to it and embeddings with low similarity are pushed farther away.

According to yet another aspect of the present invention for solving the above technical problem, an apparatus for generating and editing images using contrastive learning and a generative adversarial network includes: a feature extraction model that extracts common features from an input image input to a discriminator to generate common feature data; a projection model that projects the common feature data to the dimension of the class embedding of a real image; and an optimization model that operates to minimize a conditional contrastive loss based on the common feature data and the class embedding, or to make the conditional contrastive loss converge. Using the conditional contrastive loss, and based on the data-to-data relations and data-to-class relations among multiple image embeddings in the same dimension, the optimization model determines the authenticity of the input image, generates a fake image that is judged to be a real image, or outputs an edited image in which a portion of the input image has been changed.

In one embodiment, the optimization model may operate or be trained such that, in the embedding space in which the data-to-data relations and data-to-class relations are expressed, embeddings with high similarity to a vertex embedding or target embedding are pulled closer to it and embeddings with low similarity are pushed farther away.

In one embodiment, the image generation and editing apparatus may further include a generator that generates a realistic fake image that fools the authenticity determination of the input image and has a low contrastive loss. The fake image and the real image may be input to the discriminator as input images.

According to yet another aspect of the present invention for solving the above technical problem, an apparatus for generating and editing images using contrastive learning and a generative adversarial network includes: a discriminator that determines the authenticity of an input image, using a conditional contrastive loss, based on the data-to-data relations and data-to-class relations among multiple image embeddings in the same dimension; and a generator that generates a realistic image that fools the authenticity determination of the input image and has a low contrastive loss. Using the conditional contrastive loss, and based on those data-to-data and data-to-class relations, the discriminator determines the authenticity of the input image, generates a fake image based on the input image, or changes a portion of the input image and outputs the result.

In one embodiment, the image generation and editing apparatus further includes a generator separation unit or input switching unit that detaches the generator from the discriminator and connects an input unit, to which a sample is supplied, to the discriminator. The apparatus may process the sample input to the discriminator as an input image through the neural network to generate a new image, or may output an edited image obtained by editing the input image.

According to the present invention, a new contrastive generative adversarial network (ContraGAN) for conditional image generation is provided. By using contrastive learning, that is, a conditional contrastive loss (2C loss) based on both data-to-class relations and data-to-data relations, ContraGAN can generate or edit images efficiently and with excellent performance. Indeed, it has been confirmed experimentally that ContraGAN improves state-of-the-art results by 7.3% and 7.7% on the Tiny ImageNet and ImageNet datasets, respectively.

In addition, the contrastive adversarial network using contrastive learning according to the present invention can contribute greatly to mitigating the overfitting problem of the discriminator, can obtain favorable results in a generative adversarial network without data augmentation for consistency regularization, and can also obtain excellent image generation or image editing results when consistency regularization is applied using a large amount of data.

FIG. 1 is an example of faces generated by an existing generative adversarial network (GAN), showing newly generated faces that do not exist in the training data.
FIG. 2 is an example of style-transferred pictures generated by an existing generative adversarial network.
FIG. 3A is a diagram showing the main structure of an auxiliary classifier GAN (ACGAN), as one example of a conditional image generation model of a comparative example.
FIG. 3B shows a GAN with projection discriminator (ProjGAN), as another example of a conditional image generation model of a comparative example.
FIG. 4 is a diagram showing the main structure of a contrastive adversarial network according to an embodiment of the present invention.
FIG. 5 is a schematic diagram for explaining the contrastive learning algorithm of the contrastive adversarial network of FIG. 4.
FIG. 6 is a diagram showing the metric learning results of the contrastive adversarial network according to the present embodiment together with comparative examples.
FIG. 7 is an example of a training algorithm for the contrastive adversarial network of FIG. 4.
FIG. 8 is a diagram illustrating results generated after training on the ImageNet dataset with the contrastive adversarial network according to the present embodiment.
FIG. 9 is a graph showing spectral analysis results for several performance aspects of the contrastive adversarial network according to the present embodiment, compared with the corresponding results of the comparative example (ProjGAN).
FIG. 10 is a block diagram of an image generation and editing apparatus using a contrastive adversarial network according to another embodiment of the present invention.
FIG. 11 is a block diagram for explaining a process in which the image generation and editing apparatus of FIG. 10 uses the contrastive adversarial network to selectively learn images or to generate or edit images.
FIG. 12 is a flowchart of an image generation and editing method using a contrastive adversarial network according to yet another embodiment of the present invention.
FIG. 13 is a flowchart for explaining a modified embodiment of the image generation and editing method of FIG. 12.
FIG. 14 is a flowchart for explaining a training mode in the image generation and editing method of FIG. 12.
FIG. 15 is a flowchart for explaining a process of generating a new image in an image generation and editing method using a contrastive adversarial network according to yet another embodiment of the present invention.
FIG. 16 is a flowchart for explaining a process of generating a style-transferred image in an image generation and editing method according to yet another embodiment of the present invention.
FIG. 17 is a flowchart for explaining a process of generating an avatar based on a face image in an image generation and editing method according to yet another embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments, so specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. In describing the drawings, like reference numerals are used for like elements.

Terms such as first, second, A, and B may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be termed a second element, and similarly a second element may be termed a first element. The term "and/or" includes any combination of a plurality of related listed items, or any one of the plurality of related listed items.

When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to that other element, but it should be understood that intervening elements may also be present. In contrast, when an element is said to be "directly connected" or "directly coupled" to another element, it should be understood that no intervening elements are present.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, elements, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

Before describing specific embodiments of the present invention, the background to its conception is briefly described as follows.

First, conditional image generation is the task of generating diverse images using class label information. Many conditional generative adversarial networks (GANs) have shown realistic results, but these existing methods treat only the pair-based relations between an image embedding and the embedding of its label, that is, data-to-class relations, as the conditioning loss.

Accordingly, this embodiment proposes a contrastive adversarial network (ContraGAN) that uses a conditional contrastive loss considering both the relations among multiple image embeddings in the same batch, that is, data-to-data relations, and the data-to-class relations. Here, the discriminator of ContraGAN determines the authenticity of a given sample and minimizes a contrastive objective in order to learn the relations among training images. The generator operates to fool the authenticity determination and to generate realistic images with a low contrastive loss.
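To make the two kinds of relations concrete, the following is a minimal sketch of a conditional contrastive loss over one batch, written in plain Python. It is an illustration under stated assumptions and not the formula of this specification: the function name `conditional_contrastive_loss`, the `temperature` parameter, the use of cosine similarity, and the exact form of the ratio are all assumptions chosen for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conditional_contrastive_loss(image_embs, labels, class_embs, temperature=1.0):
    """Mean illustrative loss over a batch (hypothetical helper, not from the patent).

    image_embs: projected image embeddings, one list of floats per sample
    labels:     integer class label per sample
    class_embs: one embedding per class index, same dimension as image_embs
    """
    imgs = [normalize(e) for e in image_embs]
    clss = [normalize(e) for e in class_embs]
    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(imgs, labels)):
        # Data-to-class term: similarity to the sample's own class embedding.
        to_class = math.exp(dot(x_i, clss[y_i]) / temperature)
        same_class = 0.0  # data-to-data terms for other samples with the same label
        all_others = 0.0  # data-to-data terms for every other sample in the batch
        for k, (x_k, y_k) in enumerate(zip(imgs, labels)):
            if k == i:
                continue
            s = math.exp(dot(x_i, x_k) / temperature)
            all_others += s
            if y_k == y_i:
                same_class += s
        # Small when same-class items are similar and different-class items are not.
        total += -math.log((to_class + same_class) / (to_class + all_others))
    return total / len(imgs)
```

In this sketch, lowering the loss raises the similarity between an image embedding and both its class embedding and the same-class images in the batch, relative to the remaining images, which is the pull-and-push behavior described in the text.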

The configuration of the present invention described above is explained more concretely below, with reference to existing conditional generative adversarial networks (conditional GANs).

Conditional image generation is one way to train a generative adversarial network stably, by using web-crawled images together with information on each image's category, for example the kind of image such as cat or dog.

The most representative method is the GAN with a projection discriminator proposed by Miyato et al., a model that performs adversarial training by projecting the given image category onto the feature map of the discriminator.

However, the projection discriminator is vulnerable to overfitting in adversarial training, and because it is trained under the assumption that all images are independent of one another, it has difficulty learning meaningful correlations between images.

Accordingly, the present invention proposes a new framework, a generative adversarial network trained through contrastive learning (for short, a "contrastive adversarial network"), to solve the two problems above, namely overfitting and the difficulty of learning correlations.

The contrastive adversarial network significantly improves on the contrastive loss widely used in self-supervised contrastive learning. For efficient training it replaces image augmentation with image category features, and it adds a positive-addition technique to eliminate the effect of false positives on training.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings.

FIG. 3A is a diagram showing the main structure of an auxiliary classifier GAN (ACGAN), as one example of a conditional image generation model of a comparative example. FIG. 3B shows a GAN with projection discriminator (ProjGAN), as another example of a conditional image generation model of a comparative example. FIG. 4 is a diagram showing the main structure of a contrastive adversarial network according to an embodiment of the present invention.

A common approach in existing conditional GANs is to feed label information into the generator and the discriminator. For example, the auxiliary classifier GAN (ACGAN), a type of conditional GAN, attaches an additional classifier on top of the convolutional layers of the discriminator to determine the class of an image. The projection GAN (ProjGAN), another type of conditional GAN, optimizes the adversarial loss by using the inner product with a class embedding.
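The inner-product conditioning mentioned for ProjGAN can be sketched as follows. This is a simplified illustration based on the commonly described form of a projection discriminator (an unconditional score plus the inner product of the image features with the class embedding). The name `projection_logit` and all parameter names are hypothetical, and the feature vector is assumed to have already been computed by the discriminator body.

```python
def projection_logit(features, label, class_embs, head_weights, head_bias=0.0):
    """Discriminator logit with inner-product class conditioning (illustrative only).

    features:     feature vector of the input image
    label:        integer class label of the input image
    class_embs:   one embedding per class, same dimension as features
    head_weights: weights of a linear head producing the unconditional score
    """
    # Unconditional part: how real the image looks regardless of its class.
    unconditional = sum(w * f for w, f in zip(head_weights, features)) + head_bias
    # Conditional part: how well the features agree with the given class embedding.
    conditional = sum(e * f for e, f in zip(class_embs[label], features))
    return unconditional + conditional
```

Note that each image is scored only against its own class embedding, so in this sketch no term relates one image in the batch to another.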

That is, referring to FIG. 3A, the discriminator of the comparative-example ACGAN (hereinafter, the main discriminator) includes two classifiers and an auxiliary classifier. In this specification, to distinguish them from the auxiliary classifier, the two classifiers of the main discriminator are referred to as the feature extractor (Dφ₁) and the discriminator (Dφ₂), respectively, and the auxiliary classifier is referred to simply as the classifier. The auxiliary classifier may also be called an additional classifier.

The feature extractor (Dφ₁) extracts common features from the data of the input image x. The data from which the common features have been extracted, Dφ₁(x) (hereinafter, common feature data), is passed to both the input of the discriminator (Dφ₂) and the input of the classifier.

The discriminator (Dφ₂) uses the common feature data Dφ₁(x) to determine whether the input image x is real or fake. The output of the discriminator (Dφ₂) corresponds to the adversarial loss.

The classifier uses the common feature data Dφ₁(x) to classify the category of the input image. The output of the classifier corresponds to a classification loss based on its difference from the input label y of the original (real) image.
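The data flow just described, one shared feature extractor feeding a real/fake head and a class head, can be sketched as below. This is a toy illustration only: a single dense layer stands in for the convolutional layers, and the function names, the parameter dictionary keys (`W1`, `W2`, `Wc`, and so on), and the use of log losses are assumptions made for the example.

```python
import math

def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def acgan_discriminator(x, p):
    """Return (probability the input is real, class probabilities)."""
    shared = relu(linear(p["W1"], p["b1"], x))                # feature extractor
    real_prob = sigmoid(linear(p["W2"], p["b2"], shared)[0])  # real/fake head
    class_probs = softmax(linear(p["Wc"], p["bc"], shared))   # auxiliary classifier
    return real_prob, class_probs

def acgan_losses(real_prob, class_probs, is_real, label):
    """Per-sample adversarial and classification losses (negative log-likelihoods)."""
    adversarial = -math.log(real_prob if is_real else 1.0 - real_prob)
    classification = -math.log(class_probs[label])
    return adversarial, classification
```

Both heads read the same `shared` vector, which mirrors the way the common feature data is passed to the discriminator and to the classifier.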

The main discriminator, or the discriminator (Dφ₂), of the ACGAN is trained so that the loss value becomes false with respect to the input label (y) corresponding to the real image.

Data generated by such an ACGAN tends to be classified correctly even when fed to a different classifier. In other words, because the auxiliary classifier guides the generator, the ACGAN is well suited to synthesizing images that are easy to classify or to generating plausible data.

As in a typical conditional GAN, the ACGAN generator combines label information, such as the class, with noise to generate fake images or fake data. The fake data is input to the discriminator together with data that includes the input image.

The objective function of the ACGAN may be the same as that of the discriminator of an existing conditional GAN. In other words, the ACGAN objective may correspond to determining whether given data is real or fake and to classifying the category of that data.
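The two-part ACGAN objective described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, and the choice of binary cross-entropy for the real/fake term and softmax cross-entropy for the class term is an assumption.

```python
import math

def real_fake_loss(adv_logit, is_real):
    """Binary cross-entropy on the real/fake logit (assumed form of the adversarial loss)."""
    p_real = 1.0 / (1.0 + math.exp(-adv_logit))
    return -math.log(p_real) if is_real else -math.log(1.0 - p_real)

def classification_loss(class_logits, label):
    """Softmax cross-entropy between the auxiliary classifier output and the input label."""
    peak = max(class_logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in class_logits))
    return log_norm - class_logits[label]

def acgan_discriminator_loss(adv_logit, class_logits, is_real, label):
    """Sum of the adversarial term and the classification term for one sample."""
    return real_fake_loss(adv_logit, is_real) + classification_loss(class_logits, label)
```

Both heads consume the same common feature data, so in a full model `adv_logit` and `class_logits` would be computed from Dφ₁(x).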

Such an ACGAN is useful for learning the push-and-pull relationships between image classes, but as the number of classes grows, training becomes slow and difficult. Accordingly, the present embodiment provides a neural network that synthesizes images well even on fine-grained datasets with many classes, through contrastive learning that exploits the relationships between image data in addition to the relationships between image data and classes.

In addition, as shown in FIG. 3(b), the projection GAN (ProjGAN) of the comparative example is configured to improve on the training and performance of the ACGAN by taking the inner product of the common feature data (Dφ₁(x)) extracted by the feature extractor (Dφ₁) and the class embedding (e) of the input label (y) of the real image.

Here, class embedding refers to the process of finding the pattern of, or vectorizing, the real image (input label (y)) while preserving the necessary information. If the scale of the real image is so large that the values are spread over a wide range, the values become difficult to determine, so class embedding should include normalization: the L2 distance is computed and the values are divided by it.
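The normalization and inner product described above can be sketched as follows. This is an illustrative sketch only: `l2_normalize` and `projection_score` are hypothetical names, and normalizing the class embedding before taking the inner product follows the description here rather than any particular ProjGAN implementation.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Divide a vector by its L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / max(norm, eps) for c in vec]

def projection_score(feature, class_embedding):
    """Inner product between a feature vector and the normalized class embedding."""
    unit_embedding = l2_normalize(class_embedding)
    return sum(f * c for f, c in zip(feature, unit_embedding))
```

After normalization every class embedding has unit length, so the score depends on its direction rather than its scale.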

As such, the projection GAN suffers considerably from discriminator overfitting and has further drawbacks, such as requiring additional data for consistency regularization.

Meanwhile, as shown in FIG. 4, the contrastive generative adversarial network (ContraGAN) according to the present embodiment effectively combines the advantages of ACGAN and ProjGAN, building on the observation that the conditioning functions of ACGAN and ProjGAN can be interpreted as pair-based losses that consider only the data-to-class relationships of training samples. That is, ContraGAN uses a conditional contrastive loss, i.e., contrastive learning, that takes into account both the ACGAN advantage of exploiting information from the input images and the ProjGAN advantage of easy training.

In other words, the ContraGAN according to the present embodiment forms and uses a conditional contrastive loss (2C loss) from the common feature data (Dφ₁(x)), obtained by extracting the common features of the input image (x) with the feature extractor (Dφ₁, 10), and the original feature vector, obtained by passing the input label (y) of the real image through the class embedding model (class embedding (e), 40).

When forming the 2C loss, the dimension of the common feature data (Dφ₁(x)) generally differs from that of the original feature vector. In the present embodiment, therefore, the common feature data (Dφ₁(x)) may be projected through the projection model (projection (h), 30) into the same dimension as the original feature vector.
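As a minimal sketch of this dimension matching, a single affine map can take a feature vector to the class-embedding dimension. The function name, the weights, and the use of one linear layer are assumptions for illustration; the projection model in the embodiment may be a deeper network.

```python
def project(feature, weights, bias):
    """Affine projection: each row of `weights` produces one output coordinate."""
    return [
        sum(w * f for w, f in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]

# Hypothetical example: a 3-dimensional feature projected to 2 dimensions.
feature = [1.0, 2.0, 3.0]
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
bias = [0.0, 0.5]
projected = project(feature, weights, bias)
```

The number of rows in `weights` sets the output dimension, which would be chosen to equal the dimension of the class embedding.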

The 2C loss considers both data-to-data and data-to-class relationships. To exploit data-to-data relationships, a loss function used in self-supervised learning or metric learning may be used as appropriate.

That is, the ContraGAN of the present embodiment adds a metric learning or self-supervised learning objective to the discriminator and the generator so as to explicitly control the distances between embedded image features according to their labels. Suitable candidates for the metric learning loss include the contrastive loss, triplet loss, quadruplet loss, and N-pair loss. Triplet and quadruplet losses involve higher training complexity and may lengthen training time, but they may be adopted as needed.
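To make the metric learning candidates concrete, the following is a minimal sketch of the standard triplet loss for one (anchor, positive, negative) triple. The Euclidean distance and the margin value are assumptions for illustration, not choices stated in this document.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is farther from the anchor than the positive by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

Because each term needs an explicitly chosen positive and negative, the number of candidate triples grows quickly with batch size, which is one source of the training complexity mentioned above.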

Proxy-based losses reduce mining complexity by using trainable class embedding vectors, but they do not explicitly consider data-to-data relationships; the present embodiment addresses this problem with the 2C loss. That is, by considering both the data-to-class and data-to-data relationships of the input images, ContraGAN with the 2C loss exploits the information in the input images while remaining easy to train even on large datasets.

FIG. 5 is a schematic diagram illustrating the operating principle of the contrastive learning algorithm of the ContraGAN of FIG. 4. FIG. 6 schematically compares the metric learning loss of the ContraGAN of the present embodiment with the metric learning losses of the comparative examples.

In conditional GANs, every training loss is generally designed to pull samples together when they share the same label and to keep them apart otherwise. With the loss function of the ProjGAN of the comparative example, a reference and its corresponding class embedding are pulled toward each other when the reference is a real image and pushed apart otherwise.

Meanwhile, in the ContraGAN of the present embodiment, as shown in FIG. 5, the discriminator updates itself by minimizing the distances between embeddings of real images of the same class and maximizing them otherwise. By forcing the class embeddings to be related through the conditional contrastive (2C) loss, the discriminator can learn fine-grained representations of real images, and the generator can exploit the discriminator's knowledge, such as intra-class characteristics and higher-order representations of real images, to generate more realistic images.

In FIG. 5, the hatching type or color (red, blue, yellow) indicates the class label, and the shape indicates the role. A circle (C1) is the embedding of the original or reference image to which the loss is applied, squares (R1, R2, R3, R4, R5, R6, R7) are embeddings of input images, and stars (S1, S2, S3) are embeddings of class labels. The thickness of a solid or red line indicates the strength of a pulling force, and the thickness of a dotted or blue line indicates the strength of a pushing force.

FIG. 6 schematically compares the 2C loss (f) of the present embodiment with the metric learning losses (a, b, c) of comparative examples and the conditional GANs (d, e) of further comparative examples.

In FIG. 6, among the metric learning losses, (a) is the triplet loss, (b) is P-NCA (proxy neighborhood component analysis), and (c) is NT-Xent (normalized temperature-scaled cross entropy); among the conditional GANs of the other comparative examples, (d) is ACGAN and (e) is ProjGAN.

Also in FIG. 6, the color indicates the class label, and the shape indicates the role. A circle is the embedding of the original or reference image, a square is the embedding of an input image, a diamond is the embedding of an augmented image, a star is the embedding of a class label, and a triangle is the one-hot encoding of a class label. The thickness of a solid or red line indicates the strength of a pulling force, and the thickness of a dotted or blue line indicates the strength of a pushing force.

As shown in FIG. 6, unlike the ACGAN and ProjGAN of the comparative examples, the 2C loss used for image generation and synthesis in the present embodiment can consider both the data-to-class and the data-to-data relationships between training samples, which include input images and real images.

Before introducing the 2C loss, the corresponding part of the present embodiment can be expressed with the NT-Xent loss. Assume a mini-batch randomly sampled from the training images together with the set of corresponding class labels, and define a deep neural network encoder and a projection layer that embeds onto a unit hypersphere. The NT-Xent loss is intended for unsupervised learning and treats an augmented image as a positive sample in order to consider the data-to-data relationship between an original image and its augmentation.

Compared with the 2C loss, however, NT-Xent can hardly gather the embeddings of images of the same class, because it receives no supervision from label information. In addition, the NT-Xent loss requires extra data augmentation and extra forward and backward propagation.
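For reference, the NT-Xent loss for a single anchor can be sketched as below. This is a simplified illustration under stated assumptions: the function names and the default temperature are hypothetical, and the negatives are passed in explicitly rather than gathered from a full augmented mini-batch.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def nt_xent(anchor, positive, negatives, t=0.5):
    """NT-Xent for one anchor: the augmented view is the only positive."""
    pos = math.exp(cosine(anchor, positive) / t)
    neg = sum(math.exp(cosine(anchor, n) / t) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Note that no class label appears anywhere in the computation: another image of the same class is treated as a negative, which is why embeddings of the same class are not pulled together.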

By contrast, the 2C loss used in the present embodiment exploits weak supervision from label information and requires no extra data augmentation or extra forward and backward propagation, so training can be at least several times faster than for the comparative model with the NT-Xent loss.

In other words, compared with the comparative examples (see (a) to (e) of FIG. 6), the 2C loss of the present embodiment (f) considers both data-to-data and data-to-class relationships and infers the full information without data augmentation, so it can effectively measure and represent the similarity between related objects.

FIG. 7 shows an example of the training algorithm of the ContraGAN of FIG. 4.

Referring to FIG. 7, as stated in lines 4 and 5 of Algorithm 1 for training the ContraGAN of the present embodiment, the 2C loss is computed using m real images in the discriminator training step and m intermediate images generated in the generator training step. An intermediate image corresponds to a common feature image or common feature data.

As shown in Algorithm 1, the discriminator updates itself by minimizing the distances between embeddings of real images of the same class and maximizing them otherwise.

Moreover, in the present embodiment, forcing the class embeddings to be related through the 2C loss lets the discriminator learn fine-grained representations of real images. Likewise, the generator can exploit the discriminator's knowledge, such as intra-class characteristics and higher-order representations of real images, to generate more realistic images.

Similar to the usual training procedure of a conditional GAN, the image generation and synthesis method of the present embodiment, or its framework (i.e., ContraGAN), which uses the 2C loss described above, has a discriminator training step, in which the discriminator computing the adversarial loss is trained, and a generator training step. On top of these training steps, ContraGAN is configured to additionally compute the 2C loss using a set of real images or a set of fake images.
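The way the two training steps combine an adversarial term with the 2C term can be sketched as follows. Everything specific here is an assumption for illustration: the hinge form of the adversarial loss, the weighting coefficient `lam`, and the function names are not stated in this document, and the per-sample 2C values are taken as given inputs.

```python
def mean(values):
    return sum(values) / len(values)

def hinge_d(real_logits, fake_logits):
    """Assumed hinge adversarial loss for the discriminator."""
    real_term = mean([max(0.0, 1.0 - r) for r in real_logits])
    fake_term = mean([max(0.0, 1.0 + f) for f in fake_logits])
    return real_term + fake_term

def hinge_g(fake_logits):
    """Assumed hinge adversarial loss for the generator."""
    return -mean(fake_logits)

def discriminator_objective(real_logits, fake_logits, two_c_real, lam=1.0):
    """Discriminator step: adversarial loss plus 2C loss over the m real images."""
    return hinge_d(real_logits, fake_logits) + lam * mean(two_c_real)

def generator_objective(fake_logits, two_c_fake, lam=1.0):
    """Generator step: adversarial loss plus 2C loss over the m generated images."""
    return hinge_g(fake_logits) + lam * mean(two_c_fake)
```

The point of the sketch is the structure: each step minimizes its usual adversarial loss plus a weighted 2C term computed on the image set that belongs to that step.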

Unlike existing methods, the ContraGAN using the 2C loss learns the commonalities and differences between images and captures the fine details of each category when generating images. It is also robust to overfitting, because its loss function is determined by a very large number of image pairs.

Equation 1 below expresses the loss function of the ContraGAN proposed in the present embodiment.

In Equation 1, the symbols, which appear as inline images in the source and are not reproduced here, denote the following, in order: the image; the image label; the label embedding model; the image embedding model; and a hyperparameter for hardness control. The remaining terms denote, in order: the image embedding function for the reference sample; the class embedding function; the class embedding; the data-class relationship; and the data-data relationship.

In the present embodiment, the loss function of the ContraGAN may be configured by defining a deep neural network encoder S(x) and a projection layer h that embeds onto a unit hypersphere of radius 1, and then mapping the data space onto the hypersphere using the composition h(S(x)) of the encoder with the projection.

Here, the ContraGAN may use the part of the feature extractor (Dφ₁), the first discriminating neural network of the discriminator, that precedes the fully connected layer as the encoder network (S), and may use multi-layer perceptrons parameterized by Φ as the projection head (h).
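A minimal sketch of a conditional contrastive loss for one reference sample, computed on embeddings that are assumed to already lie on the unit hypersphere (that is, outputs of h(S(x)) after normalization), is given below. The function and parameter names are hypothetical, the form follows the published ContraGAN 2C loss rather than the patent's own equation, and excluding the reference sample from both sums is an assumption.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def two_c_loss(embeddings, labels, class_embeddings, i, t=1.0):
    """Conditional contrastive loss for reference sample i in a mini-batch.

    embeddings       -- unit-norm image embeddings, one per sample
    labels           -- integer class label of each sample
    class_embeddings -- unit-norm class embeddings, indexed by label
    t                -- temperature (hardness) hyperparameter
    """
    anchor = embeddings[i]
    # Data-to-class term: reference embedding against its own class embedding.
    data_to_class = math.exp(dot(anchor, class_embeddings[labels[i]]) / t)
    same_class = 0.0
    all_others = 0.0
    # Data-to-data terms: reference embedding against every other sample.
    for k, (z, y) in enumerate(zip(embeddings, labels)):
        if k == i:
            continue
        sim = math.exp(dot(anchor, z) / t)
        all_others += sim
        if y == labels[i]:
            same_class += sim
    return -math.log((data_to_class + same_class) / (data_to_class + all_others))
```

The loss is zero when every other sample in the batch shares the reference label, and it grows as samples of other classes move closer to the reference embedding.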

To analyze the algorithm fairly and thoroughly, the training algorithm of the present embodiment was tested by building a unified software library for training generative adversarial networks.

The software of the present embodiment may have any configuration selected from a total of 18 generative adversarial networks, four training techniques (parallel distributed processing, mixed-precision training, batch synchronization, and batch statistics accumulation), four analysis techniques (image visualization, nearest-neighbor analysis, linear interpolation analysis, and frequency analysis), and four evaluation metrics (inception score, Fréchet inception distance, precision, and recall).

The 18 generative adversarial networks include, but are not limited to, DCGAN, LSGAN, GGAN, WGAN-WC, WGAN-GP, WGAN-DRA, ACGAN, ProjGAN, SNGAN, SAGAN, BigGAN, BigGAN-Deep, CRGAN, ICRGAN, LOGAN, DiffAugGAN, ADAGAN, and ContraGAN.

Table 1 below compares the performance of the ContraGAN of the present embodiment with the models of the comparative examples and presents experimental results showing that ContraGAN is robust to overfitting.

Table 1 shows the results of the image generation experiment on the CIFAR10 dataset; for every metric except the Fréchet inception distance, a higher value indicates better performance. Entries in bold are the results of the ContraGAN using the contrastive learning of the present embodiment and its variants (CRGAN, ICRGAN, DiffAugGAN).

Next, an image generation experiment was performed on the ImageNet dataset; the results are shown in FIG. 8. FIG. 8 shows images generated after training on the ImageNet dataset according to the method of the present embodiment.

Referring to FIG. 8, it can be confirmed visually that the model of the present embodiment generates natural-looking images without memorizing the training data. The experimental results show that ContraGAN outperforms the state-of-the-art models by 7.3% and 7.7% on the Tiny ImageNet and ImageNet datasets, respectively.

The experiments also show that contrastive learning helps mitigate discriminator overfitting. For a fair comparison, 12 recent GANs were implemented and tested using the PyTorch library.

As described above, the present embodiment uses a new conditional contrastive loss to maximize a lower bound on the mutual information between samples of the same class. That is, the framework of the present embodiment, referred to as ContraGAN, is configured to synthesize images using both the class information and the data-to-data relationships of the training samples.

In addition, the ContraGAN discriminator distinguishes the authenticity of a given sample and maximizes the mutual information between embeddings of real images of the same class. The ContraGAN generator is configured to synthesize images that fool the discriminator and to maximize the mutual information between fake images of the same class.

For a fair comparison, the inventors implemented nine existing state-of-the-art approaches (comparative examples) and tested the various methods under the same conditions. The experiments confirmed that the ContraGAN of the present embodiment is robust to the choice of network architecture and outperforms the state-of-the-art models on the datasets (CIFAR10, Tiny ImageNet, etc.) by 3.7% and 11.2%, respectively, without data augmentation.

FIG. 9 shows the results of spectral analysis of the ContraGAN according to the present embodiment.

Referring to the leftmost graphs in the first and second rows of FIG. 9, the model trained on the training dataset was tested on both the training dataset and the validation dataset, and the classification accuracy with which the discriminator judges the authenticity of a sample from its feature data was compared. For the ProjGAN of the comparative example, the gap between training accuracy and validation accuracy becomes large over the interval from about 25,000 steps to the low 50,000s, which indicates that overfitting has occurred; for the ContraGAN of the present embodiment, overfitting can be seen to occur over the interval from about 45,000 steps to the low 70,000s.

As these results show, the ContraGAN of the present embodiment is more robust to overfitting than the ProjGAN of the comparative example. That is, it can be confirmed experimentally that ContraGAN is robust to overfitting and that the collapse of adversarial training therefore occurs later than in the baseline model (ProjGAN).

In other words, as can be seen from the result for the baseline model (ProjGAN) of the comparative example (upper row, left, of FIG. 9) and the result for the model of the present embodiment (ContraGAN, ours) (lower row, left, of FIG. 9), the gap between training and validation accuracy grows more slowly for ContraGAN than for the comparative example.
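The overfitting criterion used in this comparison, a widening gap between training and validation accuracy, can be sketched as a simple check over logged accuracy curves. The function name, the threshold, and the numbers in the example are hypothetical and are not taken from FIG. 9.

```python
def first_overfit_step(steps, train_acc, val_acc, gap_threshold=0.15):
    """Return the first training step at which train accuracy exceeds
    validation accuracy by more than `gap_threshold`, or None if it never does."""
    for step, train, val in zip(steps, train_acc, val_acc):
        if train - val > gap_threshold:
            return step
    return None

# Hypothetical accuracy curves, for illustration only.
steps = [0, 10000, 20000, 30000]
train_acc = [0.60, 0.70, 0.90, 0.95]
val_acc = [0.60, 0.68, 0.70, 0.60]
onset = first_overfit_step(steps, train_acc, val_acc)
```

Under this criterion, a model whose onset step is later, or for which the function returns `None`, would be judged more robust to discriminator overfitting.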

In addition, as the middle graphs in the upper and lower rows of FIG. 9 show, the rise in the Fréchet inception distance (FID), for which a lower value indicates better performance, occurs later in the present embodiment than in the comparative example. This is because the model of the present embodiment is robust to overfitting.

Finally, as the rightmost graphs in the upper and lower rows of FIG. 9 show, the spectral analysis results for the baseline model (ProjGAN) of the comparative example and the model (ContraGAN) of the present embodiment confirm that the spectra of ContraGAN are visibly more stable than those of the comparative example.

As described above, the ContraGAN according to the present embodiment tends to synthesize images well on fine-grained datasets. In particular, because the magnitude of the image feature vector is constrained, training does not collapse, so ContraGAN works well with data-augmentation-based regularization and shows a large improvement in FID (see Table 2).

Referring to Table 2, the ContraGAN of the present embodiment benefits from a large batch size, as can be seen from the different implementations (A, C, E, H).

In addition, the effect of the 2C loss in the implementations (A, C, E, H) of the ContraGAN of the present embodiment is considerably better than the existing approaches in terms of FID; in particular, FID decreased by 21.6% and 11.2% relative to the vanilla networks (A, C), respectively.

In addition, comparing results before and after applying augmented positive samples (APS) confirmed that the implementation with APS (F) takes about 12.9% more time than the implementation without APS (E). This is presumed to be because each class embedding can serve as a representative of its class and acts as an anchor that pulls the corresponding images. It is also considered that, without class embeddings, the images in a mini-batch are gathered depending on the sampling state, which can make training unstable.

In addition, comparing consistency regularization (CR) performance, the implementations (A, E, G) and the further implementations (C, H, I) show that combining the vanilla network, the 2C loss, and CR yields a lower FID than either the vanilla network alone (A, C) or the vanilla network combined with the 2C loss (E, H). This synergy applies when CR is used together with the 2C loss, and the combination of the vanilla network, the 2C loss, and CR beats the vanilla network with CR (B, D) by a large margin.

FIG. 10 is a block diagram of an image generation and editing apparatus using a contrastive adversarial network according to another embodiment of the present invention. FIG. 11 is a block diagram illustrating how the image generation and editing apparatus of FIG. 10 uses the contrastive adversarial network to selectively learn images or to generate or edit an image. FIG. 12 is a flowchart of an image generation and editing method using a contrastive adversarial network according to yet another embodiment of the present invention. FIG. 13 is a flowchart illustrating a modified embodiment of the image generation and editing method of FIG. 12. FIG. 14 is a flowchart illustrating the training mode in the image generation and editing method of FIG. 12.

Referring to FIG. 10, the image generation and editing apparatus 100 may include at least one processor 110 and a memory 120 that stores instructions directing the at least one processor 110 to perform a series of steps, or may be included in a computing device that includes these components.

In addition, the image generation and editing apparatus 100 may include a transceiver 130 that exchanges signals and data with an external device over a wired, wireless, or combined wired/wireless network, and may further include an interface 140 having an input interface, an output interface, or an input/output interface, and a storage device 150.

The components of the image generation and editing apparatus 100 may be connected to one another by an internal network line or a bus 160 to exchange signals and data.

The processor 110 may include a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed.

Each of the memory 120 and the storage device 150 may be formed of at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory 120 may consist of at least one of read-only memory (ROM) and random access memory (RAM). At least one of the memory 120 and the storage device 150 may include, or be included in, a computer-readable storage medium on which one or more programs including the above instructions are recorded.

The interface 140 may include a display device, a camera, and the like. A device including the display device and the camera may be a portable terminal such as a mobile phone, a personal digital assistant (PDA), or a smart pad, or may be a notebook computer, a personal computer, or the like.

In addition, the processor 110 described above may load the instructions stored in the electronically connected memory 120, or the programs or software modules implemented by those instructions, and may execute them to perform the series of steps that implement the method of the present invention. That is, the contrastive adversarial network 170 may be mounted on the processor 110, and the contrastive adversarial network 170 may be configured by the above instructions or software modules.

As shown in FIG. 11, these software modules are the elements constituting the contrastive adversarial network 170 of this embodiment and may include a generator 172, a preprocessing unit 174, an input switching unit 176, and a discriminator 178. An output interface or output unit 140 may further be included.

In this embodiment, the input switching unit 176 separates the generator 172, which was used in the training mode of the contrastive adversarial network, from the discriminator 178, and the discriminator 178 is then used to receive a sample and to generate or edit an image based on that sample.

The generator 172 and the preprocessing unit 174 may be selectively connected to the discriminator 178 through the input switching unit 176. Descriptions of the generator 172 and the discriminator 178 duplicate those of the embodiments described above and are therefore omitted.

When the input switching unit 176 has separated the generator 172, which was coupled to the discriminator 178 in the training mode, the preprocessing unit 174 may receive a sample and convert it into a form or size that can be input to the discriminator 178.
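The switching between the training path and the sample path can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of this embodiment: the names `InputSwitch` and `preprocess` are hypothetical, images are modeled as nested lists of pixel values, and nearest-neighbor resizing is an assumed choice for converting a sample to the discriminator's input size.

```python
from typing import Callable, List, Optional

Image = List[List[float]]  # hypothetical stand-in: a 2-D grid of pixel values


def preprocess(sample: Image, size: int) -> Image:
    """Resize a sample to size x size by nearest-neighbor sampling (assumed choice)."""
    h, w = len(sample), len(sample[0])
    return [[sample[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


class InputSwitch:
    """Feeds the discriminator either generator output (training mode)
    or a preprocessed external sample (generator detached)."""

    def __init__(self, generator: Callable[[], Image], input_size: int) -> None:
        self.generator = generator
        self.input_size = input_size
        self.training = True  # the generator is connected during training

    def detach_generator(self) -> None:
        """Leave training mode: samples now come from the preprocessing path."""
        self.training = False

    def discriminator_input(self, sample: Optional[Image] = None) -> Image:
        if self.training:
            return self.generator()
        if sample is None:
            raise ValueError("a sample is required once the generator is detached")
        return preprocess(sample, self.input_size)
```

Calling `detach_generator()` once after training is enough in this sketch; every later call to `discriminator_input(sample)` goes through `preprocess`.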

The discriminator 178 may extract common features from an input image to generate common feature data (S112), project the common feature data into the dimension of the class embedding of a real image (S114), and operate to minimize the conditional contrastive loss computed from the common feature data and the class embedding, or to make that loss converge to a preset value (S116) (see FIG. 12).

In other words, the discriminator 178 may extract common features from an input image to generate common feature data (S112), project the common feature data into the dimension of the class embedding of a real image (S114), and operate to minimize the adversarial loss of the discriminator, or to make it converge to a preset value, using the conditional contrastive loss computed from the common feature data and the class embedding (S118) (see FIG. 13).

The discriminator 178 described above may include a feature extraction model, a projection model, and an optimization model. Here, the feature extraction model may correspond to a feature extractor, the projection model to a projection function, and the optimization model to a model that optimizes the 2C loss (see FIG. 4).

The feature extraction model may generate common feature data by extracting common features from an input image input to the discriminator. The projection model may project the common feature data into the dimension of the class embedding of a real image. The optimization model may operate to minimize the conditional contrastive loss computed from the common feature data and the class embedding, or to make the conditional contrastive loss converge.
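The first two models can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a single linear layer with ReLU stands in for the feature extractor, a second linear map stands in for the projection function, and the L2 normalization of the projected vector is an added choice that this text does not specify. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def extract_features(image: Vector, weights: Matrix) -> Vector:
    """Toy feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, x) for x in matvec(weights, image)]


def project(features: Vector, proj: Matrix) -> Vector:
    """Map features to the class-embedding dimension, then L2-normalize (assumed)."""
    z = matvec(proj, features)
    norm = math.sqrt(sum(x * x for x in z)) or 1.0  # avoid division by zero
    return [x / norm for x in z]
```

With a projection matrix of shape (d, f), a feature vector of length f becomes an embedding of length d, which is the same dimension as the class embeddings it is later compared with.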

In particular, the optimization model may use the conditional contrastive loss to train the discriminator to determine the authenticity of the input image, to generate a fake image, or to change a part of the input image, based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension.
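A loss that combines the data-class relationship with the data-data relationship can be sketched as below. The exact formula of this embodiment is not reproduced in this passage, so the form used here is an assumption: a softmax-style ratio with a temperature, in which the positives are the sample's own class embedding plus the other images of the same class, and the candidates are that class embedding plus all other images in the batch. The function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conditional_contrastive_loss(
    image_emb: List[Vector],   # projected image embeddings, one per sample
    labels: List[int],         # class label of each sample
    class_emb: List[Vector],   # one embedding per class, same dimension
    temperature: float = 1.0,  # assumed hyperparameter scaling the similarities
) -> float:
    """Batch mean of -log(positives / candidates) under the assumed form."""
    total = 0.0
    for i, (z_i, y_i) in enumerate(zip(image_emb, labels)):
        # Data-class term: similarity to the sample's own class embedding.
        anchor = math.exp(dot(z_i, class_emb[y_i]) / temperature)
        positives = anchor
        candidates = anchor
        for k, (z_k, y_k) in enumerate(zip(image_emb, labels)):
            if k == i:
                continue  # a sample is not compared with itself
            # Data-data term: similarity to every other image in the batch.
            sim = math.exp(dot(z_i, z_k) / temperature)
            candidates += sim
            if y_k == y_i:
                positives += sim  # same-class images count as positives
        total += -math.log(positives / candidates)
    return total / len(image_emb)
```

In this sketch the loss falls when same-class embeddings and their class embedding are similar and rises when an image sits close to embeddings of a different class, which is the pull/push behavior the text describes.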

Referring again to FIG. 11, the generated image, style-converted image, or avatar produced by the discriminator 178 may be output through the output unit 140. The output unit 140 may output the image obtained when the 2C loss of the discriminator 178 is at its minimum or is judged to be optimized relative to a preset reference value. The output unit 140 may output the final generated image of the discriminator 178 in response to a signal from the optimization model.
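One way to read the output criterion is as a simple stopping test on the recorded loss values. The sketch below is an assumed interpretation, not the criterion of this embodiment: `should_output`, `tolerance`, and `window` are hypothetical names, and "optimized" is taken to mean either reaching the reference value or no longer changing over the last few steps.

```python
from typing import Sequence


def should_output(losses: Sequence[float], threshold: float,
                  tolerance: float = 1e-3, window: int = 3) -> bool:
    """Return True once the latest loss is at or below the reference value,
    or the loss has plateaued over the last `window` recorded steps."""
    if not losses:
        return False
    if losses[-1] <= threshold:
        return True
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tolerance
```

A caller would append the loss after each update and emit the current image the first time `should_output` returns True.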

In addition, to train the contrastive adversarial network described above, as shown in FIG. 14, the generator may generate fake images while it is connected to the discriminator, and the fake images generated by the generator may be passed to the discriminator together with real images (S110). The discriminator may extract common features from the input images to generate common feature data (S112), project the common feature data into the dimension of the class embedding of a real image (S114), and learn to minimize the conditional contrastive loss computed from the common feature data and the class embedding, or to make that loss converge to a preset value (S116).
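Step S110 can be pictured as assembling one mixed batch for the discriminator. The sketch below is a hypothetical illustration: the generator is any callable taking a noise vector and a class label, one fake image is drawn per real image, and the Gaussian noise and uniform label sampling are assumptions not stated in this text.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]


def build_discriminator_batch(
    real_images: List[Image],
    real_labels: List[int],
    generator: Callable[[List[float], int], Image],
    num_classes: int,
    latent_dim: int,
    rng: random.Random,
) -> Tuple[List[Image], List[int], List[bool]]:
    """Return images, labels, and real/fake flags for one discriminator step."""
    images = list(real_images)
    labels = list(real_labels)
    is_real = [True] * len(real_images)
    for _ in real_images:
        label = rng.randrange(num_classes)  # class condition for the fake image
        noise = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        images.append(generator(noise, label))
        labels.append(label)
        is_real.append(False)
    return images, labels, is_real
```

The labels returned for the fake images are the conditions they were generated under, so the same batch can feed both the authenticity decision and a class-conditional loss.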

FIG. 15 is a flowchart illustrating the process of generating a new image in an image generation and editing method using a contrastive adversarial network according to yet another embodiment of the present invention. FIG. 16 is a flowchart illustrating the process of generating a style-converted image in an image generation and editing method according to yet another embodiment of the present invention. FIG. 17 is a flowchart illustrating the process of generating an avatar based on a face image in an image generation and editing method according to yet another embodiment of the present invention.

Referring to FIG. 15, in the image generation and editing method using the contrastive adversarial network, the discriminator may first use the conditional contrastive loss to determine the authenticity of the input image based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension (S142).

Next, the generator may generate a realistic fake image that deceives the authenticity determination of the input image and has a low contrastive loss (S144).

In addition, as shown in FIG. 16, the image generation and editing method using the contrastive adversarial network may be configured so that, after the discriminator uses the conditional contrastive loss to determine the authenticity of the input image based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension (S142), it is determined whether only the background region of the input image has been selected; if so (Yes in S143), a background image that deceives the determined authenticity of the input image and has a low contrastive loss is generated (S145), and the input image and the background image are composited to output a style-converted image.
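The final compositing step can be sketched with a binary background mask. This is a minimal illustration under assumptions: images are nested lists of pixel values, the mask marks background pixels with 1, and the function name `composite` is hypothetical. How the background region is selected and how the background image is generated are outside this sketch.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]


def composite(input_image: Image, background: Image, background_mask: Mask) -> Image:
    """Keep foreground pixels from the input image and take the pixels
    marked as background (mask value 1) from the generated background."""
    return [
        [bg if m else px for px, bg, m in zip(in_row, bg_row, mask_row)]
        for in_row, bg_row, mask_row in zip(input_image, background, background_mask)
    ]
```

All three arguments are expected to share the same height and width; `zip` would silently truncate mismatched inputs, so a caller may want to check the shapes first.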

In addition, as shown in FIG. 17, in the image generation and editing method using the contrastive adversarial network, after the discriminator uses the conditional contrastive loss to determine the authenticity of a face image based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension (S142), a fake face image that deceives the determined authenticity of the face image and has a low contrastive loss may be generated (S146), and an avatar composed from the generated fake face images may be output (S148).

The operations of the image generation and editing method of the present invention according to the embodiments described above can be implemented as a computer-readable program or code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable program or code is stored and executed in a distributed manner.

In addition, the computer-readable recording medium may include a hardware device specially configured to store and execute program instructions, such as ROM, RAM, or flash memory. The program instructions may include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although some aspects of the present invention have been described in the context of an apparatus, they also represent a description of the corresponding method, where a block or apparatus corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method may also be expressed as a corresponding block or item, or as a feature of a corresponding apparatus. Some or all of the method steps may be performed by (or using) a hardware device such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such a device.

In embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functions of the methods described herein. In embodiments, the field programmable gate array may operate together with a microprocessor to perform one of the methods described herein. In general, the methods are preferably performed by some hardware device.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the present invention can be modified and changed in various ways without departing from the spirit and scope of the present invention as set forth in the claims below.

Claims

1. An image generation and editing method using contrastive learning and a generative adversarial network, comprising:
generating common feature data by extracting common features from an input image input to a discriminator;
projecting the common feature data into the dimension of a class embedding of a real image; and
operating to minimize a conditional contrastive loss based on the common feature data and the class embedding, or to make the conditional contrastive loss converge,
wherein the operating step uses the conditional contrastive loss to determine the authenticity of the input image, generate a fake image, or change a part of the input image, based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension.

2. The method according to claim 1,
wherein, in the embedding space in which the data-data relationship and the data-class relationship are expressed, the operating step pulls other embeddings having a relatively high similarity to a vertex embedding or a target embedding so that they are located closer, and pushes other embeddings having a relatively low similarity so that they are located farther away.

3. The method according to claim 1,
wherein the conditional contrastive loss is expressed by Equation 1 below,
[Equation 1]
(formula not reproduced in this text)
where the symbols of Equation 1, likewise not reproduced here, denote in order the image, the image label, the label embedding model, the image embedding model, and a hyperparameter for hardness adjustment.

4. The method according to claim 1,
further comprising generating a realistic fake image that deceives the authenticity determination of the input image and has a low contrastive loss.

5. The method according to claim 4,
further comprising inputting the fake image and the real image to the discriminator as the input image.

6. An image generation and editing apparatus using contrastive learning and a generative adversarial network, comprising:
a display;
one or more cameras;
one or more processors; and
a memory storing one or more programs configured to be executed by the one or more processors,
wherein the one or more programs include instructions for performing the method of any one of claims 1 to 5.

7. An image generation and editing apparatus using contrastive learning and a generative adversarial network, comprising:
a computer-readable storage medium on which one or more programs including instructions for performing the method of any one of claims 1 to 5 are recorded,
wherein the one or more programs are executed by one or more processors of a computing device having one or more cameras and a display.

8. An image generation and editing method using contrastive learning and a generative adversarial network, comprising:
determining the authenticity of an input image, using a conditional contrastive loss, based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension; and
generating a realistic fake image that deceives the authenticity determination of the input image and has a low contrastive loss.

9. The method of claim 8,
wherein the conditional contrastive loss corresponds to projecting common feature data, obtained by extracting common features from input data, into the dimension of a class embedding of real data.

10. The method of claim 9,
wherein the conditional contrastive loss is processed so that, in the embedding space in which the data-data relationship and the data-class relationship are expressed, other embeddings having a relatively high similarity to a vertex embedding or a target embedding are pulled to be located closer, and other embeddings having a relatively low similarity are pushed to be located farther away.

11. An image generation and editing apparatus using contrastive learning and a generative adversarial network, comprising:
a feature extraction model that generates common feature data by extracting common features from an input image input to a discriminator;
a projection model that projects the common feature data into the dimension of a class embedding of a real image; and
an optimization model that operates to minimize a conditional contrastive loss based on the common feature data and the class embedding, or to make the conditional contrastive loss converge,
wherein the optimization model uses the conditional contrastive loss to determine the authenticity of the input image, generate a fake image, or change a part of the input image and output it, based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension.

12. The apparatus of claim 11,
wherein, in the embedding space in which the data-data relationship and the data-class relationship are expressed, the optimization model pulls other embeddings having a high similarity to a vertex embedding or a target embedding so that they are located closer, and pushes other embeddings having a low similarity so that they are located farther away.

13. The apparatus of claim 11,
further comprising a generator that generates a realistic fake image that deceives the authenticity determination of the input image and has a low contrastive loss,
wherein the fake image and the real image are input to the discriminator as the input image.

14. An image generation and editing apparatus comprising:
a discriminator that uses a conditional contrastive loss to determine the authenticity of an input image based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension; and
a generator that generates a realistic image that deceives the authenticity determination of the input image and has a low contrastive loss,
wherein the discriminator uses the conditional contrastive loss to determine the authenticity of the input image, generate a fake image based on the input image, or change a part of the input image and output it, based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension.

15. The apparatus of claim 14,
further comprising a generator separation unit or an input switching unit that separates the generator from the discriminator and connects, to the discriminator, an input unit to which a sample is input,
wherein the apparatus processes the sample input to the discriminator as the input image through a neural network to generate a new image or to output an edited image obtained by editing the input image.