KR102477700B1

KR102477700B1 - Method and apparatus for generating and editing images using contrastive learning and a generative adversarial network

Info

Publication number: KR102477700B1
Application number: KR1020210076556A
Authority: KR
Inventors: Jaesik Park; Minguk Kang
Original assignee: Pohang University of Science and Technology (POSTECH) Industry-Academic Cooperation Foundation
Priority date: 2021-03-16
Filing date: 2021-06-14
Publication date: 2022-12-14
Also published as: KR20220129433A

Abstract

A method and apparatus for generating and editing images using contrastive learning and a generative adversarial network are disclosed. The method includes three steps:
1. extracting common features from an input image fed to a discriminator to generate common feature data;
2. projecting the common feature data into the dimension of the class embedding of a real image;
3. operating so as to minimize, or drive to convergence, a conditional contrastive loss defined by the common feature data and the class embedding.

The operating step uses the conditional contrastive loss, based on the data-to-data relations among multiple image embeddings in the same dimension and on the data-to-class relations, to do one of three things: determine whether the input image is real, generate a fake image, or modify part of the input image.

Description

Method and apparatus for generating and editing images using contrastive learning and a generative adversarial network

The present invention relates to conditional image generation and, more particularly, to a method and apparatus for generating and editing images using contrastive learning and a generative adversarial network.

A generative adversarial network (GAN), effectively first proposed by Ian Goodfellow, trains a generator and a discriminator against each other. In this way the model learns the patterns in the data and can then generate new data.
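For reference, the original GAN formulation (Goodfellow et al., 2014) trains the two networks on the minimax objective below. Here D(x) is the discriminator's estimated probability that x is real, and G(z) maps a noise vector z to a sample. p_data and p_z are the data and noise distributions. This is background on the standard GAN, not the specific loss claimed in this document.

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```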

GANs are widely used for image synthesis and editing in computer vision. For example, a GAN trained on an image dataset crawled from the Internet can generate new images that never appeared during training.

A GAN can perform tasks such as face image synthesis, as shown in FIG. 1, or painting-style conversion, as shown in FIG. 2. FIG. 1 shows faces generated by a GAN that do not exist in the training data. FIG. 2 shows pictures whose painting style has been converted by a GAN.

Separately from their image generation quality, GANs are demanding to train. They require:
- a huge amount of training data;
- stably structured generators and discriminators;
- carefully tuned initial settings.

Many stabilization techniques have therefore been proposed to mitigate the training instability of GANs.

For example, one proposed method stabilizes GAN image generation by providing the ground-truth label of each training image (e.g., dog, cat, deer, hedgehog) as a hint. However, such conventional conditional image generation still has two problems. The discriminator still overfits considerably, and consistency regularization requires additional data.

There is therefore a need for a new GAN-based conditional image generation model that overcomes these problems.

The present invention was devised to solve the problems of existing conditional image generation models. An object of the invention is to provide a method and apparatus that generate or edit images more effectively. They use a new model: a generative adversarial network trained with contrastive learning.

Another object of the invention is to provide methods and apparatuses that generate and edit images with various kinds of GANs, by applying a generative adversarial network trained with contrastive learning.

To achieve these objects, one aspect of the present invention provides an image generation and editing method using contrastive learning and a generative adversarial network. The method includes three steps:
1. extracting common features from an input image fed to a discriminator to generate common feature data;
2. projecting the common feature data into the dimension of the class embedding of a real image;
3. operating so as to minimize, or drive to convergence, a conditional contrastive loss defined by the common feature data and the class embedding.

The operating step relies on the data-to-data relations among multiple image embeddings in the same dimension and on the data-to-class relations. Using the conditional contrastive loss, it does one of three things: determine whether the input image is real, generate a fake image that will be judged real, or modify part of the input image.
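The three steps above can be sketched as a toy discriminator head in plain Python. Everything in it is hypothetical:
- random matrices stand in for the trained feature extraction and projection models;
- tanh is an arbitrary choice of nonlinearity;
- the class embeddings are random unit vectors.

The sketch only illustrates the data flow. It produces common feature data, projects it into the class-embedding dimension, and yields the normalized image and class embeddings on which a conditional contrastive loss would then be computed.

```python
import math
import random


def matvec(w, v):
    # w is a list of rows; returns the matrix-vector product w @ v
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def l2_normalize(v, eps=1e-12):
    # scale v to unit length so that dot products become cosine similarities
    norm = math.sqrt(sum(a * a for a in v))
    return [a / max(norm, eps) for a in v]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


class ToyDiscriminatorHead:
    """Hypothetical stand-in for the feature extraction and projection models."""

    def __init__(self, in_dim, feat_dim, embed_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.w_feat = random_matrix(feat_dim, in_dim, rng)     # plays the role of D_phi1
        self.w_proj = random_matrix(embed_dim, feat_dim, rng)  # projection to class-embedding dim
        self.class_embed = [l2_normalize([rng.gauss(0.0, 1.0) for _ in range(embed_dim)])
                            for _ in range(num_classes)]

    def common_features(self, x):
        # step 1: extract common feature data (tanh is an arbitrary nonlinearity)
        return [math.tanh(a) for a in matvec(self.w_feat, x)]

    def project(self, feats):
        # step 2: project into the class-embedding dimension, then normalize
        return l2_normalize(matvec(self.w_proj, feats))

    def embed(self, x):
        return self.project(self.common_features(x))

    def class_similarity(self, x, y):
        # cosine similarity between the image embedding and the class-y embedding;
        # step 3 (the conditional contrastive loss) is built from scores like this
        z = self.embed(x)
        return sum(a * b for a, b in zip(z, self.class_embed[y]))
```

A real discriminator would replace the random matrices with trained convolutional layers. Only the shapes and the normalization matter for this illustration.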

In one embodiment, the operating step works in the embedding space where the data-to-data and data-to-class relations are represented. There it pulls embeddings that are highly similar to an anchor (or target) embedding closer to it, and pushes embeddings with low similarity farther away.
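One way to write this pull/push behavior as a loss is the conditional contrastive (2C) loss that the same inventors published in their ContraGAN paper. We assume here that the claimed loss takes this form; this section does not spell it out. The symbols are:
- l(x) is the L2-normalized projection of the common feature data of sample x;
- e(y) is the L2-normalized embedding of class y;
- t is a temperature;
- m is the batch size.

Self-pairs (k = i) are excluded from both sums in this rendering. Terms in the numerator (the class embedding and other same-class samples) are pulled toward the anchor l(x_i). All other samples, which appear only in the denominator, are pushed away.

```latex
\ell_{2C}(x_i, y_i; t) = -\log
\frac{\exp\!\big(l(x_i)^\top e(y_i)/t\big) + \sum_{k=1}^{m} \mathbb{1}_{k \neq i}\,\mathbb{1}_{y_k = y_i}\,\exp\!\big(l(x_i)^\top l(x_k)/t\big)}
     {\exp\!\big(l(x_i)^\top e(y_i)/t\big) + \sum_{k=1}^{m} \mathbb{1}_{k \neq i}\,\exp\!\big(l(x_i)^\top l(x_k)/t\big)}
```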

In one embodiment, the method may further include generating a realistic fake image that fools the real/fake judgment and has a low contrastive loss.

In one embodiment, the method may further include feeding a fake image, or both a fake image and a real image, to the discriminator as input images.

According to another aspect of the present invention, an image generation and editing apparatus using contrastive learning and a generative adversarial network includes:
- a display;
- one or more cameras;
- one or more processors;
- a memory storing one or more programs configured to be executed by the one or more processors.

The one or more programs include instructions for performing the method of any of the embodiments described above.

According to yet another aspect, an image generation and editing apparatus using contrastive learning and a generative adversarial network includes a computer-readable storage medium. The medium records one or more programs containing instructions for performing the method of any of the embodiments described above. The programs may be executed by one or more processors of a computing device equipped with one or more cameras and a display.

According to yet another aspect, an image generation and editing method using contrastive learning and a generative adversarial network includes two steps. The first step determines whether an input image is real, using a conditional contrastive loss based on the data-to-data relations among multiple image embeddings in the same dimension and on the data-to-class relations. The second step generates a realistic fake image that fools this real/fake judgment and has a low contrastive loss.

In one embodiment, the conditional contrastive loss may be defined on the common feature data after it has been projected into the dimension of the class embedding of real data. The common feature data consists of common features extracted from the input data.

In one embodiment, the conditional contrastive loss may be computed or learned in the embedding space where the data-to-data and data-to-class relations are represented. Embeddings highly similar to an anchor or target embedding are pulled closer to it, and embeddings with low similarity are pushed farther away.

According to yet another aspect, an image generation and editing apparatus using contrastive learning and a generative adversarial network includes three models:
- a feature extraction model that extracts common features from an input image fed to a discriminator to generate common feature data;
- a projection model that projects the common feature data into the dimension of the class embedding of a real image;
- an optimization model that operates to minimize, or drive to convergence, the conditional contrastive loss defined by the common feature data and the class embedding.

The optimization model relies on the data-to-data relations among multiple image embeddings in the same dimension and on the data-to-class relations. Using the conditional contrastive loss, it does one of three things: determine whether the input image is real, generate a fake image that will be judged real, or output an edited image in which part of the input image has been changed.

In one embodiment, the optimization model may operate, or be trained, in the embedding space where the data-to-data and data-to-class relations are represented. Embeddings highly similar to an anchor or target embedding are pulled closer to it, and embeddings with low similarity are pushed farther away.

In one embodiment, the apparatus may further include a generator that produces realistic fake images that fool the real/fake judgment and have a low contrastive loss. The fake images and real images may be fed to the discriminator as input images.

According to yet another aspect, an image generation and editing apparatus using contrastive learning and a generative adversarial network includes two components:
- a discriminator that determines whether an input image is real, using a conditional contrastive loss based on the data-to-data relations among multiple image embeddings in the same dimension and on the data-to-class relations;
- a generator that produces realistic images that fool this judgment and have a low contrastive loss.

Using the conditional contrastive loss and these relations, the discriminator does one of three things: determine whether the input image is real, generate a fake image based on the input image, or change part of the input image and output the result.

In one embodiment, the apparatus further includes a generator separation unit (or input switching unit). This unit disconnects the generator from the discriminator and instead connects an input unit, which receives samples, to the discriminator. A sample fed to the discriminator as an input image is then processed by the neural network. The result is either a new image or an edited version of the input image.

The present invention provides a new contrastive generative adversarial network (ContraGAN) for conditional image generation. ContraGAN uses a conditional contrastive loss (2C loss), i.e., contrastive learning based on both data-to-class and data-to-data relations. This lets it generate or edit images efficiently and with high quality. Experiments confirmed that ContraGAN improves on state-of-the-art results by 7.3% on Tiny ImageNet and by 7.7% on ImageNet.

ContraGAN's use of contrastive learning also brings three further benefits:
- it substantially mitigates discriminator overfitting;
- it yields favorable GAN results without the additional augmented data that consistency regularization requires;
- it still produces excellent image generation or editing results when consistency regularization is applied with large amounts of data.

FIG. 1 is an example of faces generated by a conventional generative adversarial network (GAN), showing newly generated faces that do not exist in the training data.
FIG. 2 is an example of painting-style conversion pictures generated by a conventional GAN.
FIG. 3A shows the main structure of an auxiliary classifier GAN (ACGAN), one example of a comparative conditional image generation model.
FIG. 3B shows a GAN with a projection discriminator (ProjGAN), another example of a comparative conditional image generation model.
FIG. 4 shows the main structure of a contrastive GAN (ContraGAN) according to an embodiment of the present invention.
FIG. 5 is a schematic diagram explaining the contrastive learning algorithm of the ContraGAN of FIG. 4.
FIG. 6 compares the metric learning results of the ContraGAN according to the present embodiment with those of comparative examples.
FIG. 7 shows an example of the training algorithm of the ContraGAN of FIG. 4.
FIG. 8 illustrates images generated by the ContraGAN according to the present embodiment after training on the ImageNet dataset.
FIG. 9 is a graph comparing spectral analysis results for several performance measures of the ContraGAN according to the present embodiment with the corresponding results of a comparative example (ProjGAN).
FIG. 10 is a block diagram of an image generation and editing apparatus using a ContraGAN according to another embodiment of the present invention.
FIG. 11 is a block diagram explaining how the apparatus of FIG. 10 uses the ContraGAN to selectively train on images or to generate or edit images.
FIG. 12 is a flowchart of an image generation and editing method using a ContraGAN according to yet another embodiment of the present invention.
FIG. 13 is a flowchart explaining a modified embodiment of the method of FIG. 12.
FIG. 14 is a flowchart explaining the training mode of the method of FIG. 12.
FIG. 15 is a flowchart explaining how a new image is generated in an image generation and editing method using a ContraGAN according to yet another embodiment of the present invention.
FIG. 16 is a flowchart explaining how a style-converted image is generated in an image generation and editing method according to yet another embodiment of the present invention.
FIG. 17 is a flowchart explaining how an avatar is generated from a face image in an image generation and editing method according to yet another embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments. Specific embodiments are illustrated in the drawings and described in detail below. This is not intended to limit the invention to those embodiments. The invention should be understood to include all modifications, equivalents, and substitutes within its spirit and scope. Like reference numerals denote like elements throughout the drawings.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms. The terms serve only to distinguish one component from another. For example, a first element may be termed a second element, and a second element may be termed a first element, without departing from the scope of the present invention. The term "and/or" includes any combination of multiple related listed items, or any one of them.

When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to that element, or intervening elements may be present. In contrast, when an element is said to be "directly connected" or "directly coupled" to another element, no intervening elements are present.

The terms used in this application describe specific embodiments only and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, the terms "include" and "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof. They do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the meanings commonly understood by a person of ordinary skill in the art to which the invention belongs. Terms defined in commonly used dictionaries should be interpreted consistently with their meaning in the context of the related art. Unless explicitly defined in this application, they should not be interpreted in an idealized or overly formal sense.

Before specific embodiments are described, the background of the invention is briefly explained as follows.

Conventional conditional image generation is the task of generating diverse images using class label information. Many conditional generative adversarial networks (GANs) produce realistic results. However, as their conditioning loss they use only pair-based relations between an image embedding and its corresponding label embedding, i.e., data-to-class relations.

This embodiment therefore proposes ContraGAN. It uses a conditional contrastive loss that considers both the relations among multiple image embeddings in the same batch (data-to-data relations) and the data-to-class relations. The ContraGAN discriminator determines whether a given sample is real, and it minimizes a contrastive objective to learn the relations among training images. The generator tries to fool the real/fake judgment while producing realistic images with a low contrastive loss.
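The two objectives below follow the published ContraGAN training algorithm, under the assumption that the 2C loss is added to each network's adversarial loss with a weight λ. The symbols are:
- ℓ_2C is the per-sample conditional contrastive loss;
- x_i are real images with labels y_i;
- G(z_i, y_i) are generated images;
- L_adv^D and L_adv^G stand for whichever adversarial loss is used (the document does not fix one).

Applying the 2C term to real images for the discriminator and to generated images for the generator is also an assumption here.

```latex
\mathcal{L}_D = \mathcal{L}_{\mathrm{adv}}^{D} + \lambda \,\frac{1}{m}\sum_{i=1}^{m} \ell_{2C}\big(x_i, y_i; t\big)
\qquad
\mathcal{L}_G = \mathcal{L}_{\mathrm{adv}}^{G} + \lambda \,\frac{1}{m}\sum_{i=1}^{m} \ell_{2C}\big(G(z_i, y_i), y_i; t\big)
```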

The configuration of the invention described above is now explained in more detail, with reference to conventional conditional generative adversarial networks (conditional GANs).

Conditional image generation is one way to train a GAN stably. It uses web-crawled images together with information about their categories (e.g., cat or dog).

The most representative method is the GAN with a projection discriminator proposed by Miyato et al. This model performs adversarial training by projecting the given image category onto the discriminator's feature map.
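For comparison, the projection discriminator of Miyato et al. scores a pair (x, y) as f(x, y) = ψ(φ(x)) + e(y)ᵀφ(x). Here φ(x) are the discriminator features, ψ is an unconditional real/fake head, and e(y) is the class-y embedding. The toy sketch below makes two simplifying assumptions: it uses plain lists, and ψ is a linear layer. Each sample is scored in isolation; no term compares one image with another.

```python
def projection_discriminator_logit(phi_x, psi_w, psi_b, class_embeds, y):
    """Hypothetical toy version of f(x, y) = psi(phi(x)) + e(y)^T phi(x)."""
    # unconditional real/fake score psi(phi(x)), modeled here as a linear layer
    uncond = sum(w * f for w, f in zip(psi_w, phi_x)) + psi_b
    # conditional term: inner product between the class-y embedding and the features
    cond = sum(e * f for e, f in zip(class_embeds[y], phi_x))
    return uncond + cond
```

With features [1.0, 2.0], one-hot class embeddings, and ψ weights [0.5, -0.5] with bias 0.1, the logit is about 0.6 for class 0 and about 1.6 for class 1.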

However, the projection discriminator has two weaknesses. It is prone to overfitting in adversarial training. It also treats all images as independent of one another, so it has difficulty learning meaningful correlations between images.

To solve these two problems, the present invention proposes a new framework: a generative adversarial network trained with contrastive learning (hereinafter "ContraGAN"). The problems are overfitting and the difficulty of learning correlations.

ContraGAN substantially improves on the contrastive loss widely used in self-supervised contrastive learning, in two ways:
- for efficient training, it replaces image augmentation with image-category features (class embeddings);
- it adds a positive-addition technique, treating other same-class samples as additional positives, to eliminate the effect of false positives on training.
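Below is a minimal plain-Python sketch of the 2C loss. It assumes the form published in the ContraGAN paper, since this section does not give the formula. It differs from the NT-Xent loss of self-supervised contrastive learning in two ways:
- the positive for each anchor is its class embedding rather than an augmented view;
- other samples with the same label are added as extra positives.

The function name, the temperature default, and the exclusion of self-pairs are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_c_loss(embeds, labels, class_embeds, t=0.5):
    """Mean conditional contrastive (2C) loss over a batch (illustrative sketch).

    embeds: L2-normalized projected features l(x_i)
    labels: integer class labels y_i
    class_embeds: L2-normalized class embeddings e(y), indexed by label
    t: temperature (hypothetical default)
    """
    m = len(embeds)
    total = 0.0
    for i in range(m):
        anchor = embeds[i]
        # data-to-class term: the class embedding replaces the augmented view as a positive
        proxy = math.exp(dot(anchor, class_embeds[labels[i]]) / t)
        pos, all_k = 0.0, 0.0
        for k in range(m):
            if k == i:
                continue  # self-pairs excluded (assumption)
            s = math.exp(dot(anchor, embeds[k]) / t)
            all_k += s  # every other sample appears in the denominator (pushed away)
            if labels[k] == labels[i]:
                pos += s  # same-class samples are extra positives (pulled closer)
        total += -math.log((proxy + pos) / (proxy + all_k))
    return total / m
```

Take a batch with two samples per class in which the embeddings sit on their own class embedding. Each sample then contributes log(1 + e^-2), about 0.127. If the same embeddings are assigned mismatched labels, the loss is larger, which is the pull/push behavior the 2C loss is meant to produce.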

Preferred embodiments of the present invention are described in more detail below with reference to the accompanying drawings.

FIG. 3A shows the main structure of an auxiliary classifier GAN (ACGAN), one example of a comparative conditional image generation model. FIG. 3B shows a GAN with a projection discriminator (ProjGAN), another example of a comparative conditional image generation model. FIG. 4 shows the main structure of ContraGAN according to an embodiment of the present invention.

A common approach in conventional conditional GANs is to feed label information to the generator and the discriminator. For example, ACGAN (auxiliary classifier GAN), a type of conditional GAN, attaches an auxiliary classifier on top of the discriminator's convolutional layers to predict the class of an image. ProjGAN (projection GAN), another type of conditional GAN, optimizes the adversarial loss using class embeddings and an inner product.

Referring to FIG. 3A, the ACGAN discriminator of the comparative example (hereinafter the main discriminator) comprises two classifiers and an auxiliary classifier. To distinguish them from the auxiliary classifier, this specification calls the two classifiers of the main discriminator the feature extractor (Dφ₁) and the discriminator (Dφ₂). The auxiliary classifier is simply called the classifier, and may also be called an additional classifier.

The feature extractor (Dφ₁) extracts common features from the data of the input image x. Its output, Dφ₁(x) (hereinafter common feature data), is passed both to the discriminator (Dφ₂) and to the classifier.

The discriminator (Dφ₂) uses the common feature data Dφ₁(x) to determine whether the input image (x) is real or fake. The output of the discriminator (Dφ₂) corresponds to the adversarial loss.

The classifier uses the common feature data Dφ₁(x) to classify the category of the input image. The output of the classifier corresponds to a classification loss based on the difference from the input label (y) of the original, that is, real, image.

The main discriminator of ACGAN, or its discriminator (Dφ₂), is trained with the input label (y) corresponding to the real image as the reference, in the direction that drives the loss value toward "false".

Data generated by ACGAN tends to be classified well even when it is fed to other classifiers. In other words, because the auxiliary classifier guides the generator, ACGAN is well suited to synthesizing images that are easy to classify and to generating plausible data.

As in a general conditional GAN, the ACGAN generator creates a fake image, or fake data, by combining label information such as the class with noise. The fake data is input to the discriminator together with data including the input image.

The objective function of ACGAN may be the same as that of the discriminator of an existing conditional GAN. In other words, the ACGAN objective may consist of determining whether the given data is real or fake and classifying the category of that data.
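The two-part ACGAN discriminator objective described above (real/fake plus category) can be sketched on scalar logits as follows. This is a minimal illustration, not the embodiment's implementation. It assumes binary cross-entropy for the adversarial term and softmax cross-entropy for the auxiliary classifier. All function names are hypothetical.

```python
from __future__ import annotations
import math


def bce_with_logit(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for one real/fake logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def softmax_cross_entropy(class_logits: list[float], label: int) -> float:
    """Cross-entropy of the auxiliary classifier output against an integer label."""
    m = max(class_logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in class_logits))
    return log_sum_exp - class_logits[label]


def acgan_discriminator_loss(real_logit, real_class_logits, real_label,
                             fake_logit, fake_class_logits, fake_label):
    # Adversarial (source) term: real images -> 1, generated images -> 0.
    adversarial = bce_with_logit(real_logit, 1.0) + bce_with_logit(fake_logit, 0.0)
    # Classification term: real and generated images are both classified
    # against the label they carry (the conditioning label for fake images).
    classification = (softmax_cross_entropy(real_class_logits, real_label)
                      + softmax_cross_entropy(fake_class_logits, fake_label))
    return adversarial + classification
```

Both terms share the same feature extractor output. This matches the description of Dφ₁(x) being fed to both the discriminator head and the classifier.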

ACGAN is useful for learning push-and-pull information between image classes. However, as the number of classes grows, training takes longer and becomes difficult. This embodiment therefore provides a neural network that synthesizes images well even on large, fine-grained datasets. It does so through contrastive learning, which uses the relationships between pieces of image data in addition to the relationships between image data and classes.

As shown in FIG. 3(b), the projection GAN (ProjGAN) of the comparative example improves on the training and performance of ACGAN. It does this by taking the inner product of two vectors: the common feature data Dφ₁(x) extracted by the feature extractor (Dφ₁), and the class embedding (e) of the real image's input label (y).

Here, class embedding refers to the process of finding the pattern of the real image (input label (y)), or vectorizing it, while preserving the necessary information. If the scale is too large, the values spread over a wide range and become hard to determine. Class embedding should therefore include a normalization step that computes the L2 norm and divides by it.
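The projection-style score described above can be sketched as an unconditional term plus the inner product between the class embedding and the common feature data, with the class embedding optionally divided by its L2 norm. This is a toy illustration on plain Python lists. The linear head `psi_weights`/`psi_bias` and all function names are assumptions for illustration, not taken from the text.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Divide a vector by its L2 norm (returned unchanged if the norm is zero)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def projection_discriminator(features, psi_weights, psi_bias,
                             class_embeddings, label, normalize_embedding=True):
    """ProjGAN-style score: psi(phi(x)) + e(y)^T phi(x).

    features         : phi(x), the common feature data Dphi1(x)
    psi_weights/bias : hypothetical linear head for the unconditional score
    class_embeddings : one embedding vector e(y) per class
    """
    unconditional = dot(psi_weights, features) + psi_bias
    e = class_embeddings[label]
    if normalize_embedding:
        e = l2_normalize(e)  # the L2 normalization step described above
    return unconditional + dot(e, features)
```

Because the conditional term is an inner product, a feature vector aligned with its class embedding receives a higher score than one aligned with a different class.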

Even so, the projection GAN frequently suffers from discriminator overfitting and must be supplied with additional data for consistency regularization.

As shown in FIG. 4, the contrastive generative adversarial network (ContraGAN) according to this embodiment builds on the observation that the conditioning functions of ACGAN and ProjGAN can be interpreted as pair-based losses. These losses consider only the data-to-class relationships of training samples. ContraGAN effectively combines the advantages of both models. That is, ContraGAN uses a conditional contrastive loss, in other words contrastive learning, that takes into account two strengths: ACGAN's ability to exploit information in the input image, and ProjGAN's ease of training.

In other words, the ContraGAN according to this embodiment forms and uses a conditional contrastive loss (2C loss) from two inputs. The first is the common feature data Dφ₁(x), which corresponds to the common features that the feature extractor (Dφ₁, 10) extracts from the input image (x). The second is the original feature vector obtained by passing the real image's label (input label (y)) through the class embedding model (class embedding (e), 40).

When forming the 2C loss, the dimension of the common feature data Dφ₁(x) generally differs from that of the original feature vector. In this embodiment, a projection model (projection (h), 30) therefore projects the common feature data Dφ₁(x) into the space of the original feature vector, so that both have the same dimension.

The 2C loss considers both data-to-data and data-to-class relationships. To exploit data-to-data relationships, loss functions used in self-supervised learning or metric learning can be applied as appropriate.

That is, the ContraGAN of this embodiment adds a metric-learning or self-supervised objective to the discriminator and the generator. This explicitly controls the distances between embedded image features according to their labels. Suitable metric-learning losses include the contrastive loss, the triplet loss, the quadruplet loss, and the N-pair loss. The triplet and quadruplet losses add training complexity and may lengthen training, but they can be adopted as needed.
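The triplet loss listed above can be sketched as a hinge on distances. It shows why such losses need explicit (anchor, positive, negative) triplets, which is where the extra training complexity comes from. This is the standard textbook form with a Euclidean distance and a margin of 1.0. Both the distance and the margin are assumptions, since the text does not specify them.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive, so only "hard" triplets contribute.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

For example, an anchor at the origin with a positive at distance 2 and a negative at distance 1 yields a loss of 2.0. A quadruplet loss adds a second negative term, and mining such tuples grows quickly with batch size.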

Proxy-based losses reduce mining complexity by using trainable class embedding vectors, but they do not explicitly consider data-to-data relationships. This embodiment solves that problem with the 2C loss. Because ContraGAN with the 2C loss considers both the data-to-class and the data-to-data relationships of the input images, it exploits the information in the input images while remaining easy to train, even on large datasets.

FIG. 5 is a schematic diagram explaining the operating principle of the contrastive learning algorithm of the ContraGAN of FIG. 4. FIG. 6 schematically compares the metric learning loss of the ContraGAN of this embodiment with the metric learning losses of comparative examples.

In conditional GANs, training losses are generally designed to pull samples with the same label together and to push samples with different labels apart. With the loss function of the ProjGAN of the comparative example, when the reference is a real image, the reference and its class embedding are pulled toward each other; otherwise, they are pushed apart.

In the ContraGAN of this embodiment, as shown in FIG. 5, the discriminator updates itself by minimizing the distance between embeddings of real images of the same class and maximizing it otherwise. The class embeddings are forced to be related through the conditional contrastive (2C) loss. This lets the discriminator learn fine-grained representations of real images. It also lets the generator produce more realistic images by exploiting the discriminator's knowledge, such as intra-class characteristics and higher-order representations of real images.

In FIG. 5, hatching types or colors (red, blue, yellow) indicate class labels, and shapes indicate roles. The circle (C1) is the embedding of the original or reference image to which the loss is applied. The squares (R1, R2, R3, R4, R5, R6, R7) are embeddings of input images, and the stars (S1, S2, S3) are class-label embeddings. The thickness of the solid (red) lines indicates the strength of the pulling force, and the thickness of the dotted (blue) lines indicates the strength of the pushing force.

FIG. 6 compares three groups schematically: the 2C loss of this embodiment (f), the metric learning losses of comparative examples (a, b, c), and the conditional GANs of other comparative examples (d, e).

In FIG. 6, the metric learning losses are (a) triplet, (b) P-NCA (proxy neighborhood component analysis), and (c) NT-Xent (normalized temperature-scaled cross entropy). The conditional GANs of the other comparative examples are (d) ACGAN and (e) ProjGAN.

In FIG. 6 as well, colors indicate class labels and shapes indicate roles. A circle is the embedding of the original or reference image, and a square is the embedding of an input image. A diamond is the embedding of an augmented image, a star is the embedding of a class label, and a triangle is the one-hot encoding of a class label. The thickness of the solid (red) lines indicates the strength of the pulling force, and the thickness of the dotted (blue) lines indicates the strength of the pushing force.

As shown in FIG. 6, the 2C loss used for image generation and synthesis in this embodiment differs from ACGAN and ProjGAN of the comparative examples. It can consider both data-to-class and data-to-data relationships among training samples, including input images and real images.

Before introducing the 2C loss, the corresponding part of this embodiment can be expressed with the NT-Xent loss. Assume a minibatch randomly sampled from the training images, together with the set of corresponding class labels. Then define a deep neural network encoder and a projection layer that embeds its output onto the unit hypersphere. The NT-Xent loss is designed for unsupervised learning. To capture the data-to-data relationship between an original image and its augmentation, it treats the augmented image as the positive sample.
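The NT-Xent loss just described can be sketched without any labels, with the augmented view serving as the only positive. Here each image appears twice in the batch, at indices 2k and 2k+1. The pairing convention, the cosine similarity, and the temperature of 0.5 are assumptions for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(views, temperature=0.5):
    """NT-Xent over 2N embeddings, where views[2k] and views[2k + 1] are two
    augmentations of the same image (the positive pair). Every other
    embedding in the batch acts as a negative; class labels are not used."""
    n = len(views)
    if n < 2 or n % 2:
        raise ValueError("expected an even number (>= 2) of embeddings")
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the other augmentation of the same image
        logits = {k: cosine(views[i], views[k]) / temperature for k in range(n) if k != i}
        denominator = sum(math.exp(s) for s in logits.values())
        total += -math.log(math.exp(logits[j]) / denominator)
    return total / n
```

Because the only positive is the augmented copy, two different images of the same class are still treated as negatives. The next paragraph points out this limitation, along with the extra augmentation and extra passes needed to produce `views`.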

Compared with the 2C loss, however, NT-Xent receives no supervision from label information, so it can hardly pull together embeddings of images of the same class. In addition, NT-Xent requires extra data augmentation as well as extra forward and backward passes.

The 2C loss used in this embodiment, by contrast, exploits weak supervision from label information. It requires no extra data augmentation and no extra forward or backward passes. Training can therefore be at least several times shorter than for the comparative model that uses the NT-Xent loss.

In other words, compared with the comparative examples ((a) to (e) of FIG. 6), the 2C loss of this embodiment (f) considers both data-to-data and data-to-class relationships. It infers from the full information without data augmentation, and can therefore measure the similarity between related objects effectively.

FIG. 7 illustrates the training algorithm of the ContraGAN of FIG. 4.

Referring to FIG. 7, lines 4 and 5 of Algorithm 1, which trains the ContraGAN of this embodiment, show how the 2C loss is computed. In the discriminator training step it is computed from m real images, and in the generator training step it is computed from m intermediate images generated in that step. The intermediate images correspond to common feature images, or common feature data.

As Algorithm 1 shows, the discriminator updates itself by minimizing the distance between embeddings of real images of the same class and maximizing it otherwise.

Moreover, this embodiment forces the class embeddings to be related through the 2C loss, which lets the discriminator learn fine-grained representations of real images. Likewise, the generator can create more realistic images by exploiting the discriminator's knowledge, such as intra-class characteristics and higher-order representations of real images.

The image generation and synthesis method of this embodiment uses the 2C loss, as does its framework, ContraGAN. Like the usual training procedure of a conditional GAN, it has a discriminator training step, in which the discriminator that computes the adversarial loss is trained, and a generator training step. On top of these steps, ContraGAN additionally computes the 2C loss on the set of real images or the set of fake images.
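The two training steps can be sketched as loss arithmetic. In the discriminator step, the adversarial loss is combined with a weighted 2C loss on real images. In the generator step, the adversarial loss is combined with a weighted 2C loss on generated images. This is only a sketch of how the terms combine. The hinge form of the adversarial loss, the weight `lam`, and the precomputed 2C values passed in as plain numbers are all assumptions; the text does not fix them.

```python
def d_hinge_loss(real_scores, fake_scores):
    """Hinge adversarial loss for the discriminator (assumed form)."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def g_hinge_loss(fake_scores):
    """Hinge adversarial loss for the generator (assumed form)."""
    return -sum(fake_scores) / len(fake_scores)


def discriminator_objective(real_scores, fake_scores, contrastive_real, lam=1.0):
    # Discriminator step: adversarial loss + lam * 2C loss on the m real images.
    return d_hinge_loss(real_scores, fake_scores) + lam * contrastive_real


def generator_objective(fake_scores, contrastive_fake, lam=1.0):
    # Generator step: adversarial loss + lam * 2C loss on the m generated images.
    return g_hinge_loss(fake_scores) + lam * contrastive_fake
```

Keeping the 2C term as a separate additive component mirrors the description: the usual conditional GAN procedure is unchanged, and the contrastive term is computed on top of it.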

Unlike existing methods, a ContraGAN that uses the 2C loss learns the similarities and differences between images and captures the fine details of each category when generating images. Because the loss is determined by a very large number of image pairs, it is also robust to overfitting.

Equation 1 below expresses the loss function of the ContraGAN proposed in this embodiment.

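$$\ell_{2C}(x_i, y_i; t) = -\log\left(\frac{\exp\left(l(x_i)^{\top} e(y_i)/t\right) + \sum_{k=1}^{m} \mathbb{1}_{y_k = y_i}\cdot\exp\left(l(x_i)^{\top} l(x_k)/t\right)}{\exp\left(l(x_i)^{\top} e(y_i)/t\right) + \sum_{k=1}^{m} \mathbb{1}_{k \neq i}\cdot\exp\left(l(x_i)^{\top} l(x_k)/t\right)}\right) \qquad (1)$$

The symbols in Equation 1 are defined as follows.

- $x$ is an image and $y$ is the image label.
- $e$ is the label embedding model and $l$ is the image embedding model.
- $t$ is a hyperparameter that adjusts the hardness.
- $m$ is the minibatch size.
- $l(x_i)$ is the image embedding of the reference sample $x_i$.
- $e(\cdot)$ is the class embedding function, and $e(y_i)$ is the class embedding.
- The term $\exp(l(x_i)^{\top} e(y_i)/t)$ represents the data-to-class relationship.
- The terms $\exp(l(x_i)^{\top} l(x_k)/t)$ represent the data-to-data relationships.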
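The 2C loss can be sketched over a minibatch in plain Python. Each sample's data-to-class term is added to its same-class data-to-data terms (numerator) and compared with the data-to-class term plus all data-to-data terms (denominator). This is a hedged sketch, not the embodiment's code. It L2-normalizes both the image embeddings and the class embeddings, which is an assumption consistent with the unit-hypersphere mapping described below. It also drops the self-pair k = i from the positive sum as well as from the denominator, a common implementation choice assumed here, so that every ratio is at most 1.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conditional_contrastive_loss(embeddings, labels, class_embeddings, t=1.0):
    """Mean 2C loss over a minibatch of m samples (sketch).

    embeddings       : l(x_k) for each sample (L2-normalized here, an assumption)
    labels           : y_k, integer class labels
    class_embeddings : e(y), one vector per class (also L2-normalized here)
    t                : temperature that adjusts the hardness
    """
    z = [_normalize(v) for v in embeddings]
    m = len(z)
    total = 0.0
    for i in range(m):
        e = _normalize(class_embeddings[labels[i]])
        data_to_class = math.exp(_dot(z[i], e) / t)
        others = [k for k in range(m) if k != i]  # self-pair excluded (assumption)
        all_pairs = sum(math.exp(_dot(z[i], z[k]) / t) for k in others)
        same_class = sum(math.exp(_dot(z[i], z[k]) / t)
                         for k in others if labels[k] == labels[i])
        total += -math.log((data_to_class + same_class) / (data_to_class + all_pairs))
    return total / m
```

When every sample in the batch shares one label, the numerator equals the denominator and the loss is zero. When same-class embeddings sit apart while different-class embeddings sit close, the loss grows. This is the pull/push behaviour sketched in FIG. 5.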

In this embodiment, the ContraGAN loss function can be built in two steps. First, define a deep neural network encoder S(x) and a projection layer h that embeds onto the unit hypersphere (radius 1). Then map the data space onto the hypersphere using the composition h(S(x)).

Here, ContraGAN uses part of the feature extractor (Dφ₁), the first discriminating network of the discriminator, as the encoder network S. This part runs up to just before the fully connected layer. ContraGAN can use a multi-layer perceptron parameterized by Φ as the projection head h.
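A toy sketch of l(x) = h(S(x)) follows, where the encoder output S(x) is given as a list. The projection head h is assumed to be a two-layer perceptron with a ReLU, followed by L2 normalization so that the result lies on the unit hypersphere. The layer count, activation, and function names are assumptions; the text specifies only a multi-layer perceptron parameterized by Φ.

```python
import math


def linear(v, weight, bias):
    """Affine map W v + b; `weight` has one row per output dimension."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weight, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def projection_head(features, w1, b1, w2, b2):
    """Two-layer MLP h(.) applied to the encoder output S(x), followed by
    L2 normalization so that l(x) = h(S(x)) lies on the unit hypersphere."""
    out = linear(relu(linear(features, w1, b1)), w2, b2)
    norm = math.sqrt(sum(a * a for a in out))
    if norm == 0.0:
        raise ValueError("projection produced a zero vector")
    return [a / norm for a in out]
```

The first layer can change the dimension, for example from a 3-dimensional S(x) to 2 dimensions. This is how the common feature data is brought to the dimension of the class embedding before the inner products in Equation 1 are taken.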

To analyze the learning algorithm of this embodiment fairly and thoroughly, the inventors built an integrated software library for training generative adversarial networks and ran the experiments with it.

The software of this embodiment may take any configuration selected from the following:

- 18 generative adversarial networks.
- Four training techniques: parallel distributed processing, mixed-precision training, batch synchronization, and batch statistics accumulation.
- Four analysis techniques: image visualization, nearest-neighbor analysis, linear interpolation analysis, and frequency analysis.
- Four evaluation metrics: Inception score, Fréchet Inception distance, precision, and recall.

The 18 generative adversarial networks include, but are not limited to, DCGAN, LSGAN, GGAN, WGAN-WC, WGAN-GP, WGAN-DRA, ACGAN, ProjGAN, SNGAN, SAGAN, BigGAN, BigGAN-Deep, CRGAN, ICRGAN, LOGAN, DiffAugGAN, ADAGAN, and ContraGAN.

Table 1 below compares the performance of the ContraGAN of this embodiment with the models of the comparative examples. It also shows experimental results indicating that ContraGAN is robust to overfitting.

Table 1 shows the results of the CIFAR10 image generation experiment. For every metric except the Fréchet Inception distance, a higher value means better performance. Entries in bold are results of the ContraGAN of this embodiment, which uses contrastive learning, and of its variants (CRGAN, ICRGAN, DiffAugGAN).
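The reason FID is the one metric where lower is better can be seen from its definition as a distance. FID is the Fréchet distance between Gaussians fitted to Inception features of real and generated images. The toy sketch below computes the one-dimensional case, where the Fréchet distance reduces to (μ_r − μ_f)² + (σ_r − σ_f)². It is illustrative only: the actual metric uses high-dimensional Inception-v3 features and full covariance matrices, which this sketch does not.

```python
import statistics


def frechet_distance_1d(real_features, fake_features):
    """Frechet distance between two 1-D Gaussians fitted to the samples:
    (mu_r - mu_f)^2 + sigma_r^2 + sigma_f^2 - 2 * sigma_r * sigma_f,
    which simplifies to (mu_r - mu_f)^2 + (sigma_r - sigma_f)^2."""
    mu_r, mu_f = statistics.fmean(real_features), statistics.fmean(fake_features)
    sd_r, sd_f = statistics.pstdev(real_features), statistics.pstdev(fake_features)
    return (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2
```

Identical distributions give 0, and any mismatch in mean or spread increases the value.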

Next, an image generation experiment was performed on the ImageNet dataset, with the results shown in FIG. 8. FIG. 8 presents images generated after training on ImageNet with the method according to this embodiment.

FIG. 8 visually confirms that the model of this embodiment generates natural images without memorizing the training data. The experiments showed that ContraGAN outperforms the state-of-the-art models by 7.3% on Tiny ImageNet and by 7.7% on ImageNet.

The experiments also show that contrastive learning helps mitigate discriminator overfitting. For a fair comparison, 12 recent GANs were implemented with the PyTorch library and tested.

This embodiment thus uses a new conditional contrastive loss to maximize a lower bound on the mutual information between samples of the same class. The framework of this embodiment, called ContraGAN, synthesizes images using the class information and the data-to-data relationships of training samples.

The ContraGAN discriminator decides whether a given sample is real or fake and maximizes the mutual information between embeddings of real images of the same class. The ContraGAN generator synthesizes images that fool the discriminator while maximizing the mutual information among fake images of the same class.

For a fair comparison, the inventors implemented nine existing state-of-the-art approaches (comparative examples) and tested them under identical conditions. The experiments confirmed that the ContraGAN of this embodiment is robust to the choice of network architecture. Without data augmentation, it outperforms the state-of-the-art models by 3.7% on CIFAR10 and by 11.2% on Tiny ImageNet.

FIG. 9 shows the results of spectral analysis of the ContraGAN according to this embodiment.

The leftmost graphs in the first and second rows of FIG. 9 show a model trained on the training dataset and then evaluated on both the training and the validation datasets. What is compared is classification accuracy: the discriminator judges specific feature data of a sample to decide its authenticity. In the ProjGAN of the comparative example, the gap between training accuracy and validation accuracy becomes large between roughly 25,000 steps and the low 50,000s, which indicates overfitting. In the ContraGAN of this embodiment, overfitting appears only between roughly 45,000 steps and the low 70,000s.

As these analysis results show, the ContraGAN of this embodiment is more robust to overfitting than the ProjGAN of the comparative example. The experiments therefore confirm that the collapse of adversarial training occurs later for ContraGAN than for the baseline model (ProjGAN).

That is, compare the results of the baseline model (ProjGAN, left of the upper row of FIG. 9) with those of the model of this embodiment (ContraGAN, ours; left of the lower row of FIG. 9). The gap between training and validation accuracy grows more slowly for ContraGAN than for the comparative example.
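The comparison above reads off where the training/validation accuracy gap opens up. That reading can be made mechanical by scanning logged accuracies for the first step at which the gap exceeds a threshold. This is a hypothetical helper for illustration. The threshold of 0.1 and the toy numbers in the usage below are assumptions, not values from FIG. 9.

```python
def overfitting_onset(steps, train_acc, val_acc, threshold=0.1):
    """Return the first step at which training accuracy exceeds validation
    accuracy by more than `threshold`, or None if the gap never opens."""
    for step, train, val in zip(steps, train_acc, val_acc):
        if train - val > threshold:
            return step
    return None
```

With toy logs `steps=[0, 10_000, 20_000]`, `train_acc=[0.5, 0.7, 0.9]` and `val_acc=[0.5, 0.65, 0.6]`, the function returns 20000. A model that overfits later, as reported here for ContraGAN, would return a later step.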

Likewise, the middle graphs of the upper and lower rows of FIG. 9 concern the Fréchet Inception distance (FID), for which lower is better. They show that the rise in FID occurs later in this embodiment than in the comparative example. This is because the model of this embodiment is robust to overfitting.

Finally, the rightmost graphs of the upper and lower rows of FIG. 9 show the spectral analysis results of the baseline model (ProjGAN) of the comparative example and of the model of this embodiment (ContraGAN). The spectra of the ContraGAN are visually more stable than those of the comparative example.

The ContraGAN according to this embodiment thus tends to synthesize images well on fine-grained datasets. In particular, the magnitude of the image feature vectors is constrained, so training does not collapse. As a result, ContraGAN pairs well with regularization based on data augmentation and shows a large improvement in FID (see Table 2).

Referring to Table 2, the ContraGAN of this embodiment has an advantage with large batch sizes, as seen in the different implementations (A, C, E, H).

In implementations (A, C, E, H) of the ContraGAN of this embodiment, the 2C loss also improves FID considerably over the existing approach. In particular, FID decreases by 21.6% and 11.2% compared with the vanilla networks (A, C), respectively.
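Percentage decreases such as those quoted above are relative changes with respect to the baseline FID. The helper below shows that arithmetic. The FID values in the usage note are made-up numbers for illustration, not entries from Table 2.

```python
def relative_reduction(baseline_fid, new_fid):
    """Percentage decrease of FID relative to the baseline (lower FID is better)."""
    if baseline_fid <= 0:
        raise ValueError("baseline FID must be positive")
    return 100.0 * (baseline_fid - new_fid) / baseline_fid
```

For instance, a hypothetical drop from an FID of 20.0 to 15.0 is a 25.0% reduction. A negative result would mean the FID got worse.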

Results were also compared with and without augmented positive samples (APS). The implementation with APS (F) took about 12.9% more time than the implementation without APS (E). This is presumed to be because each class embedding can act as the representative of its class and serve as an anchor that pulls in the corresponding images. Without class embeddings, the images of a minibatch are gathered depending on the sampling state, which can make training unstable.

Consistency regularization (CR) performance was also compared, using implementations (A, E, G) and (C, H, I). They show that combining the vanilla network, the 2C loss, and CR reduces FID relative to two baselines: the vanilla network alone (A, C), and the vanilla network combined with the 2C loss (E, H). This synergy appears when CR is used together with the 2C loss. In that case, the combination of vanilla network, 2C loss, and CR beats the vanilla network with CR (B, D) by a large margin.
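Consistency regularization, as the term is commonly used for GANs (for example, in CRGAN), penalizes the discriminator when its output changes under a semantics-preserving augmentation T. It adds λ_cr · ‖D(x) − D(T(x))‖² to the discriminator loss. The sketch below computes that penalty on scalar discriminator outputs. The weight λ_cr = 10 is an assumed value, and the augmentation itself is assumed to have been applied already to produce `augmented_scores`.

```python
def consistency_loss(scores, augmented_scores, weight=10.0):
    """lambda_cr * mean((D(x) - D(T(x)))^2) over a batch of discriminator outputs."""
    if len(scores) != len(augmented_scores) or not scores:
        raise ValueError("need equally sized, non-empty batches")
    squared = [(a - b) ** 2 for a, b in zip(scores, augmented_scores)]
    return weight * sum(squared) / len(squared)
```

In the combination described above, this penalty would be added to the discriminator objective alongside the adversarial loss and the 2C loss.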

FIG. 10 is a block diagram of an image generation and editing apparatus using a contrastive adversarial network according to another embodiment of the present invention. FIG. 11 is a block diagram illustrating how the apparatus of FIG. 10 uses the contrastive adversarial network to selectively train on images or to generate or edit images. FIG. 12 is a flowchart of an image generation and editing method using a contrastive adversarial network according to yet another embodiment of the present invention. FIG. 13 is a flowchart illustrating a modified embodiment of the method of FIG. 12. FIG. 14 is a flowchart illustrating the training mode of the method of FIG. 12.

Referring to FIG. 10, the image generation and editing apparatus 100 may include at least one processor 110 and a memory 120 that stores instructions directing the at least one processor 110 to perform a series of steps, or it may be included in a computing device that includes these components.

The apparatus 100 may also include a transceiver 130 that exchanges signals and data with external devices over a wired, wireless, or combined wired/wireless network. It may further include an interface 140, which has an input interface, an output interface, or an input/output interface, and a storage device 150.

The components of the apparatus 100 may be connected to one another through an internal network line or bus 160 to exchange signals and data.

The processor 110 may include a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor that performs the methods according to embodiments of the present invention.

Each of the memory 120 and the storage device 150 may be formed of at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory 120 may include at least one of read-only memory (ROM) and random access memory (RAM). At least one of the memory 120 and the storage device 150 may include, or be included in, a computer-readable storage medium on which one or more programs containing the above instructions are recorded.

The interface 140 may include a display device, a camera, and the like. Devices that include a display device and a camera include portable terminals such as mobile phones, personal digital assistants (PDAs), and smart pads, as well as laptop computers and personal computers.

The processor 110 loads the instructions stored in the electrically connected memory 120, or the programs or software modules implemented by those instructions, and executes them to perform the series of steps that implement the method of the present invention. That is, the contrastive adversarial network 170 may be loaded on the processor 110, and the network 170 may be composed of the above instructions or software modules.

As shown in FIG. 11, these software modules constitute the contrastive adversarial network 170 of this embodiment and may include a generator 172, a preprocessor 174, an input switching unit 176, and a discriminator 178. An output interface, or output unit 140, may also be included.

In this embodiment, the input switching unit 176 detaches the generator 172, which is used in the training mode of the contrastive adversarial network, from the discriminator 178. Samples are then fed to the discriminator 178, which generates or edits images based on them.

The generator 172 and the preprocessor 174 may be selectively connected to the discriminator 178 through the input switching unit 176. The generator 172 and the discriminator 178 are as described in the preceding embodiment, so their description is not repeated here.
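The switching arrangement of units 172, 174, 176, and 178 can be sketched as a small router that chooses which source feeds the discriminator. This is a minimal illustration under assumptions: the mode names `"train"` and `"edit"`, the class name `InputSwitch`, and the plain callables standing in for the networks are all hypothetical.

```python
class InputSwitch:
    """Hypothetical model of input switching unit 176.

    In "train" mode the generator's output goes to the discriminator.
    In "edit" mode the generator is detached and an external sample is
    passed through the preprocessor instead.
    """

    MODES = ("train", "edit")

    def __init__(self, generator, preprocessor, discriminator):
        self.generator = generator
        self.preprocessor = preprocessor
        self.discriminator = discriminator
        self.mode = "train"

    def set_mode(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def run(self, source):
        if self.mode == "train":
            # source is a latent code; the generator produces a fake image
            x = self.generator(source)
        else:
            # source is an external sample; convert it to the discriminator's input format
            x = self.preprocessor(source)
        return self.discriminator(x)
```

Keeping the routing in one place means the discriminator code itself does not need to know whether its input came from the generator or from a user-supplied sample.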

When the input switching unit 176 has detached the generator 172, which was coupled to the discriminator 178 in the training mode, the preprocessor 174 can receive a sample and convert it into a form or size that can be input to the discriminator 178.
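The document does not say how the preprocessor 174 converts a sample. The sketch below shows one plausible conversion, assuming a 2D grayscale image with 8-bit pixel values and a discriminator that expects a fixed size with values in [-1, 1]. The nearest-neighbour resizing and the value range are assumptions made for illustration.

```python
def preprocess(image, size, max_value=255):
    """Hypothetical preprocessor: nearest-neighbour resize, then scale to [-1, 1].

    image: 2D list of pixel values in [0, max_value]
    size: (height, width) expected by the discriminator
    """
    in_h, in_w = len(image), len(image[0])
    out_h, out_w = size
    return [
        [2.0 * image[r * in_h // out_h][c * in_w // out_w] / max_value - 1.0
         for c in range(out_w)]
        for r in range(out_h)
    ]
```

A production pipeline would more likely use bilinear interpolation and handle colour channels, but the two responsibilities stay the same: match the spatial size and match the value range used during training.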

The discriminator 178 may extract common features from an input image to generate common feature data (S112), project the common feature data into the dimension of the class embedding of real images (S114), and operate so that the conditional contrastive loss computed from the common feature data and the class embedding is minimized or converges to a preset value (S116) (see FIG. 12).

Alternatively, the discriminator 178 may extract common features from the input image to generate common feature data (S112), project the common feature data into the dimension of the class embedding of real images (S114), and use the conditional contrastive loss computed from the common feature data and the class embedding to minimize its own adversarial loss or make it converge to a preset value (S118) (see FIG. 13).
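The text does not specify how the conditional contrastive loss enters the adversarial loss in step S118. A common way is to add a weighted contrastive term to a standard adversarial objective. The sketch below assumes a hinge adversarial loss and a weight `lam`; both are assumptions, not details from this document. The generator-side counterpart (a realistic fake with a low contrastive loss) is included for symmetry.

```python
def discriminator_loss(real_logits, fake_logits, contrastive_loss, lam=1.0):
    """Assumed form: hinge adversarial loss for D plus lam * 2C loss."""
    adv = (sum(max(0.0, 1.0 - r) for r in real_logits) / len(real_logits)
           + sum(max(0.0, 1.0 + f) for f in fake_logits) / len(fake_logits))
    return adv + lam * contrastive_loss


def generator_loss(fake_logits, contrastive_loss, lam=1.0):
    """Assumed form: G raises D's score on fakes while keeping the 2C loss low."""
    adv = -sum(fake_logits) / len(fake_logits)
    return adv + lam * contrastive_loss
```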

The discriminator 178 may include a feature extraction model, a projection model, and an optimization model. The feature extraction model may correspond to the feature extractor, the projection model to the projection function, and the optimization model to a model that optimizes the 2C loss (see FIG. 4).

The feature extraction model may extract common features from an image input to the discriminator to generate common feature data. The projection model may project the common feature data into the dimension of the class embedding of real images. The optimization model may operate so that the conditional contrastive loss computed from the common feature data and the class embedding is minimized or converges.
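The feature extraction model and the projection model can be pictured as two functions applied in sequence. In the sketch below, the feature extractor is a deliberately trivial stand-in (the mean of each row), and the projection is a linear map followed by L2 normalization so that dot products become cosine similarities. Both functions are hypothetical placeholders for trained networks, and the normalization step is an assumption.

```python
import math


def extract_features(image):
    """Hypothetical feature extractor: mean of each row of a 2D image."""
    return [sum(row) / len(row) for row in image]


def project(features, weight):
    """Linear projection into the class-embedding dimension, then L2 normalization.

    weight: list of rows, one per output (class-embedding) dimension
    """
    z = [sum(w * f for w, f in zip(row, features)) for row in weight]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0  # avoid division by zero
    return [v / norm for v in z]
```

Because the projected embedding has the same dimension as the class embeddings, image embeddings and class embeddings can be compared directly with dot products.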

In particular, the optimization model may use the conditional contrastive loss to train the discriminator to determine the authenticity of an input image, generate a fake image, or change part of an input image, based on the data-data relationships and data-class relationships among multiple image embeddings in the same dimension.
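The two relationships can be made concrete with a minimal sketch of the conditional contrastive (2C) loss, following Equation 1 in the claims as we read it. For each image embedding $l(x_i)$, the data-class term compares it with its own class embedding $e(y_i)$, and the data-data term compares it with the other images in the minibatch. Same-label images count as positives and are pulled closer; all other images appear only in the denominator and are pushed away. As an assumption, this sketch drops the self term $k = i$ from the positive sum, and the function and argument names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conditional_contrastive_loss(embeddings, labels, class_embeddings, t=1.0):
    """Mean 2C loss over a minibatch (sketch).

    embeddings: image embeddings l(x_i), ideally L2-normalized
    labels: integer class label y_i for each image
    class_embeddings: e(y), indexed by class label
    t: temperature that controls how hard the softmax is
    """
    m = len(embeddings)
    total = 0.0
    for i in range(m):
        # Data-class relation: similarity to the image's own class embedding
        anchor = math.exp(dot(embeddings[i], class_embeddings[labels[i]]) / t)
        sims = [math.exp(dot(embeddings[i], embeddings[k]) / t) for k in range(m)]
        # Data-data positives: other images that share the label y_i
        positives = sum(s for k, s in enumerate(sims) if k != i and labels[k] == labels[i])
        # Denominator: every other image in the minibatch
        others = sum(s for k, s in enumerate(sims) if k != i)
        total += -math.log((anchor + positives) / (anchor + others))
    return total / m
```

The loss is lower when embeddings sit near their class embedding and near same-class images. This is the pull/push behaviour described in the claims.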

Referring again to FIG. 11, a generated image, a style-transferred image, or an avatar produced by the discriminator 178 may be output through the output unit 140. The output unit 140 may output the image obtained when the 2C loss of the discriminator 178 is minimized, or is judged to be optimized with respect to a preset reference value. That is, the output unit 140 may output the final image generated by the discriminator 178 in response to a signal from the optimization model.
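The output rule above (emit the image once the 2C loss is minimal or meets a preset reference value) can be sketched as a selection over candidate images paired with their losses. The function name and the "first below threshold, else best seen" policy are illustrative assumptions.

```python
def select_output(candidates, threshold):
    """Pick the image to output from (image, loss) pairs.

    Returns the first pair whose loss is at or below the preset threshold;
    if none qualifies, returns the pair with the lowest loss seen.
    Returns None for an empty sequence.
    """
    best = None
    for image, loss in candidates:
        if loss <= threshold:
            return image, loss
        if best is None or loss < best[1]:
            best = (image, loss)
    return best
```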

To train the contrastive adversarial network described above, as shown in FIG. 14, the generator, while connected to the discriminator, may generate fake images, which are passed to the discriminator together with real images (S110). The discriminator may extract common features from the input images to generate common feature data (S112), project the common feature data into the dimension of the class embedding of real images (S114), and learn to minimize the conditional contrastive loss computed from the common feature data and the class embedding, or to make it converge to a preset value (S116).
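The data flow of steps S110 to S116 can be sketched as a single function that wires the components together. The generator is assumed to be class-conditional (it takes a latent code and a label), and all components are passed in as plain callables. The parameter update step is omitted. This is a hypothetical sketch of the ordering only, not the authors' training code.

```python
def training_step(generator, extract, project, loss_fn,
                  real_images, real_labels, latents, fake_labels):
    """Compute the training-mode loss for one minibatch (sketch)."""
    # S110: the generator makes fakes, which go to the discriminator with the reals
    fakes = [generator(z, y) for z, y in zip(latents, fake_labels)]
    images = list(real_images) + fakes
    labels = list(real_labels) + list(fake_labels)
    # S112 + S114: common features, then projection into the class-embedding dimension
    embeddings = [project(extract(x)) for x in images]
    # S116: the conditional contrastive loss that the optimizer would minimize
    return loss_fn(embeddings, labels)
```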

FIG. 15 is a flowchart illustrating a process of generating a new image in an image generation and editing method using a contrastive adversarial network according to yet another embodiment of the present invention. FIG. 16 is a flowchart illustrating a process of generating a style-transferred image in an image generation and editing method according to yet another embodiment of the present invention. FIG. 17 is a flowchart illustrating a process of generating an avatar from a face image in an image generation and editing method according to yet another embodiment of the present invention.

Referring to FIG. 15, in the image generation and editing method using the contrastive adversarial network, the discriminator first uses the conditional contrastive loss to determine the authenticity of an input image, based on the data-data and data-class relationships among multiple image embeddings in the same dimension (S142).

Next, the generator may generate a realistic fake image that deceives the authenticity judgment on the input image and has a low contrastive loss (S144).

As shown in FIG. 16, the image generation and editing method using the contrastive adversarial network may also be configured as follows. The discriminator uses the conditional contrastive loss to determine the authenticity of the input image, based on the data-data and data-class relationships among multiple image embeddings in the same dimension (S142). The method then determines whether only the background region of the input image has been selected (S143). If so (Yes in S143), it generates a background image that deceives the authenticity judgment on the input image and has a low contrastive loss (S145). Finally, it composites the input image with the background image and outputs a style-transferred image.
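The final compositing step can be sketched as mask-based blending. The sketch assumes a foreground mask in which 1 marks pixels to keep from the input image and 0 marks the selected background region to be replaced by the generated background. Fractional values give a soft blend. The mask representation and the function name are assumptions; the document does not say how the composite is formed.

```python
def composite(input_image, background, foreground_mask):
    """Blend per pixel: mask * input + (1 - mask) * generated background.

    All three arguments are 2D lists of the same shape; mask values lie in [0, 1].
    """
    return [
        [m * x + (1 - m) * b for x, b, m in zip(row_x, row_b, row_m)]
        for row_x, row_b, row_m in zip(input_image, background, foreground_mask)
    ]
```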

As shown in FIG. 17, the image generation and editing method using the contrastive adversarial network may also proceed as follows. The discriminator uses the conditional contrastive loss to determine the authenticity of a face image, based on the data-data and data-class relationships among multiple image embeddings in the same dimension (S142). The method then generates fake face images that deceive that authenticity judgment and have a low contrastive loss (S146), and outputs an avatar that combines the generated fake face images (S148).

The operations of the image generation and editing method of the present invention according to the embodiments above can be implemented as a computer-readable program or code on a computer-readable recording medium. A computer-readable recording medium includes every type of recording device that stores data readable by a computer system. The computer-readable recording medium may also be distributed over networked computer systems so that the computer-readable program or code is stored and executed in a distributed manner.

The computer-readable recording medium may also include hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Program instructions may include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although some aspects of the present invention have been described in the context of an apparatus, they may also represent a description of the corresponding method, in which a block or apparatus corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method may also be represented as a corresponding block, item, or feature of a corresponding apparatus. Some or all of the method steps may be performed by (or using) a hardware device such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such a device.

In embodiments, a programmable logic device (e.g., a field-programmable gate array) may be used to perform some or all of the functions of the methods described herein. In embodiments, a field-programmable gate array may operate together with a microprocessor to perform one of the methods described herein. In general, the methods are preferably performed by some hardware device.

Although the present invention has been described with reference to preferred embodiments, those skilled in the art will understand that it can be variously modified and changed without departing from the spirit and scope of the invention set forth in the claims below.

Claims

An image generation and editing method using contrastive learning and a generative adversarial network, the method comprising:
generating common feature data by extracting common features from an input image that is input to a discriminator;
projecting the common feature data into the dimension of a class embedding of a real image; and
minimizing a conditional contrastive loss computed from the common feature data and the class embedding, or operating so that the conditional contrastive loss converges,
wherein the operating step uses the conditional contrastive loss to determine the authenticity of the input image, generate a fake image, or change part of the input image, based on a data-data relationship and a data-class relationship among multiple image embeddings in the same dimension.

The method of claim 1,
wherein, in the operating step, within the embedding space in which the data-data relationship and the data-class relationship are expressed, other embeddings with relatively high similarity to an anchor embedding or target embedding are pulled closer to it, and other embeddings with relatively low similarity are pushed farther away.

The method of claim 1,
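wherein the conditional contrastive loss is expressed by Equation 1 below:
[Equation 1]
$$\ell_{2C}(x_i, y_i; t) = -\log\left(\frac{\exp\left(l(x_i)^{\top} e(y_i)/t\right) + \sum_{k=1}^{m} \mathbb{1}_{y_k = y_i}\,\exp\left(l(x_i)^{\top} l(x_k)/t\right)}{\exp\left(l(x_i)^{\top} e(y_i)/t\right) + \sum_{k=1}^{m} \mathbb{1}_{k \neq i}\,\exp\left(l(x_i)^{\top} l(x_k)/t\right)}\right)$$
where, in Equation 1, $x_i$ is an image, $y_i$ is the label of the image, $e(\cdot)$ is the label embedding model, $l(\cdot)$ is the image embedding model, $t$ is a hyperparameter that adjusts the hardness, and the sums run over the $m$ images of a minibatch.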

The method of claim 1,
further comprising generating a realistic fake image that deceives the authenticity judgment on the input image and has a low contrastive loss.

The method of claim 4,
further comprising inputting the fake image and the real image to the discriminator as the input image.

An image generation and editing apparatus using contrastive learning and a generative adversarial network, comprising:
display;
one or more cameras;
one or more processors; and
a memory storing one or more programs configured to be executed by the one or more processors;
wherein the one or more programs include instructions for performing the method of any one of claims 1 to 5.

An image generation and editing apparatus using contrastive learning and a generative adversarial network, comprising:
a computer-readable storage medium on which one or more programs including instructions for performing the method of any one of claims 1 to 5 are recorded,
wherein the one or more programs are executed by one or more processors of a computing device having one or more cameras and a display.

An image generation and editing method using contrastive learning and a generative adversarial network, the method comprising:
determining the authenticity of an input image, using a conditional contrastive loss, based on a data-data relationship and a data-class relationship among multiple image embeddings in the same dimension; and
generating a realistic fake image that deceives the authenticity judgment on the input image and has a low contrastive loss.

The method of claim 8,
wherein the conditional contrastive loss corresponds to projecting common feature data, obtained by extracting common features from input data, into the dimension of a class embedding of real data.

The method of claim 9,
wherein the conditional contrastive loss is processed so that, within the embedding space in which the data-data relationship and the data-class relationship are expressed, other embeddings with relatively high similarity to an anchor embedding or target embedding are pulled closer to it, and other embeddings with relatively low similarity are pushed farther away.

An image generation and editing apparatus using contrastive learning and a generative adversarial network, comprising:
a feature extraction model that generates common feature data by extracting common features from an input image that is input to a discriminator;
a projection model that projects the common feature data into the dimension of a class embedding of a real image; and
an optimization model that operates to minimize a conditional contrastive loss computed from the common feature data and the class embedding, or to make the conditional contrastive loss converge,
wherein the optimization model uses the conditional contrastive loss to determine the authenticity of the input image, generate a fake image, or change part of the input image, based on a data-data relationship and a data-class relationship among multiple image embeddings in the same dimension, and outputs the result.

The method of claim 11,
wherein, within the embedding space in which the data-data relationship and the data-class relationship are expressed, the optimization model pulls other embeddings with high similarity to an anchor embedding or target embedding closer to it and pushes other embeddings with low similarity farther away.

The method of claim 11,
further comprising a generator that generates a realistic fake image that deceives the authenticity judgment on the input image and has a low contrastive loss,
wherein the fake image and the real image are input to the discriminator as the input images.

An image generation and editing apparatus, comprising:
a discriminator that uses a conditional contrastive loss to determine the authenticity of an input image based on a data-data relationship and a data-class relationship among multiple image embeddings in the same dimension; and
a generator that generates a realistic image that deceives the authenticity judgment on the input image and has a low contrastive loss,
wherein the discriminator uses the conditional contrastive loss to determine the authenticity of the input image based on the data-data relationship and the data-class relationship among multiple image embeddings in the same dimension, and either generates a fake image based on the input image or changes part of the input image and outputs the result.

The method of claim 14,
further comprising a generator separation unit or input switching unit that detaches the generator from the discriminator and connects, to the discriminator, an input unit to which a sample is input,
wherein the sample input to the discriminator is processed as the input image through the neural network to generate a new image, or an edited image obtained by editing the input image is output.