KR102611893B1

KR102611893B1 - Apparatus And Method For Generating Images Based On Generative Adversarial Networks Capable Of Dual Learning

Info

Publication number: KR102611893B1
Application number: KR1020230090553A
Authority: KR
Inventors: 서현; 오윤석; 오선혜
Original assignee: 경상국립대학교산학협력단
Priority date: 2023-07-12
Filing date: 2023-07-12
Publication date: 2023-12-07

Abstract

The present invention relates to a generative adversarial network (GAN)-based image generation apparatus and method capable of dual learning. More specifically, the apparatus includes: a first source encoder that extracts a first source feature from a first domain of a first source image; a first target encoder that extracts a first target feature from a second domain of a first target image; a second source encoder that extracts a second source feature from a second domain of a second source image; a second target encoder that extracts a second target feature from a first domain of a second target image; a convolutional block attention module that extracts, from each source feature output by each source encoder, a refined feature indicating where to focus in each source image; and a query attention module that uses a given query to generate a temporary attention matrix for each refined feature, sorts each temporary attention matrix based on entropy, and then outputs a final attention matrix having the selected rows.

Description

Apparatus and Method for Generating Images Based on Generative Adversarial Networks (GAN) Capable of Dual Learning

The present invention relates to a generative adversarial network (GAN)-based image generation apparatus and method capable of dual learning, in which GANs are arranged symmetrically so that learning can be performed for two domains simultaneously.

Image-to-Image (I2I) translation is a field of computer vision and machine learning that aims to generate images in a target domain from a source domain. It has advanced greatly with the emergence of generative adversarial networks (GANs).

A generative adversarial network (GAN) is a model based on unsupervised learning in which a generator, which produces synthetic data from random noise, is trained competitively against a discriminator that judges whether data is real or fake, so that the generator learns to produce more realistic data. Because it performs well on relatively small datasets, it is advantageous in terms of efficiency. On the other hand, because the generator and the discriminator are trained competitively at the same time, training is highly unstable, and it is difficult to train on large-scale datasets.

In Image-to-Image (I2I) translation, a GAN can generate high-quality images through supervised learning on data in which inputs and outputs are paired. However, collecting data paired by pixel-to-pixel mapping for supervised learning is quite difficult, and building such data is costly, which is a technical limitation.

In Image-to-Image (I2I) translation, a GAN can also be trained by unsupervised learning using data without prior information, that is, without labels. However, because the adversarial loss is under-constrained and multiple mappings between the domains are possible, low-quality images may be generated. CycleGAN, DiscoGAN, and DualGAN, which were developed to solve this problem, rely on a cycle-consistency assumption. This assumption can limit the ability to perform shape changes, and it does not always make the relationship between the two domains an ideal bijection.

DCLGAN and QS-Attn were developed to address the technical limitations of CycleGAN, DiscoGAN, and DualGAN. DCLGAN does not select target anchors in contrastive learning, and each anchor feature has only a limited receptive field that does not take into account its relationship with other positions. QS-Attn uses a single embedding for two separate domains, so it may fail to capture the domain gap, which can reduce training stability. In addition, to measure how important an obtained anchor feature is in reflecting the domain characteristics, QS-Attn computes the entropy of the distribution as a metric, sorts the entropies, and selects the N points with the smallest entropy in the image. Analysis of the entropy metric revealed zero-entropy points in excess of the number of samples, which means that important features are missed in the selection. In other words, with QS-Attn the missed features may be concentrated in regions corresponding to the background rather than the object.

Accordingly, there is an urgent need in this technical field for a technique that can generate higher-quality images in Image-to-Image (I2I) translation using a GAN capable of unsupervised learning.

Korean Laid-Open Patent Publication No. 10-2023-0070128
Korean Registered Patent Publication No. 10-1975186

The present invention is intended to solve the above problems. Its object is to provide a generative adversarial network (GAN)-based image generation apparatus and method capable of dual learning, in which GANs are arranged symmetrically so that learning can be performed for two domains simultaneously and higher-quality images can be generated on the basis of unsupervised learning.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those of ordinary skill in the art from the description of the present invention.

To achieve the above object, the generative adversarial network (GAN)-based image generation apparatus capable of dual learning of the present invention includes: a first source encoder that extracts a first source feature from a first domain of a first source image; a first target encoder that extracts a first target feature from a second domain of a first target image; a second source encoder that extracts a second source feature from a second domain of a second source image; a second target encoder that extracts a second target feature from a first domain of a second target image; a convolutional block attention module that extracts, from each source feature output by each source encoder, a refined feature indicating where to focus in each source image; and a query attention module that uses a given query to generate a temporary attention matrix for each refined feature, sorts each temporary attention matrix based on entropy, and then outputs a final attention matrix having the selected rows.

To achieve the above object, the generative adversarial network (GAN)-based image generation method capable of dual learning of the present invention includes: a first source feature extraction step in which a first source encoder extracts a first source feature from a first domain of a first source image; a first target feature extraction step in which a first target encoder extracts a first target feature from a second domain of a first target image; a second source feature extraction step in which a second source encoder extracts a second source feature from a second domain of a second source image; a second target feature extraction step in which a second target encoder extracts a second target feature from a first domain of a second target image; a refined feature extraction step in which a convolutional block attention module extracts, from each source feature output by each source encoder, a refined feature indicating where to focus in each source image; a temporary attention matrix generation step in which a query attention module uses a given query to generate a temporary attention matrix for each refined feature; and a final attention matrix output step in which the query attention module sorts each temporary attention matrix based on entropy and then outputs a final attention matrix having the selected rows.

As described above, according to the present invention, GANs are arranged symmetrically so that learning can be performed for two domains simultaneously, which has the effect of enabling higher-quality images to be generated on the basis of unsupervised learning.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the detailed description and the claims.

Figure 1 is a configuration diagram of the generative adversarial network (GAN)-based image generation apparatus capable of dual learning according to the present invention.
Figure 2 is a diagram showing a first source encoder that processes a first source image and a convolutional block attention module that processes a first source feature, according to an embodiment of the present invention.
Figure 3 is a diagram showing a second source encoder that processes a second source image and a convolutional block attention module that processes a second source feature, according to an embodiment of the present invention.
Figure 4 is a diagram showing a query attention module that processes a first refined feature, a first target encoder that processes a first target image, and a learning unit, according to an embodiment of the present invention.
Figure 5 is a diagram showing a query attention module that processes a second refined feature, a second target encoder that processes a second target image, and a learning unit, according to an embodiment of the present invention.
Figure 6 is a flowchart of the generative adversarial network (GAN)-based image generation method capable of dual learning for a first source image and a first target image, according to an embodiment of the present invention.
Figure 7 is a flowchart of the generative adversarial network (GAN)-based image generation method capable of dual learning for a second source image and a second target image, according to an embodiment of the present invention.

The terms used in this specification have been selected, as far as possible, from general terms that are currently in wide use, taking into account their function in the present invention; however, they may vary depending on the intention of those working in the field, precedent, the emergence of new technology, and the like. In certain cases there are also terms arbitrarily selected by the applicant, in which case their meaning will be described in detail in the relevant part of the description. Therefore, the terms used in the present invention should be defined on the basis of their meaning and the overall content of the present invention, not simply on the basis of their names.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and, unless explicitly defined in the present application, should not be interpreted in an idealized or excessively formal sense.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings. Figure 1 is a configuration diagram of the generative adversarial network (GAN)-based image generation apparatus capable of dual learning according to the present invention.

Figure 2 is a diagram showing a first source encoder (500) that processes a first source image (I_x) and a convolutional block attention module (900a) that processes a first source feature (F_{x_1}), according to an embodiment of the present invention. Figure 3 is a diagram showing a second source encoder (700) that processes a second source image (I_y) and a convolutional block attention module (900b) that processes a second source feature (F_{y_2}), according to an embodiment of the present invention.

Figure 4 is a diagram showing a query attention module (1000a) that processes a first refined feature (Refined F_{x_1}), a first target encoder (600) that processes a first target image (G_y), and a learning unit (1100a), according to an embodiment of the present invention. Figure 5 is a diagram showing a query attention module (1000b) that processes a second refined feature (Refined F_{y_2}), a second target encoder (800) that processes a second target image (G_x), and a learning unit (1100b), according to an embodiment of the present invention.

Figure 6 is a flowchart of the generative adversarial network (GAN)-based image generation method capable of dual learning for the first source image (I_x) and the first target image (G_y), according to an embodiment of the present invention. Figure 7 is a flowchart of the generative adversarial network (GAN)-based image generation method capable of dual learning for the second source image (I_y) and the second target image (G_x), according to another embodiment of the present invention.

Generative adversarial network (GAN)-based image generation apparatus capable of dual learning

Referring to Figure 1, the generative adversarial network (GAN)-based image generation apparatus capable of dual learning of the present invention includes a first source encoder (500), a first target encoder (600), a second source encoder (700), a second target encoder (800), convolutional block attention modules (900a, 900b), and query attention modules (1000a, 1000b).

First, the first source encoder (500) extracts a first source feature (F_{x_1}) from the first domain (x) of the first source image (I_x). The first target encoder (600) extracts a first target feature (F_{y_1}) from the second domain (y) of the first target image (G_y).

Next, the second source encoder (700) extracts a second source feature (F_{y_2}) from the second domain (y) of the second source image (I_y). The second target encoder (800) extracts a second target feature (F_{x_2}) from the first domain (x) of the second target image (G_x).

Next, the convolutional block attention modules (900a, 900b) each extract, from the source features (F_{x_1}, F_{y_2}) output by the source encoders (500, 700), refined features (Refined F_{x_1}, Refined F_{y_2}) indicating where to focus in the respective source images (I_x, I_y).

Referring to the embodiment of Figure 2, the first source feature (F_{x_1}) output from the first source encoder (500) is input to one convolutional block attention module (900a). Here, the first source feature (F_{x_1}) may have a size of height (H) * width (W) * channels (C). The convolutional block attention module (900a) may include one channel attention module (910a) and one spatial attention module (920a).

First, the channel attention module (910a) examines the relationships between the channels of the first source feature (F_{x_1}) and encodes which channel should receive more focus. This is expressed by [Equation 1] below.

[Equation 1]

First, the channel attention module (910a) can apply max pooling to the first source feature (F_{x_1}) and output only the maximum value of each channel as a vector. The channel attention module (910a) can then reduce the dimension of the max-pooled vectors through a multi-layer perceptron (MLP) having two layers, and can convert the dimension-reduced vectors into probability values using a sigmoid function. Finally, the channel attention module (910a) can encode the first source feature (F_{x_1}) by focusing on the channel having the largest probability value among the per-channel probability values.

Accordingly, the convolutional block attention module (900a) can multiply the first source feature (F_{x_1}) by the channel-encoded first source feature from the channel attention module (910a) and output a channel-refined first source feature (Channel_refined F_{x_1}).
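
The channel attention steps described above can be sketched as follows. This is a minimal NumPy illustration and not the patented implementation: the function name `channel_attention`, the weight matrices `w1` and `w2`, the reduction ratio `r`, and the ReLU between the two MLP layers are assumptions introduced for the example, and only the max-pooling path named in the text is modeled.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feature, w1, w2):
    """Return per-channel weights in (0, 1) for an (H, W, C) feature map."""
    pooled = feature.max(axis=(0, 1))       # (C,) maximum of each channel
    hidden = np.maximum(pooled @ w1, 0.0)   # first MLP layer, ReLU assumed
    return sigmoid(hidden @ w2)             # second MLP layer + sigmoid -> (C,)

rng = np.random.default_rng(0)
H, W, C, r = 4, 4, 8, 2                     # r: hypothetical reduction ratio
feature = rng.standard_normal((H, W, C))    # stands in for F_{x_1}
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1

weights = channel_attention(feature, w1, w2)
channel_refined = feature * weights         # broadcast over H and W
```

Multiplying by `weights` rescales every spatial position of a channel by the same factor, which corresponds to the multiplication that produces the channel-refined feature.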

In addition, the spatial attention module (920a) examines the positional relationships among the plurality of pixels of the channel-refined first source feature (Channel_refined F_{x_1}) at size H*W*1 and encodes which position should receive more focus. This is expressed by [Equation 2] below.

[Equation 2]

First, the channel-refined first source feature (Channel_refined F_{x_1}) can be input to the spatial attention module (920a). The spatial attention module (920a) can apply max pooling over the H*W*1 size and output only the maximum values of the channel-refined first source feature (Channel_refined F_{x_1}) as a vector. The spatial attention module (920a) can then concatenate two channels, each of size H*W*1, to generate a feature map of size H*W*2. Because the feature map has two channels, the spatial attention module (920a) can perform a 7*7 convolution on it, and can convert the convolved feature map into probability values using a sigmoid function. Finally, the spatial attention module (920a) can encode the channel-refined first source feature (Channel_refined F_{x_1}) by focusing on the position having the largest probability value among the probability values.

Accordingly, the convolutional block attention module (900a) can multiply the channel-refined first source feature (Channel_refined F_{x_1}) by the position-encoded channel-refined first source feature from the spatial attention module (920a) and output one refined feature (Refined F_{x_1}).
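
A sketch of the spatial attention steps may help make the shapes concrete. The text describes concatenating two H*W*1 maps but names only max pooling; the example below assumes the second map is the channel-wise average, as in the original CBAM design. The helper `conv2d_same`, the zero padding, and the random 7*7 kernel are likewise illustrative assumptions rather than details taken from the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Naive zero-padded convolution: (H, W, Cin) with (k, k, Cin) -> (H, W)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, w, _ = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k, :] * kernel)
    return out

def spatial_attention(feature, kernel):
    """Return per-position weights in (0, 1) for an (H, W, C) feature map."""
    max_map = feature.max(axis=2, keepdims=True)          # (H, W, 1)
    avg_map = feature.mean(axis=2, keepdims=True)         # (H, W, 1), assumed
    stacked = np.concatenate([max_map, avg_map], axis=2)  # (H, W, 2)
    return sigmoid(conv2d_same(stacked, kernel))          # (H, W)

rng = np.random.default_rng(0)
channel_refined = rng.standard_normal((8, 8, 4))  # stands in for Channel_refined F_{x_1}
kernel = rng.standard_normal((7, 7, 2)) * 0.05    # hypothetical 7*7 kernel

weights = spatial_attention(channel_refined, kernel)
refined = channel_refined * weights[:, :, None]   # stands in for Refined F_{x_1}
```

The explicit loops keep the convolution readable; a practical implementation would use a framework convolution layer with learned weights.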

Referring to the embodiment of Figure 3, the second source feature (F_{y_2}) output from the second source encoder (700) is input to another convolutional block attention module (900b). Here, the second source feature (F_{y_2}) may have a size of height (H) * width (W) * channels (C). The convolutional block attention module (900b) may include another channel attention module (910b) and another spatial attention module (920b).

First, the channel attention module (910b) examines the relationships between the channels of the second source feature (F_{y_2}) and encodes which channel should receive more focus. This is expressed by [Equation 3] below.

[Equation 3]

First, the channel attention module (910b) can apply max pooling to the second source feature (F_{y_2}) and output only the maximum value of each channel as a vector. Next, the channel attention module (910b) can reduce the dimension of the max-pooled vectors through a multi-layer perceptron (MLP) having two layers, and can convert the dimension-reduced vectors into probability values using a sigmoid function. Finally, the channel attention module (910b) can encode the second source feature (F_{y_2}) by focusing on the channel having the largest probability value among the per-channel probability values.

Accordingly, the convolutional block attention module (900b) can multiply the second source feature (F_{y_2}) by the channel-encoded second source feature from the channel attention module (910b) and output a channel-refined second source feature (Channel_refined F_{y_2}).

In addition, the spatial attention module (920b) examines the positional relationships among the plurality of pixels of the channel-refined second source feature (Channel_refined F_{y_2}) at size H*W*1 and encodes which position should receive more focus. This is expressed by [Equation 4] below.

[Equation 4]

First, the channel-refined second source feature (Channel_refined F_{y_2}) can be input to the spatial attention module (920b). The spatial attention module (920b) can apply max pooling over the H*W*1 size and output only the maximum values of the channel-refined second source feature (Channel_refined F_{y_2}) as a vector. The spatial attention module (920b) can then concatenate two channels, each of size H*W*1, to generate a feature map of size H*W*2. Because the feature map has two channels, the spatial attention module (920b) can perform a 7*7 convolution on it, and can convert the convolved feature map into probability values using a sigmoid function. Finally, the spatial attention module (920b) can encode the channel-refined second source feature (Channel_refined F_{y_2}) by focusing on the position having the largest probability value among the probability values.

Accordingly, the convolutional block attention module (900b) can multiply the channel-refined second source feature (Channel_refined F_{y_2}) by the position-encoded channel-refined second source feature from the spatial attention module (920b) and output another refined feature (Refined F_{y_2}).

Next, the query attention modules (1000a, 1000b) use a given query to generate a temporary attention matrix for each of the refined features (Refined F_{x_1}, Refined F_{y_2}), sort each temporary attention matrix based on entropy, and then output a final attention matrix having the selected rows.
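
The entropy-based sorting and row selection can be illustrated with a small example. The description does not state which rows are kept after sorting; the sketch below assumes the rows with the smallest entropy are selected, following the QS-Attn behavior summarized in the background. The names `row_entropy`, `select_rows_by_entropy`, and `num_rows` are hypothetical.

```python
import numpy as np

def row_entropy(attn, eps=1e-12):
    """Shannon entropy of each row of a row-stochastic attention matrix."""
    return -np.sum(attn * np.log(attn + eps), axis=1)

def select_rows_by_entropy(attn, num_rows):
    """Sort rows by ascending entropy and keep the first num_rows of them."""
    order = np.argsort(row_entropy(attn))   # lowest entropy first
    selected = order[:num_rows]
    return attn[selected], selected

# Toy temporary attention matrix (n = 4); every row sums to 1.
attn = np.array([
    [0.97, 0.01, 0.01, 0.01],   # sharply peaked -> low entropy
    [0.25, 0.25, 0.25, 0.25],   # uniform        -> highest entropy
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],
])

final_attn, selected = select_rows_by_entropy(attn, num_rows=2)
print(selected)   # indices of the two most peaked rows: [0 2]
```

A low-entropy row attends to a few positions only, so under this assumption the final attention matrix keeps the queries whose attention is most concentrated.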

Preferably, the query attention modules (1000a, 1000b) include source attention matrix output units (1010a, 1010b), which operate on the value (Value) of each refined feature (Refined F_{x_1}, Refined F_{y_2}) and each final attention matrix to output a source attention matrix, and target attention matrix output units (1020a, 1020b), which operate on the value (Value) of each target feature (F_{y_1}, F_{x_2}) and each final attention matrix to output a target attention matrix.
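
The text does not specify the operation applied to the value and the final attention matrix. Assuming it is a matrix product, as in standard attention, the two output units could be sketched as below. All array names and sizes are illustrative, and the random arrays merely stand in for the refined source feature and the target feature flattened to n positions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_selected, channels = 6, 3, 4   # hypothetical sizes

# Stand-in for a final attention matrix: num_selected rows, each summing to 1.
final_attn = rng.random((num_selected, n))
final_attn /= final_attn.sum(axis=1, keepdims=True)

source_value = rng.standard_normal((n, channels))  # value of the refined source feature
target_value = rng.standard_normal((n, channels))  # value of the target feature

# The same final attention matrix is applied to both values (assumed matrix product).
source_attention = final_attn @ source_value       # (num_selected, channels)
target_attention = final_attn @ target_value       # (num_selected, channels)
```

Because both products share the same attention rows, row i of `source_attention` and row i of `target_attention` describe the same selected query, which is what makes them comparable as a pair.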

Next, the learning units 1100a and 1100b calculate the similarity of pairs in each source attention matrix and each target attention matrix, and learn in the direction of reducing the loss value of each final attention matrix using a loss function.

In general, an attention operation may use three vectors: a query, a key, and a value. The query is the token at the current time step, and the key and value are the target tokens for which attention is to be computed. In other words, the query is a single fixed token, and the keys and values are scanned from beginning to end to find the token that best matches the query, that is, the token with the highest attention. The query, key, and value have the same size.

Referring to the embodiment of FIG. 4, one query attention module 1000a may multiply a query of size (n×d_k) by a key whose rows and columns have been transposed to size (d_k×n). The one query attention module 1000a may then convert the resulting product into normalized values between 0 and 1 using a softmax function. The values normalized by the softmax function sum to 1.

That is, each pixel of the one temporary attention matrix of size (n×n) output from the one query attention module 1000a has a probability of belonging to a specific class out of the total probability, and takes a normalized value between 0 and 1. The one temporary attention matrix may be displayed in a different color for each class to which a pixel belongs; within the same class, higher values may be displayed in darker colors and lower values in lighter colors. Here, the classes may include positive and negative.
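
The construction of the temporary attention matrix can be illustrated with a short NumPy sketch. It assumes that the softmax is applied row by row, so that each row sums to 1; the description states only that the normalized values sum to 1. The function names and the sizes n and d_k used below are hypothetical.

```python
import numpy as np

def softmax_rows(scores):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def temporary_attention(query, key):
    # query: (n, d_k); key: (n, d_k), transposed to (d_k, n) before the product.
    return softmax_rows(query @ key.T)   # (n, n)

rng = np.random.default_rng(0)
n, d_k = 6, 4
Q = rng.standard_normal((n, d_k))
K = rng.standard_normal((n, d_k))
A_tmp = temporary_attention(Q, K)
print(A_tmp.shape, A_tmp.sum(axis=1))
```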

In addition, the one query attention module 1000a may sort the one temporary attention matrix of size (n×n) based on entropy and output one final attention matrix consisting of the selected rows. For example, d_k rows may be selected, in which case the one final attention matrix has size (d_k×n).
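
A minimal sketch of entropy-based row selection follows. The description says only that the temporary attention matrix is sorted by entropy and that rows are selected; this sketch assumes that the rows with the lowest entropy (the most sharply focused queries) are the ones kept. The function name and the example matrix are hypothetical.

```python
import numpy as np

def select_rows_by_entropy(attn, num_rows):
    eps = 1e-12  # avoids log(0)
    # Shannon entropy of each row, treating the row as a probability distribution.
    entropy = -np.sum(attn * np.log(attn + eps), axis=1)
    order = np.argsort(entropy)   # ascending: most focused rows first
    keep = order[:num_rows]       # assumption: keep the lowest-entropy rows
    return attn[keep], keep

attn = np.array([
    [0.97, 0.01, 0.01, 0.01],   # sharply peaked, low entropy
    [0.25, 0.25, 0.25, 0.25],   # uniform, maximum entropy
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],
])
final_attn, kept = select_rows_by_entropy(attn, num_rows=2)
print(kept)  # rows 0 and 2 have the lowest entropy
```

With num_rows set to d_k, the result has the (d_k×n) size stated in the description.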

In addition, one source attention matrix output unit 1010a may multiply the value of size (n×d_k) for the one refined feature (Refined F_{x_1}) by the final attention matrix of size (d_k×n) to output one source attention matrix of size (n×n).

In addition, one target attention matrix output unit 1020a may multiply the value of size (n×d_k) for the first target feature (F_{y_1}) output from the first target image (G_y) by the one final attention matrix of size (d_k×n) to output one target attention matrix of size (n×n).
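
The shape bookkeeping of the source and target attention matrix output units can be checked with a small NumPy sketch. It assumes that the same final attention matrix is applied to both the source value and the target value, as the description indicates; the function name and the random inputs are hypothetical.

```python
import numpy as np

def apply_final_attention(value, final_attn):
    # value: (n, d_k); final_attn: (d_k, n); product: (n, n)
    return value @ final_attn

rng = np.random.default_rng(0)
n, d_k = 6, 4
final_attn = rng.random((d_k, n))
final_attn /= final_attn.sum(axis=1, keepdims=True)   # rows sum to 1, like softmax output
V_source = rng.standard_normal((n, d_k))   # stands in for the value of Refined F_{x_1}
V_target = rng.standard_normal((n, d_k))   # stands in for the value of F_{y_1}

source_matrix = apply_final_attention(V_source, final_attn)
target_matrix = apply_final_attention(V_target, final_attn)
print(source_matrix.shape, target_matrix.shape)
```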

In addition, one learning unit 1110a may calculate the similarity of pairs of pixels in the one source attention matrix of size (n×n) and the one target attention matrix of size (n×n). The learning unit 1110a is based on contrastive learning and may learn in the direction of reducing the loss value of the one final attention matrix using a loss function (loss function(x)).
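
The description does not give the form of the loss function. As one possible illustration, the sketch below uses the InfoNCE loss that is common in contrastive learning, under the assumptions that rows with the same index in the source and target matrices form a positive pair, that all other rows act as negatives, and that similarity is measured by cosine similarity. The function name, the temperature value, and the random data are hypothetical.

```python
import numpy as np

def info_nce(source_rows, target_rows, temperature=0.07):
    # Cosine similarity between every target row and every source row.
    s = source_rows / np.linalg.norm(source_rows, axis=1, keepdims=True)
    t = target_rows / np.linalg.norm(target_rows, axis=1, keepdims=True)
    logits = (t @ s.T) / temperature                     # (n, n)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries correspond to the assumed positive pairs.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
source_matrix = rng.standard_normal((8, 8))
noise = 0.01 * rng.standard_normal((8, 8))

loss_aligned = info_nce(source_matrix, source_matrix + noise)
loss_shuffled = info_nce(source_matrix, np.roll(source_matrix, 1, axis=0))
print(loss_aligned, loss_shuffled)
```

Under these assumptions the loss is small when corresponding rows agree and large when they are misaligned.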

Referring to the embodiment of FIG. 5, the other query attention module 1000b may multiply a query of size (n×d_k) by a key of size (d_k×n). The other query attention module 1000b may then convert the resulting product into normalized values between 0 and 1 using a softmax function. The values normalized by the softmax function sum to 1.

That is, each pixel of the other temporary attention matrix of size (n×n) output from the other query attention module 1000b has a probability of belonging to a specific class out of the total probability, and takes a normalized value between 0 and 1. The other temporary attention matrix may be displayed in a different color for each class to which a pixel belongs; within the same class, higher values may be displayed in darker colors and lower values in lighter colors. Here, the classes may include positive and negative.

In addition, the other query attention module 1000b may sort the other temporary attention matrix of size (n×n) based on entropy and output the other final attention matrix consisting of the selected rows. For example, d_k rows may be selected, in which case the other final attention matrix has size (d_k×n).

In addition, the other source attention matrix output unit 1010b may multiply the value of size (n×d_k) for the other refined feature (Refined F_{y_2}) by the other final attention matrix of size (d_k×n) to output the other source attention matrix of size (n×n).

In addition, the other target attention matrix output unit 1020b may multiply the value of size (n×d_k) for the second target feature (F_{x_2}) output from the second target image (G_x) by the other final attention matrix of size (d_k×n) to output the other target attention matrix of size (n×n).

In addition, the other learning unit 1110b may calculate the similarity of pairs of pixels in the other source attention matrix and the other target attention matrix. The learning unit 1110b is based on contrastive learning and may learn in the direction of reducing the loss value of the other final attention matrix using a loss function (loss function(y)).

Referring again to FIG. 1, in the present invention, one generative adversarial network (GAN) including a first generator 100 and a first discriminator 200 and one GAN including a second generator 300 and a second discriminator 400 may be arranged symmetrically.

Here, the first generator 100 may generate a first target image (G_y) including the second domain (y) from the first source image (I_x). The first generator 100 may include a first generator encoder 110 that performs encoding at its front end and a first generator decoder 120 that performs decoding at its back end.

Next, the second generator 300 may generate a second target image (G_x) including the first domain (x) from the second source image (I_y). The second generator 300 may include a second generator encoder 310 that performs encoding at its front end and a second generator decoder 320 that performs decoding at its back end.

Next, the first discriminator 200 may determine authenticity between the first source image (I_x) and the second target image (G_x). That is, the first discriminator 200 determines whether the second target image (G_x) generated by the second generator 300 is real or fake. The learning objective of the second generator 300 is to generate a second target image (G_x) that the first discriminator 200 cannot distinguish as real or fake.

Next, the second discriminator 400 may determine authenticity between the second source image (I_y) and the first target image (G_y). That is, the second discriminator 400 determines whether the first target image (G_y) generated by the first generator 100 is real or fake. The learning objective of the first generator 100 is to generate a first target image (G_y) that the second discriminator 400 cannot distinguish as real or fake.
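
The opposing objectives of the generators and discriminators can be illustrated numerically. The description does not specify the adversarial loss; the sketch below assumes a binary cross-entropy loss with the non-saturating generator objective, and uses hypothetical discriminator output probabilities in place of real networks.

```python
import numpy as np

def bce(prob, label):
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(label * np.log(prob + eps)
                          + (1 - label) * np.log(1 - prob + eps)))

def discriminator_loss(p_real, p_fake):
    # The discriminator should score real images as 1 and generated images as 0.
    return bce(p_real, np.ones_like(p_real)) + bce(p_fake, np.zeros_like(p_fake))

def generator_loss(p_fake):
    # The generator wants its generated images to be scored as real.
    return bce(p_fake, np.ones_like(p_fake))

# Hypothetical outputs of the first discriminator 200 on I_x (real) and G_x (generated).
p_real_x = np.array([0.9, 0.8])
p_fake_x = np.array([0.2, 0.1])

d1_loss = discriminator_loss(p_real_x, p_fake_x)   # low: the discriminator separates them well
g2_loss = generator_loss(p_fake_x)                 # high: the second generator 300 is not fooling it
print(d1_loss, g2_loss)
```

The second discriminator 400 and the first generator 100 would use the same functions with I_y and G_y.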

Generative adversarial network (GAN)-based image generation method capable of dual learning

First, according to an embodiment of the present invention, the GAN-based image generation method capable of dual learning may include a first target image generation step (S100) in which the first generator 100 generates a first target image (G_y) including the second domain (y) from the first source image (I_x), and a second target image generation step (S200) in which the second generator 300 generates a second target image (G_x) including the first domain (x) from the second source image (I_y).

In addition, according to an embodiment of the present invention, the method may further include a first determination step (S1400) in which the first discriminator 200 determines authenticity between the first source image (I_x) and the second target image (G_x), and a second determination step (S1500) in which the second discriminator 400 determines authenticity between the second source image (I_y) and the first target image (G_y).

That is, in the first determination step (S1400), it is determined whether the second target image (G_x) generated by the second generator 300 is real or fake. The learning objective of the present invention is that, through the first determination step (S1400), a second target image (G_x) is generated that the first discriminator 200 cannot distinguish as real or fake.

In the second determination step (S1500), it is determined whether the first target image (G_y) generated by the first generator 100 is real or fake. The learning objective of the present invention is that, through the second determination step (S1500), a first target image (G_y) is generated that the second discriminator 400 cannot distinguish as real or fake.

(1) Learning on the first source image (I_x) and the first target image (G_y)

Referring to the embodiment of FIG. 6, for the first source image (I_x) and the first target image (G_y), the present invention includes a first source feature extraction step (S300) in which the first source encoder 500 extracts a first source feature (F_{x_1}) from the first domain (x) of the first source image (I_x), and a first target feature extraction step (S400) in which the first target encoder 600 extracts a first target feature (F_{y_1}) from the second domain (y) of the first target image (G_y).

Next, the present invention further includes refine feature extraction steps (S700a, S700b) in which the convolutional block attention modules 900a and 900b extract, from the source features (F_{x_1}, F_{y_2}) output from the respective source encoders 500 and 700, refined features (Refined F_{x_1}, Refined F_{y_2}) that indicate where to focus in the respective source images (I_x, I_y).

In the refine feature extraction step (S700a) according to one embodiment, one refined feature (Refined F_{x_1}) indicating where to focus in the first source image (I_x) may be extracted from the first source feature (F_{x_1}). Here, the first source feature (F_{x_1}) may have a size of height (H) * width (W) * channel (C).

The refine feature extraction step (S700a) according to one embodiment includes a channel attention step (S710a) in which the channel attention module 910a in the one convolutional block attention module 900a examines the relationships between the channels of the first source feature (F_{x_1}) and encodes which channel to focus on more. This is as shown in [Equation 1] above.

First, in the channel attention step (S710a) according to one embodiment, only the per-channel maximum values of the first source feature (F_{x_1}) may be output as a vector through max pooling. Next, the max-pooled vectors may be reduced in dimension through a multi-layer perceptron (MLP) having two layers. The dimension-reduced vectors may then be converted into probability values using a sigmoid function. Finally, the first source feature (F_{x_1}) may be encoded by focusing on the channel having the largest probability value among the per-channel probability values.

Accordingly, in the channel attention step (S710a) according to one embodiment, the first source feature (F_{x_1}) may be multiplied by the channel-encoded first source feature output from the one channel attention module 910a, and the channel-refined first source feature (Channel_refined F_{x_1}) may be output.
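
The channel attention step can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: the two-layer MLP is assumed to reduce the channel dimension from C to C/r and then restore it to C, with a ReLU between the layers, as in the commonly used convolutional block attention module design. The reduction ratio r, the weight matrices, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feature, w1, w2):
    # Max pooling over the spatial axes keeps one maximum per channel: (C,)
    pooled = feature.max(axis=(0, 1))
    # Two-layer MLP: reduce to C // r, then restore to C (ReLU is an assumption).
    hidden = np.maximum(pooled @ w1, 0.0)
    weights = sigmoid(hidden @ w2)   # one probability-like weight per channel
    # Multiply the input feature by the channel weights.
    return feature * weights, weights

rng = np.random.default_rng(0)
H, W, C, r = 4, 4, 8, 2
feature = rng.standard_normal((H, W, C))       # stands in for F_{x_1}
w1 = rng.standard_normal((C, C // r)) * 0.1    # hypothetical MLP weights
w2 = rng.standard_normal((C // r, C)) * 0.1
channel_refined, weights = channel_attention(feature, w1, w2)
print(channel_refined.shape, weights.shape)
```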

In addition, the refine feature extraction step (S700a) according to one embodiment further includes a spatial attention step (S720a) in which the spatial attention module 920a in the one convolutional block attention module 900a examines the positional relationships among the plurality of pixels in the H*W*1 size of the channel-refined first source feature (Channel_refined F_{x_1}) and encodes which position to focus on more. This is as shown in [Equation 2] above.

First, in the spatial attention step (S720a) according to one embodiment, the channel-refined first source feature (Channel_refined F_{x_1}) may be input. Through max pooling targeting the H*W*1 size, only the maximum values of the channel-refined first source feature (Channel_refined F_{x_1}) may be output as a vector. Two channels, each of size H*W*1, may then be concatenated to generate a feature map of size H*W*2. Because the feature map has two channels, a 7*7 convolution operation may be performed on it. The 7*7-convolved feature map may be converted into probability values using a sigmoid function. The channel-refined first source feature (Channel_refined F_{x_1}) may then be encoded by focusing on the position having the largest probability value.

Accordingly, in the spatial attention step (S720a) according to one embodiment, the channel-refined first source feature (Channel_refined F_{x_1}) may be multiplied by the position-encoded channel-refined first source feature output from the one spatial attention module 920a, and one refined feature (Refined F_{x_1}) may be output.

Next, the present invention further includes temporary attention matrix generation steps (S800a, S800b) in which the query attention modules 1000a and 1000b each generate a temporary attention matrix for the respective refined feature (Refined F_{x_1}, Refined F_{y_2}) using a given query, and final attention matrix output steps (S900a, S900b) in which the query attention modules 1000a and 1000b sort each temporary attention matrix based on entropy and then each output a final attention matrix consisting of the selected rows.

In the temporary attention matrix generation step (S800a) according to one embodiment, a query of size (n×d_k) may be multiplied by a key of size (d_k×n). The resulting product may then be converted into normalized values between 0 and 1 using a softmax function. The values normalized by the softmax function sum to 1.

In addition, in the final attention matrix output step (S900a) according to one embodiment, one temporary attention matrix of size (n×n) may be sorted based on entropy, and one final attention matrix consisting of the selected rows may be output. For example, d_k rows may be selected, in which case the one final attention matrix has size (d_k×n).

Next, the present invention may further include source attention matrix output steps (S1000a, S1000b) in which the query attention modules 1000a and 1000b operate on the value of each refined feature (Refined F_{x_1}, Refined F_{y_2}) and each final attention matrix to output respective source attention matrices, and target attention matrix output steps (S1100a, S1100b) in which the query attention modules 1000a and 1000b operate on the value of each target feature (F_{y_1}, F_{x_2}) and each final attention matrix to output respective target attention matrices.

In the source attention matrix output step (S1000a) according to one embodiment, the one source attention matrix output unit 1010a may multiply the value of size (n×d_k) for the one refined feature (Refined F_{x_1}) by the final attention matrix of size (d_k×n), and one source attention matrix of size (n×n) may be output.

In addition, in the target attention matrix output step (S1100a) according to one embodiment, the one target attention matrix output unit 1020a may multiply the value of size (n×d_k) for the first target feature (F_{y_1}) output from the first target image (G_y) by the one final attention matrix of size (d_k×n), and one target attention matrix of size (n×n) may be output.

Next, the present invention may further include similarity calculation steps (S1200a, S1200b) in which the learning units 1100a and 1100b calculate the similarity of pairs in each source attention matrix and each target attention matrix, and learning steps (S1300a, S1300b) in which the learning units 1100a and 1100b use a loss function to learn so that the similarity of positive pairs decreases and the similarity of negative pairs increases.

In the similarity calculation step (S1200a) according to one embodiment, the similarity of pairs in the one source attention matrix and the one target attention matrix may be calculated. In the learning step (S1300a) according to one embodiment, a loss function may be used to learn so that the similarity of positive pairs decreases and the similarity of negative pairs increases.

(2) Learning on the second source image (I_y) and the second target image (G_x)

Referring to the embodiment of FIG. 7, for the second source image (I_y) and the second target image (G_x), the present invention includes a second source feature extraction step (S500) in which the second source encoder 700 extracts a second source feature (F_{y_2}) from the second domain (y) of the second source image (I_y), and a second target feature extraction step (S600) in which the second target encoder 800 extracts a second target feature (F_{x_2}) from the first domain (x) of the second target image (G_x).

In the refine feature extraction step (S700b) according to another embodiment, the other refined feature (Refined F_{y_2}) indicating where to focus in the second source image (I_y) may be extracted from the second source feature (F_{y_2}). Here, the second source feature (F_{y_2}) may have a size of height (H) * width (W) * channel (C).

The refine feature extraction step (S700b) according to another embodiment includes a channel attention step (S710b) in which the channel attention module 910b in the other convolutional block attention module 900b examines the relationships between the channels of the second source feature (F_{y_2}) and encodes which channel to focus on more. This is as shown in [Equation 1] above.

First, in the channel attention step (S710b) according to another embodiment, only the per-channel maximum values of the second source feature (F_{y_2}) may be output as a vector through max pooling. Next, the max-pooled vectors may be reduced in dimension through a multi-layer perceptron (MLP) having two layers. The dimension-reduced vectors may then be converted into probability values using a sigmoid function. Finally, the second source feature (F_{y_2}) may be encoded by focusing on the channel having the largest probability value among the per-channel probability values.

Accordingly, in the channel attention step (S710b) according to another embodiment, the second source feature (F_{y_2}) may be multiplied by the channel-encoded second source feature output from the other channel attention module 910b, and the channel-refined second source feature (Channel_refined F_{y_2}) may be output.

In addition, the refine feature extraction step (S700b) according to another embodiment further includes a spatial attention step (S720b) in which the spatial attention module 920b in the other convolutional block attention module 900b examines the positional relationships among the plurality of pixels in the H*W*1 size of the channel-refined second source feature (Channel_refined F_{y_2}) and encodes which position to focus on more. This is as shown in [Equation 2] above.

First, in the spatial attention step (S720b) according to another embodiment, the channel-refined second source feature (Channel_refined F_{y_2}) may be input. Through max pooling targeting the H*W*1 size, only the maximum values of the channel-refined second source feature (Channel_refined F_{y_2}) may be output as a vector. Two channels, each of size H*W*1, may then be concatenated to generate a feature map of size H*W*2. Because the feature map has two channels, a 7*7 convolution operation may be performed on it. The 7*7-convolved feature map may be converted into probability values using a sigmoid function. The channel-refined second source feature (Channel_refined F_{y_2}) may then be encoded by focusing on the position having the largest probability value.

Accordingly, in the spatial attention step (S720b) according to another embodiment, the channel-refined second source feature (Channel_refined F_{y_2}) may be multiplied by the position-encoded channel-refined second source feature (Channel_refined F_{y_2}) from the spatial attention module (920b) to output another refined feature (Refined F_{y_2}).
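The spatial attention step described above can be illustrated with a minimal sketch in plain Python. Several points are assumptions rather than details from the text: the passage names only max pooling but concatenates two H*W*1 channels, so the second channel is taken here to be a channel-wise average, as in the commonly published convolutional block attention design; the 7*7 kernel weights are placeholder values; the final multiplication is assumed to be element-wise; and the function name `spatial_attention` is hypothetical.

```python
import math


def spatial_attention(feature, kernel, bias=0.0):
    """Sketch of spatial attention over an H x W x C feature map.

    feature: nested list where feature[h][w] holds the C channel values.
    kernel:  kernel[c][i][j], a 2 x k x k set of placeholder weights.
    Returns the H x W attention map and the re-weighted feature.
    """
    height, width = len(feature), len(feature[0])
    k = len(kernel[0])
    pad = k // 2

    # Pool across channels to get two H x W x 1 maps.
    max_map = [[max(px) for px in row] for row in feature]
    avg_map = [[sum(px) / len(px) for px in row] for row in feature]  # assumed
    pooled = [max_map, avg_map]  # together an H x W x 2 feature map

    attention = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for w in range(width):
            acc = bias
            for c in range(2):
                for i in range(k):
                    for j in range(k):
                        y, x = h + i - pad, w + j - pad
                        if 0 <= y < height and 0 <= x < width:  # zero padding
                            acc += kernel[c][i][j] * pooled[c][y][x]
            attention[h][w] = 1.0 / (1.0 + math.exp(-acc))  # sigmoid

    # Element-wise product of the input with the attention map (assumed).
    refined = [[[v * attention[h][w] for v in feature[h][w]]
                for w in range(width)] for h in range(height)]
    return attention, refined


# Toy 4 x 4 x 3 input and a uniform 7 x 7 kernel; both are made up.
feature = [[[float(h + w + c) for c in range(3)] for w in range(4)]
           for h in range(4)]
kernel = [[[0.01] * 7 for _ in range(7)] for _ in range(2)]
attention, refined = spatial_attention(feature, kernel)
```

In a trained model the kernel weights would be learned; the uniform kernel here only demonstrates the data flow from pooling through convolution and sigmoid to re-weighting.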

Next, in the temporary attention matrix generation step (S800b) according to another embodiment, a query of size (n×d_k) may be multiplied by a key of size (d_k×n). A softmax function may then convert the resulting product into normalized values between 0 and 1. The softmax-normalized values sum to 1.

That is, each pixel of the other temporary attention matrix of size (n×n) output from the query attention module (1000b) holds the probability of belonging to a specific class, expressed as a normalized value between 0 and 1. The temporary attention matrix may be displayed in a different color for each class to which a pixel belongs; within the same class, higher values may be displayed in darker colors and lower values in lighter colors. Here, the classes may include positive and negative.
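As a rough illustration of the temporary attention matrix, the sketch below multiplies an (n×d_k) query by a (d_k×n) key and normalizes the result with softmax. The text does not state the axis over which softmax is taken; applying it to each row, so that every row sums to 1, is an assumption. The helper names `matmul`, `softmax`, and `temporary_attention` and the toy values are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an (r x m) nested list by an (m x c) nested list."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax of one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def temporary_attention(query, key):
    """Return the (n x n) matrix softmax(query @ key), row by row (assumed)."""
    return [softmax(row) for row in matmul(query, key)]


# Toy example with n = 3 and d_k = 2.
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # (n x d_k)
key = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]      # (d_k x n)
temp_attention = temporary_attention(query, key)
```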

Next, in the final attention matrix output step (S900b) according to another embodiment, the other temporary attention matrix of size (n×n) may be sorted based on entropy, and another final attention matrix consisting of the selected rows may be output. For example, d_k rows may be selected, in which case the final attention matrix has size (d_k×n).
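The entropy-based selection can be sketched as follows. The text says only that the rows are sorted by entropy and that some rows are selected; keeping the d_k rows with the lowest entropy is an assumption made for this sketch, and the function names and toy matrix are hypothetical.

```python
import math


def row_entropy(row):
    """Shannon entropy of one probability row (natural logarithm)."""
    return -sum(p * math.log(p) for p in row if p > 0.0)


def select_rows_by_entropy(attention, num_rows):
    """Sort rows by ascending entropy and keep the first num_rows of them.

    Returns the chosen row indices and the (num_rows x n) matrix.
    """
    order = sorted(range(len(attention)),
                   key=lambda i: row_entropy(attention[i]))
    chosen = order[:num_rows]
    return chosen, [attention[i] for i in chosen]


# Toy (n x n) temporary attention matrix with n = 4; rows sum to 1.
temp_attention = [
    [0.25, 0.25, 0.25, 0.25],  # uniform row, highest entropy
    [0.97, 0.01, 0.01, 0.01],  # sharply peaked row, lowest entropy
    [0.40, 0.40, 0.10, 0.10],
    [0.70, 0.10, 0.10, 0.10],
]
chosen, final_attention = select_rows_by_entropy(temp_attention, 2)
```

With d_k = 2 the sketch keeps the two most sharply peaked rows, giving a (2×4) final attention matrix.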

Next, in the source attention matrix output step (S1000b) according to another embodiment, the other source attention matrix output unit (1010b) may multiply the value of size (n×d_k) for the other refined feature (Refined F_{y_2}) by the final attention matrix of size (d_k×n) to output another source attention matrix of size (n×n).

Next, in the target attention matrix output step (S1100b) according to another embodiment, the other target attention matrix output unit (1020b) may multiply the value of size (n×d_k) for the second target feature (F_{x_2}) output from the second target image (G_x) by the other final attention matrix of size (d_k×n) to output another target attention matrix of size (n×n).
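The two output steps share the same shape arithmetic: an (n×d_k) value multiplied by a (d_k×n) final attention matrix yields an (n×n) result. The sketch below shows this with a dimension check. Reusing one final attention matrix for both the source and the target value is an assumption of the sketch, and the function name and toy values are hypothetical.

```python
def attention_output(value, final_attention):
    """Multiply an (n x d_k) value by a (d_k x n) final attention matrix."""
    n, d_k = len(value), len(value[0])
    if len(final_attention) != d_k:
        raise ValueError("inner dimensions do not match")
    width = len(final_attention[0])
    return [[sum(value[i][t] * final_attention[t][j] for t in range(d_k))
             for j in range(width)] for i in range(n)]


# Toy example with n = 3 and d_k = 2.
final_attention = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]  # (d_k x n)
source_value = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # (n x d_k)
target_value = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]   # (n x d_k)

source_attention = attention_output(source_value, final_attention)  # (n x n)
target_attention = attention_output(target_value, final_attention)  # (n x n)
```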

Next, in the similarity calculation step (S1200b) according to another embodiment, the similarity of pairs within the other source attention matrix and the other target attention matrix may be calculated. In the learning step (S1300b) according to another embodiment, a loss function may be used to learn so that the similarity of positive pairs decreases and the similarity of negative pairs increases.
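The similarity calculation can be pictured with the sketch below, which covers only the pairwise similarity and not the loss function, since the text does not specify one. Three things are assumptions: cosine similarity as the measure, rows at the same index of the source and target matrices forming positive pairs, and all other row combinations forming negative pairs. The function names and toy matrices are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def pair_similarities(source, target):
    """Similarity of every source row against every target row."""
    return [[cosine_similarity(s, t) for t in target] for s in source]


def split_pairs(similarity):
    """Split into positive (same index) and negative (other) pairs (assumed)."""
    size = len(similarity)
    positives = [similarity[i][i] for i in range(size)]
    negatives = [similarity[i][j] for i in range(size)
                 for j in range(len(similarity[i])) if i != j]
    return positives, negatives


# Toy source and target attention matrices.
source_attention = [[1.0, 0.0], [0.0, 1.0]]
target_attention = [[2.0, 0.0], [0.0, 3.0]]
similarity = pair_similarities(source_attention, target_attention)
positives, negatives = split_pairs(similarity)
```

A loss function in the learning step would then take the positive and negative similarities as inputs.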

Embodiments may be implemented in hardware, software, firmware, middleware, microcode, a hardware description language, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the necessary tasks may be stored in a computer-readable storage medium and executed by one or more processors.

Aspects of the subject matter described herein may also be described in the general context of computer-executable instructions, such as program modules or components executed by a computer. In general, program modules or components include routines, programs, objects, and data structures that perform particular tasks or implement particular data types. Aspects of the subject matter described herein may also be practiced in distributed computing environments in which tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set out below.

100.. First generator
110.. First generator encoder
120.. First generator decoder
200.. First discriminator
300.. Second generator
310.. Second generator encoder
320.. Second generator decoder
400.. Second discriminator
500.. First source encoder
600.. First target encoder
700.. Second source encoder
800.. Second target encoder
900a, 900b.. Convolutional block attention module
910a, 910b.. Channel attention module
920a, 920b.. Spatial attention module
1000a, 1000b.. Query attention module
1010a, 1010b.. Source attention matrix output unit
1020a, 1020b.. Target attention matrix output unit
1100a, 1100b.. Learning unit
S100.. First target image generation step
S200.. Second target image generation step
S300.. First source feature extraction step
S400.. First target feature extraction step
S500.. Second source feature extraction step
S600.. Second target feature extraction step
S700a, S700b.. Refined feature extraction step
S710a, S710b.. Channel attention step
S720a, S720b.. Spatial attention step
S800a, S800b.. Temporary attention matrix generation step
S900a, S900b.. Final attention matrix output step
S1000a, S1000b.. Source attention matrix output step
S1100a, S1100b.. Target attention matrix output step
S1200a, S1200b.. Similarity calculation step
S1300a, S1300b.. Learning step
S1400.. First discrimination step
S1500.. Second discrimination step

Claims

A generative adversarial network (GAN)-based image generation apparatus capable of dual learning, comprising:
a first source encoder that extracts a first source feature from a first domain of a first source image;
a first target encoder that extracts a first target feature from a second domain of a first target image;
a second source encoder that extracts a second source feature from a second domain of a second source image;
a second target encoder that extracts a second target feature from a first domain of a second target image;
a convolutional block attention module that encodes each source feature output from each source encoder according to probability values for the channel and pixel position of that source feature, and extracts a refined feature for each; and
a query attention module that performs an attention operation on each refined feature using a given query to generate, for each, a temporary attention matrix in which each pixel indicates a probability value of belonging to a specific class, sorts each temporary attention matrix based on entropy, and then outputs a final attention matrix having the selected rows.

The GAN-based image generation apparatus capable of dual learning according to claim 1, further comprising:
a first generator that generates the first target image including the second domain from the first source image so that the first target image is input to the first target encoder;
a second generator that generates the second target image including the first domain from the second source image so that the second target image is input to the second target encoder;
a first discriminator that determines authenticity between the first source image and the second target image generated by the second generator; and
a second discriminator that determines authenticity between the second source image and the first target image generated by the first generator.

The GAN-based image generation apparatus capable of dual learning according to claim 1,
wherein the query attention module comprises:
a source attention matrix output unit that performs an operation on the value of each refined feature and each final attention matrix and outputs each source attention matrix; and
a target attention matrix output unit that performs an operation on the value of each target feature and each final attention matrix and outputs each target attention matrix.

The GAN-based image generation apparatus capable of dual learning according to claim 3,
further comprising a learning unit that calculates the similarity of pairs within each source attention matrix and each target attention matrix and learns, using a loss function, so that the loss value of each final attention matrix is reduced.

A generative adversarial network (GAN)-based image generation method capable of dual learning, comprising:
a first source feature extraction step in which a first source encoder extracts a first source feature from a first domain of a first source image;
a first target feature extraction step in which a first target encoder extracts a first target feature from a second domain of a first target image;
a second source feature extraction step in which a second source encoder extracts a second source feature from a second domain of a second source image;
a second target feature extraction step in which a second target encoder extracts a second target feature from a first domain of a second target image;
a refined feature extraction step in which a convolutional block attention module encodes each source feature output from each source encoder according to probability values for the channel and pixel position of that source feature, and extracts a refined feature for each;
a temporary attention matrix generation step in which a query attention module performs an attention operation on each refined feature using a given query to generate, for each, a temporary attention matrix in which each pixel indicates a probability value of belonging to a specific class; and
a final attention matrix output step in which the query attention module sorts each temporary attention matrix based on entropy and then outputs a final attention matrix having the selected rows.

The GAN-based image generation method capable of dual learning according to claim 5, further comprising:
a source attention matrix output step in which the query attention module performs an operation on the value of each refined feature and each final attention matrix to output each source attention matrix; and
a target attention matrix output step in which the query attention module performs an operation on the value of each target feature and each final attention matrix to output each target attention matrix.

The GAN-based image generation method capable of dual learning according to claim 6, further comprising:
a similarity calculation step in which a learning unit calculates the similarity of pairs within each source attention matrix and each target attention matrix; and
a learning step in which the learning unit uses a loss function to learn so that the similarity of positive pairs decreases and the similarity of negative pairs increases.

The GAN-based image generation method capable of dual learning according to claim 5, further comprising:
a first target image generation step in which a first generator generates the first target image including the second domain from the first source image;
a second target image generation step in which a second generator generates the second target image including the first domain from the second source image;
a first discrimination step in which a first discriminator determines authenticity between the first source image and the second target image; and
a second discrimination step in which a second discriminator determines authenticity between the second source image and the first target image.