KR102192211B1

KR102192211B1 - Efficient Generative Adversarial Networks using Depthwise Separable and Channel Attention for Image to Image Translation

Info

Publication number: KR102192211B1
Application number: KR1020200076546A
Authority: KR
Inventors: 조근식; 김진용
Original assignee: 인하대학교 산학협력단
Priority date: 2020-06-23
Filing date: 2020-06-23
Publication date: 2020-12-16
Anticipated expiration: 2040-06-23

Abstract

Provided are an image translation method of efficient generative adversarial networks (GANs) using a depthwise separable convolution and channel attention, and an apparatus thereof. According to the present invention, the image translation method of efficient generative adversarial networks (GANs) using a depthwise separable convolution and channel attention comprises the steps of: applying a depthwise separable convolution to reduce the number of parameters; and applying channel attention to balance the quality of an output image and computational cost by compensating for information loss that occurs by applying the depthwise separable convolution.

Description

Efficient Generative Adversarial Networks using Depthwise Separable and Channel Attention for Image to Image Translation

The present invention relates to a method and an apparatus for translating images with an efficient generative adversarial network that uses depthwise separable convolution and channel attention.

Image translation has recently been an active area of research. Approaches range from classical methods based on geometric transformations and filtering to methods based on deep learning. Deep learning methods in particular have achieved meaningful results thanks to the rapid development of artificial neural networks, and the most successful line of work in image translation uses generative adversarial networks (GANs). GAN-based image translation includes converting the style of an image into that of another domain (image-to-image translation), generating a new high-resolution image from an existing one (super-resolution), filling in damaged parts of an image (image inpainting), and restoring images (image restoration). These techniques are used across industry, for example to improve autonomous driving datasets, to restore and enhance medical images, and to augment class-imbalanced datasets.
However, a GAN is composed of several conventional convolutional neural networks, so it has a heavier structure and requires more computation than other artificial neural networks. As a result, the number of parameters, which expresses the complexity of a deep learning model, is very high. A large number of parameters not only significantly affects training time and the inference time needed to generate outputs, but also increases the memory required for training and inference. These problems are a major limitation when applying image-to-image translation to real-world fields.

FIG. 1 shows the structure of a basic generative adversarial network according to the prior art.

A large and varied body of research on generative adversarial networks has been carried out recently. The earliest model took the simple form shown in FIG. 1.

The generator receives random noise as input and learns to produce a fake image, that is, a generated image, from that noise. The discriminator, in contrast, distinguishes the fake images produced by the generator from the real images of the provided dataset. In this process the generator is trained to produce images that cannot be told apart from real images, so that they fool the discriminator, while the discriminator is trained to tell fake images from real ones. Because the generator and the discriminator are adversarial, their goals are opposed, and this relationship is expressed as a single objective that the generator tries to minimize and the discriminator tries to maximize.
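For reference, the standard GAN minimax objective, which matches the roles of the generator and discriminator described here, can be written as follows. The notation is assumed rather than taken from the patent: $x$ is a real image drawn from the data distribution, $z$ is the random noise, $G$ is the generator, and $D$ is the discriminator.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator raises the value by scoring real images near 1 and generated images near 0, while the generator lowers it by making $D(G(z))$ approach 1.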

As described above, a basic generative adversarial network consists of two networks. Some models consist of one generator and one discriminator, while others use several generators and discriminators depending on the purpose.

Image-to-image translation is the research field concerned with converting images between the styles of datasets from different domains. It can be used in many fields, and many recent studies have produced impressive results. However, earlier studies mainly used paired datasets. Paired datasets are expensive to build, and most datasets that exist in the real world are unpaired, which makes those methods hard to apply in practice. To address this, research began to focus on unpaired datasets and has achieved considerable success. The prior-art CycleGAN discloses a GAN model that learns mappings between the image datasets of two domains, trains with a cycle consistency loss, and translates images of each domain into the style of the other. Because CycleGAN uses unpaired datasets, it has a strong advantage in dataset construction, and it has been applied to problems in a variety of fields.
Unlike a basic GAN, CycleGAN consists of two generators (G, F) and two discriminators (Dx, Dy), so it requires a relatively large amount of computation and memory. Building on the versatility of CycleGAN, research has continued toward generating images of better quality.
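As an illustration of the cycle consistency loss mentioned above, the form commonly used for CycleGAN is sketched below. The notation is an assumption, not taken from this patent: $G$ maps domain $X$ to domain $Y$, $F$ maps $Y$ back to $X$, and the norm is the L1 distance between an image and its reconstruction.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```

Both generators and both discriminators are trained together, which is why the computation and memory cost is higher than for a basic GAN.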

AttentionGAN, another prior art, is a GAN based on CycleGAN. Like CycleGAN, it performs image-to-image translation by mapping between two domains with a cycle consistency loss. In addition, its use of an attention mechanism helps it generate images of better quality. AttentionGAN extracts features in a shared network and feeds them to two other networks. The content mask generator produces content masks that focus on the features of the main objects in the input image. The attention mask generator produces attention masks that learn which parts of the data deserve the most attention. By disentangling the image features in this way, each network learns in finer detail. The content masks and attention masks from the two generators are normalized and multiplied together, so that the parts the attention mask has learned to focus on are preserved and the remaining parts are erased.
All of this is built into AttentionGAN, so training is more efficient. However, both CycleGAN and AttentionGAN are limited to one-to-one, unimodal output. Generating images in several styles from one domain therefore requires training several times, which is inefficient. To solve this problem, models that produce multimodal output have been studied. Multimodal models are more efficient than the unimodal models above at producing diverse images.

StarGAN, another prior art, consists of one generator and N discriminators, where N is the number of domains to be generated. The single generator learns from the input image and a domain label, and the discriminator for that label learns to distinguish real data from fake data. Unlike the earlier unimodal models, this approach is efficient because a single generator can learn to generate images of many domains.

MUNIT, another prior art, differs from StarGAN in that it can generate images of multiple domains on its own, without domain labels. This is possible because, like AttentionGAN, it disentangles the extracted features and feeds them separately into a content encoder and a style encoder. MUNIT likewise consists of one generator and one discriminator, which allows more efficient multimodal generation.

As the number of feature channels grows, the amount of computation in a deep learning network and its cost increase. Research to address this problem has continued, and work based on depthwise separable convolution in particular has shown good results. Xception, inspired by the prior-art Inception and by depthwise separable convolution, proposed an extreme version of the Inception module that first performs a 1x1 convolution and then maps spatial correlations separately for every output channel.

MobileNet uses depthwise separable convolution together with two additional shrinking hyperparameters: a width multiplier that controls the number of input and output channels, and a resolution multiplier that scales the size of the input image. ShuffleNet points out that pointwise convolution remains the expensive part. To address this, ShuffleNet designed channel-sparse connections that do not connect all weights, and shuffled the groups so that each output does not receive information from only one specific part of the input.

Attention mechanisms were originally used mainly in natural language processing, but they are now also widely used in computer vision, for tasks such as image classification, super-resolution, and image detection, and many kinds of attention mechanism are being studied. The channel attention used in SENet compresses and rescales channel information to analyze the dependencies between channels, and assigns higher weights to the parts that deserve focus, which improves performance. CBAM further improves performance by combining channel attention with a spatial attention mechanism that additionally rescales spatial information.

The technical problem addressed by the present invention is to provide a DCBlock module that, in unpaired image-to-image translation, replaces standard convolution with depthwise separable convolution and applies channel attention, and thereby to reduce the number of trainable parameters so that the model can be used in environments with limited memory resources.

In one aspect, the image translation method of an efficient generative adversarial network using depthwise separable convolution and channel attention proposed in the present invention includes: applying depthwise separable convolution to reduce the number of parameters; and applying channel attention to compensate for the information loss caused by the depthwise separable convolution, so as to balance the quality of the output image against the computational cost.

The step of applying depthwise separable convolution to reduce the number of parameters applies a depthwise convolution using the kernel size, width, height, input channel, and feature map of the depthwise convolution, and then applies a pointwise convolution, which does not handle spatial features and operates only across channels.

The step of applying channel attention to compensate for the information loss caused by the depthwise separable convolution, and so balance output image quality against computational cost, exploits the interdependence between channels to focus on the required input features, compresses channel information using global average pooling (GAP), restores the input features through convolution layers, and extracts channel statistics through a gating mechanism in order to strengthen and restore the input features of the parts the network should focus on.

In another aspect, the image translation apparatus of an efficient generative adversarial network using depthwise separable convolution and channel attention proposed in the present invention includes: a depthwise separable unit that applies depthwise separable convolution to reduce the number of parameters; and a channel attention unit that applies channel attention to compensate for the information loss caused by the depthwise separable convolution, so as to balance the quality of the output image against the computational cost.

The DCBlock according to embodiments of the present invention applies depthwise separable convolution, which has a lower computational cost than standard convolution, together with channel attention, which compensates for the information loss caused by the depthwise separable convolution. Although this reduces the number of parameters, an indicator of the amount of computation, by up to 91.6%, the model generates images of a quality similar to that of prior-art models.

FIG. 1 shows the structure of a basic generative adversarial network according to the prior art.
FIG. 2 is a diagram showing the structure of an image translation apparatus of an efficient generative adversarial network using depthwise separable convolution and channel attention according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an image translation method of an efficient generative adversarial network using depthwise separable convolution and channel attention according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the configuration of a depthwise separable convolution according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a channel attention mechanism according to an embodiment of the present invention.

According to an embodiment of the present invention, standard convolution is replaced with depthwise separable convolution in order to reduce the number of trainable parameters. However, although depthwise separable convolution reduces the number of parameters, it causes a corresponding information loss. This information loss is a decisive factor in a GAN model generating images of poor quality.

To compensate for the information loss that occurs when depthwise separable convolution is applied to reduce the number of trainable parameters, the present invention applies channel attention to the residual block. Channel attention makes the network focus on important features and mitigates the information loss, so that image quality can be maintained.

Accordingly, the present invention proposes DCBlock, which applies the techniques described above. DCBlock replaces the residual block used in typical GANs and performs image-to-image translation. Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 2 is a diagram showing the structure of an image translation apparatus of an efficient generative adversarial network using depthwise separable convolution and channel attention according to an embodiment of the present invention.

The present invention proposes DCBlock (Depthwise separable Channel attention Block) to solve the problem that the learning model becomes heavy when a GAN is used for unpaired image-to-image translation.

As shown in FIG. 2, the proposed DCBlock module consists of two sessions: a depthwise separable unit 210 (the depthwise separable session) and a channel attention unit 220 (the channel attention session).

The depthwise separable unit 210 applies depthwise separable convolution to reduce the number of parameters.

The depthwise separable unit 210 passes its input through a depthwise separable convolution block, an instance normalization (Ins Norm) block, a ReLU block, a second depthwise separable convolution block, and a second instance normalization block. It applies a depthwise convolution using the kernel size, width, height, input channel, and feature map of the depthwise convolution, and applies a pointwise convolution that does not handle spatial features and operates only across channels.

Reducing the number of parameters with depthwise separable convolution causes information loss. Residual connections can overcome some of this loss, but in image-to-image translation they do not mitigate it, so the output of the model does not reach the image quality of existing models. The loss arises because depthwise separable convolution is performed on each channel separately, which makes losing features that span the whole input unavoidable. A technique that compensates for the information loss therefore had to be added to reach the best balance between output image quality and computational cost. Channel attention is applied so that the necessary information is retained as far as possible and is not lost even in a deep structure.

The channel attention unit 220 passes its input through a global average pooling (GAP) block, a convolution block, a ReLU block, a second convolution block, and a sigmoid block. It applies channel attention to compensate for the information loss caused by the depthwise separable convolution, balancing output image quality against computational cost.

The channel attention unit 220 exploits the interdependence between channels to focus on the required input features, compresses channel information using global average pooling (GAP), and restores the input features through the convolution layers. It then extracts channel statistics through a gating mechanism in order to strengthen and restore the input features of the parts the network should focus on.

In DCBlock, the input feature and the output feature are supplied to each block that performs residual learning, and DCBlock can be formulated in terms of its input feature. The DCBlock module is composed of the depthwise separable session, the channel attention session, and the input feature, all of which are connected through residual learning. The input feature and the feature output from it by the depthwise separable session undergo residual learning together. In addition, the feature obtained by feeding the output of the depthwise separable session into the channel attention session also undergoes residual learning.

The feature extracted by channel attention is then applied through a multiplication. Because of the gating mechanism, the features are naturally weighted by the points of focus obtained from the channel statistics, so that important information is strengthened and information loss is prevented. In this way DCBlock reduces the number of parameters of a learning model that would otherwise need a large number of parameters for image-to-image translation, while generating images of a quality similar to that of existing image-to-image translation models.
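The data flow through the block can be sketched in plain Python. The exact arrangement of the formulas is not recoverable from this text, so the sketch rests on an assumption: the block output is taken to be the input plus the depthwise separable features rescaled channel by channel by the attention gates. The function names and the stand-in sessions are hypothetical and only make the flow visible.

```python
def dc_block(x, depthwise_separable_session, channel_gates):
    """Residual block: x + gates(f) * f, with f = depthwise_separable_session(x).

    x is a list of C channels, each a flat list of pixel values.
    Both sessions are passed in as callables so the sketch stays short.
    """
    features = depthwise_separable_session(x)
    gates = channel_gates(features)          # one weight in (0, 1) per channel
    attended = [[g * v for v in ch] for g, ch in zip(gates, features)]
    # Residual (skip) connection: add the block input back element-wise.
    return [[a + b for a, b in zip(ch_in, ch_att)]
            for ch_in, ch_att in zip(x, attended)]


# Stand-in sessions, chosen only to make the data flow visible.
def double_features(x):
    return [[2.0 * v for v in ch] for ch in x]


def fixed_gates(features):
    return [0.5, 0.25][:len(features)]


x = [[1.0, 2.0], [4.0, 8.0]]
y = dc_block(x, double_features, fixed_gates)
```

In a real model the two callables would be the trained depthwise separable session and the channel attention session; the skip connection is what lets the block stand in for an ordinary residual block.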

FIG. 3 is a flowchart illustrating an image translation method of an efficient generative adversarial network using depthwise separable convolution and channel attention according to an embodiment of the present invention.

The proposed image translation method of an efficient generative adversarial network using depthwise separable convolution and channel attention includes a step 310 of applying depthwise separable convolution to reduce the number of parameters, and a step 320 of applying channel attention to compensate for the information loss caused by the depthwise separable convolution, so as to balance the quality of the output image against the computational cost.

In step 310, depthwise separable convolution is applied to reduce the number of parameters.

In step 320, channel attention is applied to compensate for the information loss caused by the depthwise separable convolution, so as to balance the quality of the output image against the computational cost. Both steps are described in more detail below with reference to FIGS. 4 and 5.

FIG. 4 is a diagram illustrating the configuration of a depthwise separable convolution according to an embodiment of the present invention.

As described above, to reduce the number of parameters the present invention uses a depthwise separable convolution made up of a depthwise convolution, shown in FIG. 4(a), and a pointwise convolution, shown in FIG. 4(b).

A depthwise convolution has one filter per channel for extracting spatial features, so the number of input channels equals the number of output channels. It is defined in terms of the kernel of the depthwise convolution, the indices i, j, and m for width, height, and input channel, and the feature map. The pointwise convolution is a 1x1 convolution whose filter size is fixed at 1x1. Unlike the depthwise convolution, the pointwise convolution does not handle spatial features and operates only across channels. This greatly reduces the amount of computation in DCBlock. Table 1 shows the difference in the number of parameters and the computational cost between the two convolutions.
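The two stages can be sketched in plain Python. This is a minimal illustration rather than the patent's implementation: it assumes stride 1, no padding, no bias, and channel-first nested lists, and the function names and toy values are hypothetical. For each channel m, the depthwise stage sums the kernel weight at position (i, j) times the feature map value under it; the pointwise stage then mixes channels at each pixel.

```python
def depthwise_conv(x, kernels):
    """Depthwise convolution: one k x k filter per input channel.

    x:       list of C channels, each an H x W nested list
    kernels: list of C filters, each a k x k nested list
    Stride 1, no padding, so the output is C x (H-k+1) x (W-k+1).
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h_out = len(channel) - k + 1
        w_out = len(channel[0]) - k + 1
        out.append([
            [
                sum(kernel[i][j] * channel[r + i][c + j]
                    for i in range(k) for j in range(k))
                for c in range(w_out)
            ]
            for r in range(h_out)
        ])
    return out


def pointwise_conv(x, weights):
    """Pointwise (1x1) convolution: mixes channels at each pixel.

    weights: list of C_out rows, each with C_in coefficients.
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [sum(wt * x[m][r][c] for m, wt in enumerate(row))
             for c in range(w)]
            for r in range(h)
        ]
        for row in weights
    ]


def depthwise_separable_conv(x, kernels, weights):
    """Depthwise stage followed by the pointwise stage."""
    return pointwise_conv(depthwise_conv(x, kernels), weights)


# Tiny example: 2 input channels of size 3x3, 2x2 filters, 3 output channels.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
kernels = [
    [[1, 0], [0, 1]],   # channel 0: sum of the main diagonal of each window
    [[1, 1], [1, 1]],   # channel 1: sum of each 2x2 window
]
weights = [[1, 0], [0, 1], [1, 1]]  # 3 output channels from 2 input channels

y = depthwise_separable_conv(x, kernels, weights)
```

With these toy values the third output channel is the element-wise sum of the two depthwise outputs, which shows that only the pointwise stage combines information across channels.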

<Table 1>

Table 1 expresses these quantities in terms of the kernel size, the input and output channel counts, and the input height and width. It gives the computational cost of the standard convolution and of the depthwise separable convolution, and dividing one by the other shows how much the cost decreases. This is highly effective in reducing the number of parameters, which grows proportionally in GANs because they use relatively deep models.
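The comparison can be checked numerically. The sketch below uses the usual cost model for the two layers, which is assumed here to be what Table 1 contains: with kernel size `k`, input and output channel counts `c_in` and `c_out`, and an `h` x `w` output (stride 1, same padding, no bias), a standard convolution needs `k*k*c_in*c_out` parameters, while a depthwise separable one needs `k*k*c_in + c_in*c_out`. The variable names and the example layer size are hypothetical.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Parameters and multiply-accumulates of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = params * h * w
    return params, macs


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Depthwise (k x k per channel) plus pointwise (1x1) convolution."""
    params = k * k * c_in + c_in * c_out
    macs = params * h * w
    return params, macs


# Hypothetical layer: 3x3 kernel, 256 -> 256 channels, 64x64 feature map.
k, c_in, c_out, h, w = 3, 256, 256, 64, 64
std_params, std_macs = standard_conv_cost(k, c_in, c_out, h, w)
ds_params, ds_macs = depthwise_separable_cost(k, c_in, c_out, h, w)

# Ratio of the two costs and its closed form, 1/c_out + 1/k^2.
ratio = ds_macs / std_macs
closed_form = 1 / c_out + 1 / (k * k)
print(std_params, ds_params, round(ratio, 4))
```

For this layer the depthwise separable version needs about 11.5% of the parameters and multiply-accumulates of the standard convolution, and the ratio equals `1/c_out + 1/k**2`.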

FIG. 5 is a diagram illustrating a channel attention mechanism according to an embodiment of the present invention.

The channel attention mechanism is a variant of attention that exploits the interdependence between channels to focus on the required features.

As shown in FIG. 5, channel attention (520) is applied to the input feature (510) through a GAP (Global Average Pooling) block, an FC (fully connected layer, in other words a convolution) block, a ReLU block, a second FC (fully connected layer, in other words a convolution) block, and a sigmoid block. Channel information is compressed using global average pooling (GAP), and the feature points are restored through the convolution layers. The following equation obtains the channel statistic $z_c$ of the input feature through GAP:

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j)$$

where $x_c(i, j)$ is the value of the $c$-th channel of the input feature at position $(i, j)$, and $H$ and $W$ are the height and width of the feature map.

Subsequently, the channel statistics are extracted through a gating mechanism. This reinforces and restores the features of the parts the network should focus on. Therefore, to retain as many feature points as possible even in a deep model and to resolve the information loss caused by the depthwise separable convolution, the input feature points restored using channel attention are output (530).
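The GAP, FC, ReLU, FC, sigmoid pipeline described for FIG. 5 can be sketched as follows. This is an assumption-laden illustration rather than the patented implementation: the function names, the weight matrices `w_down` and `w_up`, the channel reduction from 2 to 1, and the final step of multiplying each input channel by its gate value (as in squeeze-and-excitation style attention) are all hypothetical choices. The FC layers are written as matrix-vector products, which is equivalent to a 1x1 convolution on a 1x1 feature map.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # assumed layout: [channel][row][col]


def global_average_pool(x: Tensor3) -> List[float]:
    """Compress each channel to a single statistic (its spatial mean)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def fully_connected(v: List[float], weights: List[List[float]]) -> List[float]:
    """Matrix-vector product; weights[o][i] maps input i to output o (no bias)."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def sigmoid(v: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def channel_attention(x: Tensor3, w_down: List[List[float]],
                      w_up: List[List[float]]) -> Tensor3:
    """GAP -> FC -> ReLU -> FC -> sigmoid, then rescale each channel by its gate."""
    z = global_average_pool(x)
    gates = sigmoid(fully_connected(relu(fully_connected(z, w_down)), w_up))
    return [[[g * value for value in row] for row in ch]
            for ch, g in zip(x, gates)]


# Hypothetical 2-channel 2x2 input and weights chosen only for illustration.
x = [
    [[1.0, 3.0], [5.0, 7.0]],   # channel mean 4.0
    [[2.0, 2.0], [2.0, 2.0]],   # channel mean 2.0
]
w_down = [[1.0, 0.0]]           # squeeze 2 channels to 1 hidden unit
w_up = [[10.0], [-10.0]]        # expand back to 2 channel gates
y = channel_attention(x, w_down, w_up)
```

With these weights the first channel receives a gate close to 1 and the second a gate close to 0, so the first channel passes through almost unchanged while the second is suppressed. The output keeps the shape of the input, which is what allows the attention block to sit after a depthwise separable convolution without changing the surrounding architecture.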

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set out below.

Claims

1. An image translation method comprising:
applying a depthwise separable convolution to reduce the number of parameters; and
applying channel attention to balance the quality of the output image against the computational cost by compensating for the information loss caused by applying the depthwise separable convolution,
wherein the applying of the depthwise separable convolution comprises applying a depthwise convolution using the kernel size, width, height, input channel, and feature map of the depthwise convolution, and applying a pointwise convolution that operates only across channels without handling spatial features, and
wherein, in the depthwise convolution, input features and output features are supplied to each block for residual learning, and the depthwise convolution for a given input feature is expressed by an equation whose terms denote the depthwise convolution, a depthwise separable session, a channel attention session, and the input feature, all of which are organically connected through residual learning, such that the input feature and the feature output through the depthwise separable session are supplied and, additionally, the feature extracted through the channel attention session is supplied so that residual learning proceeds.

2. (Deleted)

3. The method of claim 1,
wherein the applying of the channel attention to balance the quality of the output image against the computational cost by compensating for the information loss caused by applying the depthwise separable convolution comprises focusing on the required input feature points by utilizing the interdependence between channels, compressing channel information using global average pooling (GAP), and restoring the input feature points through a convolution layer.

4. The method of claim 3,
wherein channel statistics are extracted through a gating mechanism in order to reinforce and restore the input feature points for the parts to be focused on within the network.

5. An image translation device comprising:
a depthwise separation unit that applies a depthwise separable convolution to reduce the number of parameters; and
a channel attention unit that applies channel attention to balance the quality of the output image against the computational cost by compensating for the information loss caused by applying the depthwise separable convolution,
wherein the depthwise separation unit applies a depthwise convolution using the kernel size, width, height, input channel, and feature map of the depthwise convolution, and applies a pointwise convolution that operates only across channels without handling spatial features, and
wherein, in the depthwise convolution, input features and output features are supplied to each block for residual learning, with terms denoting the depthwise convolution, a depthwise separable session, a channel attention session, and the input feature, such that the input feature and the feature output through the depthwise separable session are supplied and, additionally, the feature extracted through the channel attention session is supplied so that residual learning proceeds.

6. (Deleted)

7. The device of claim 5,
wherein the channel attention unit focuses on the required input feature points by utilizing the interdependence between channels, compresses channel information using global average pooling (GAP), and restores the input feature points through a convolution layer.

8. The device of claim 7,
wherein the channel attention unit extracts channel statistics through a gating mechanism in order to reinforce and restore the input feature points for the parts to be focused on within the network.