KR102491057B1

KR102491057B1 - Device and Method for Image Style Transfer

Info

Publication number: KR102491057B1
Application number: KR1020200024265A
Authority: KR
Inventors: 정원진
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2023-01-19
Also published as: KR20210109244A

Abstract

An image conversion device and method are disclosed.
According to one aspect of the present invention, there is provided an image conversion device comprising: a neural network trained to extract, at each of a plurality of convolution layers, a plurality of feature maps for an image; a calculation unit that calculates a style loss function based on a plurality of feature maps for a style image and a plurality of feature maps for an input image extracted from the plurality of convolution layers, and calculates a content loss function based on a plurality of feature maps for a content image and a plurality of feature maps for the input image extracted from one convolution layer; and a conversion unit that converts the style and content of the input image using the style loss function and the content loss function.

Description

Image conversion device and method {Device and Method for Image Style Transfer}

Embodiments of the present invention relate to an image conversion device and method for applying the style of a style image and the content of a content image to an input image.

The information described in this section merely provides background information on the present invention and does not constitute prior art.

Image style transfer is an image conversion technique that generates an image combining the style of a style image with the content of a content image. Image conversion devices that use image style transfer mainly rely on a convolutional neural network (CNN).

Image conversion based on image style transfer falls into two approaches. The first is the model optimization method, and the second is the image optimization method.

The model optimization method uses a convolutional neural network (CNN), a generative adversarial network (GAN), or the like to generate an output image in which the style of a style image and the content of a content image are mixed. The neural network used in the model optimization method learns from a large number of images and updates its parameters.

When a generative adversarial network is used, the generative network generates an image in which a style image and a content image are synthesized, and the adversarial network is trained to separate the synthesized image again. The image conversion device can then synthesize any two images using a neural network trained on various image data sets.

In addition, there is an image conversion method that uses a learnable style transfer network. In this method, a learnable style transfer network is added to a neural network trained to extract features from images, and the parameters of the style transfer network are then updated. Afterwards, a style image and a content image can be synthesized using the pre-trained neural network and the style transfer network that has undergone the learning process.

Meanwhile, the image optimization method uses a pre-trained neural network to apply the style of a style image and the content of a content image to an input image, thereby generating an image in which the style image and the content image are synthesized. Here, the pre-trained neural network is used with fixed parameters to extract feature maps from an image.

The image optimization method has the advantage of requiring no separate training process because it uses a pre-trained neural network. Compared with the model optimization method, however, the same process must be performed anew for every new image, even to generate a single synthesized image, so generating many synthesized images takes a considerable amount of time. In addition, unlike the model optimization method, which computes a loss function over a large amount of data, trains parameters, and then synthesizes any two images, the image optimization method must perform the computation for each input, so its stability is low; for example, the resulting image may be smeared.

Therefore, research is needed on image optimization methods that reduce the amount of computation, and thus the time required to generate a synthesized image, while ensuring that the style image and the content image are synthesized appropriately.

A main object of embodiments of the present invention is to provide an image conversion device and method that reduce the time required to convert the style and content of an input image and that offer high stability, by adding a third loss function to the process of synthesizing a style image and a content image and using it to apply the style image and the content image to the input image.

Another object of other embodiments of the present invention is to provide an image conversion device and method that reduce the time required to convert the style and content of an input image and that offer high stability, by generating a mixed image in which the input image and the style image are mixed and reflecting the style of the style image and the content of the content image in the previously generated mixed image.

According to one aspect of the present invention, there is provided an image conversion device for applying the style of a style image and the content of a content image to an input image, the device comprising: a neural network that includes a plurality of convolution layers and is trained to extract a plurality of feature maps for an image at each convolution layer; a calculation unit that calculates a style loss function based on a plurality of feature maps for the style image and a plurality of feature maps for the input image extracted from the plurality of convolution layers, and calculates a content loss function based on a plurality of feature maps for the content image and a plurality of feature maps for the input image extracted from one convolution layer; and a conversion unit that converts the style and content of the input image using the style loss function and the content loss function, wherein the converted input image is used as an input image of the trained neural network.

According to another aspect of the present embodiment, there is provided an operating method of an image conversion device for applying the style of a style image and the content of a content image to an input image, the method comprising: inputting the style image to a neural network that includes a plurality of convolution layers and is trained to extract a plurality of feature maps for an image at each convolution layer, and acquiring a plurality of feature maps for the style image; inputting the input image to the trained neural network and acquiring a plurality of feature maps for the input image; inputting the content image to the trained neural network and acquiring a plurality of feature maps for the content image; calculating a style loss function based on, among the feature maps extracted from the plurality of convolution layers, the plurality of feature maps for the style image and the plurality of feature maps for the input image; calculating a content loss function based on, among the feature maps extracted from one convolution layer, the plurality of feature maps for the content image and the plurality of feature maps for the input image; converting the style and content of the input image using the style loss function and the content loss function; and inputting the converted input image to the trained neural network.

As described above, according to an embodiment of the present invention, applying the style of the style image to the style of the input image and the content of the content image to the content of the input image reduces the time required to convert the style and content of the input image and increases conversion stability.

FIG. 1 is a configuration diagram showing the components of an image conversion device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the operation process of an image conversion device according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the operation process of an image conversion device that uses an additional loss function and a mixed image according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an operating method of an image conversion device according to an embodiment of the present invention.

Hereinafter, some embodiments of the present invention are described in detail with reference to exemplary drawings. It should be noted that, in assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present invention, a detailed description of a related known configuration or function is omitted when it is judged that it could obscure the gist of the present invention.

Terms such as first, second, A, B, (a), and (b) may be used in describing the components of the present invention. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the corresponding component. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as '~unit' and 'module' used in the specification refer to a unit that processes at least one function or operation, and may be implemented in hardware, software, or a combination of hardware and software.

FIG. 1 is a configuration diagram showing the components of an image conversion device according to an embodiment of the present invention.

Referring to FIG. 1, the image conversion device (not shown) includes a pre-trained neural network 100, a calculation unit 110, and a conversion unit 120. First, the image conversion device acquires a style image and a content image. In FIG. 1, the input image may be a white noise image or the same image as the content image.

The trained neural network 100 includes a plurality of convolution layers and is pre-trained to extract a plurality of feature maps for an image at each convolution layer. In FIG. 1, the trained neural network 100 receives the style image and generates a plurality of feature maps F(s) for it, receives the content image and generates a plurality of feature maps F(c) for it, and receives the input image and generates a plurality of feature maps F(i) for it.

The calculation unit 110 uses the feature maps extracted by the trained neural network 100 to calculate at least one of a style loss function, a content loss function, and a relative entropy to be applied to the input image.

The calculation unit 110 calculates the style loss function based on a plurality of style feature maps extracted from the plurality of convolution layers for the style image and a plurality of input feature maps extracted from the plurality of convolution layers for the input image. The calculation unit 110 also calculates the content loss function based on a plurality of content feature maps extracted from one convolution layer for the content image and a plurality of input feature maps extracted from the same convolution layer for the input image. Here, according to an embodiment of the present invention, the one convolution layer may be any one of the plurality of convolution layers, the last output-stage layer of the neural network, or the layer immediately before it.

The conversion unit 120 converts the style and content of the input image using the style loss function and the content loss function. Specifically, the conversion unit 120 uses the loss functions to adjust the pixel values of the input image, so that the style of the input image becomes similar to the style of the style image and the content of the input image becomes similar to the content of the content image. The converted image includes the style of the style image and the content of the content image.

However, an input image converted only once through the above process is not yet sufficiently similar to the style image and the content image, so the process must be repeated. The converted input image is therefore used again as the input image of the trained neural network 100. The trained neural network 100 extracts a plurality of feature maps for the converted input image, the calculation unit 110 calculates the style loss function and the content loss function based on those feature maps, and the conversion unit 120 applies the loss functions to the converted input image to generate a twice-converted input image.

The image conversion device repeats the above process until a condition set by at least one of a preset number of iterations, a preset time, and a preset loss function value is satisfied, and determines the last converted image as the final output image. As a result, the output image is an image in which the style of the style image and the content of the content image are appropriately synthesized.

FIG. 2 is a diagram for explaining the operation process of an image conversion device according to an embodiment of the present invention.

Referring to FIG. 2, the process by which the image conversion device calculates a style loss function and a content loss function and applies them to the input image is illustrated.

The image conversion device provides the style image to the trained neural network 100 and obtains a plurality of feature maps extracted from each layer. Each convolution layer includes a plurality of filters and uses them to extract feature maps from its input data. Among the plurality of convolution layers, layers closer to the input stage extract the main content of the image (shape, outline, etc.), and layers closer to the output stage extract the fine style of the image (color, sharpness, etc.).

FIG. 2 labels the i feature maps that the first convolution layer extracts from the style image and the i feature maps that the second convolution layer extracts from the feature maps of the first convolution layer. Although each convolution layer is described here as extracting i feature maps, this is only one example; the convolution layers may generate the same number of feature maps or different numbers of feature maps. The image conversion device also provides the input image to the trained neural network 100 and obtains a plurality of feature maps extracted from each layer.

By using the feature maps extracted from all convolution layers for the style image and the input image, the image conversion device can apply the fine styles of the style image to the input image.

Next, the image conversion device provides the content image to the trained neural network 100 and obtains a plurality of feature maps extracted from one of the plurality of convolution layers. Here, the one convolution layer may be any one of the plurality of convolution layers, the last output-stage layer of the neural network, or the layer immediately before it.

By using the feature maps extracted from one convolution layer for the content image and the input image, the image conversion device can apply the main content (shape, outline, etc.) of the content image to the input image.

To calculate the style loss function, the calculation unit 110 calculates the correlation between the plurality of feature maps output from a single convolution layer. Here, the correlation refers to the correlation between the feature maps, and equivalently between the filters that extracted them, and is calculated through a Gram matrix of the feature maps.

In FIG. 2, the Gram matrix shown for the l-th convolution layer is calculated from the i feature maps that the layer extracts from the style image. Because the Gram matrix is well known to a person of ordinary skill in the art to which the present invention pertains, a detailed description is omitted.
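The Gram matrix computation mentioned above can be illustrated with a minimal NumPy sketch. It assumes the feature maps of one layer are stored as an array of shape `(num_maps, height, width)`; the function name `gram_matrix` and the absence of any normalization are assumptions for illustration, not details taken from the patent.

```python
import numpy as np


def gram_matrix(feature_maps: np.ndarray) -> np.ndarray:
    """Return the (num_maps, num_maps) matrix of inner products between maps.

    feature_maps has the assumed shape (num_maps, height, width).
    """
    num_maps = feature_maps.shape[0]
    # Flatten each feature map into one row vector.
    flat = feature_maps.reshape(num_maps, -1)
    # Entry (a, b) is the inner product of map a and map b.
    return flat @ flat.T


# Two tiny 2x2 feature maps: [0, 1, 2, 3] and [4, 5, 6, 7].
maps = np.arange(8, dtype=float).reshape(2, 2, 2)
G = gram_matrix(maps)
```

Each entry of `G` measures how strongly two filters respond together across the image, which is the sense in which the text speaks of correlation between feature maps.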

For each convolution layer, the calculation unit 110 calculates the correlation between the feature maps that the layer extracts from the style image and the correlation between the feature maps that the layer extracts from the input image. The calculation unit 110 then calculates the mean square error (MSE) between the correlation for the style image and the correlation for the input image, and determines the sum of these values over the layers as the style loss function. The style loss function can be expressed as Equation 1.

In Equation 1, l is the number of convolution layers, i is the number of feature maps extracted from one convolution layer, the weight term is the weight applied to the features extracted by each convolution layer, G is the Gram matrix, F(S) is the feature map for the style image, and F(I) is the feature map for the input image.
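As a rough illustration of the style loss described above (per-layer MSE between Gram matrices, weighted and summed over layers), the following sketch may help. The function names, the list-of-arrays layout, and the unnormalized Gram matrix are assumptions; the patent's Equation 1 may include factors that this sketch omits.

```python
import numpy as np


def gram_matrix(feature_maps: np.ndarray) -> np.ndarray:
    """Gram matrix of feature maps with assumed shape (num_maps, h, w)."""
    flat = feature_maps.reshape(feature_maps.shape[0], -1)
    return flat @ flat.T


def style_loss(style_feats, input_feats, layer_weights) -> float:
    """Weighted sum over layers of the MSE between Gram matrices.

    style_feats and input_feats are lists with one array per layer.
    """
    total = 0.0
    for weight, f_style, f_input in zip(layer_weights, style_feats, input_feats):
        diff = gram_matrix(f_style) - gram_matrix(f_input)
        total += weight * np.mean(diff ** 2)
    return float(total)


# One layer, two 2x2 maps: all-ones style features vs. all-zeros input features.
loss_value = style_loss([np.ones((2, 2, 2))], [np.zeros((2, 2, 2))], [0.5])
same_value = style_loss([np.ones((2, 2, 2))], [np.ones((2, 2, 2))], [0.5])
```

In this toy case every Gram entry differs by 4, so the per-layer MSE is 16 and the weighted loss is 8.0; identical features give a loss of zero.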

In addition, the calculation unit 110 calculates the mean square error between the plurality of feature maps for the content image and the plurality of feature maps for the input image, both extracted from one convolution layer, and determines the sum of these values as the content loss function. The content loss function can be expressed as Equation 2.
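A minimal sketch of the content loss, under the reading that an MSE is computed per feature map and the per-map values are summed, could look as follows. The function name `content_loss` and the array layout `(num_maps, height, width)` are assumptions made for illustration.

```python
import numpy as np


def content_loss(content_feats: np.ndarray, input_feats: np.ndarray) -> float:
    """Sum over feature maps of the per-map mean square error.

    Both arrays have the assumed shape (num_maps, height, width) and come
    from the same single convolution layer.
    """
    per_map_mse = np.mean((content_feats - input_feats) ** 2, axis=(1, 2))
    return float(np.sum(per_map_mse))


# Three 2x2 maps that differ by exactly 1 everywhere: each per-map MSE is 1.
c_loss = content_loss(np.ones((3, 2, 2)), np.zeros((3, 2, 2)))
```

Unlike the style loss, no Gram matrix is involved here: the feature maps are compared position by position, which is what preserves the spatial layout of the content image.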

The calculation unit 110 determines a total loss function based on the style loss function and the content loss function. The total loss function may be the sum of the style loss function multiplied by a weight and the content loss function multiplied by a weight. The ratio of the two weights can be adjusted as appropriate.

The conversion unit 120 converts the style and content of the input image using the total loss function. The conversion may be performed according to Equation 3.

In Equation 3, the symbols denote, in order, the converted input image, the conversion weight, and gradient descent.
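Equation 3 itself is not reproduced in this text. A form consistent with the three quantities listed (converted input image, conversion weight, gradient descent) would be an ordinary gradient-descent step on the pixels. The symbol names $I'$, $\lambda$, and $\mathcal{L}_{total}$ below are assumptions chosen for illustration, not the patent's notation:

```latex
I' = I - \lambda \, \nabla_{I} \, \mathcal{L}_{total}(I)
```

Here $I$ is the current input image, $\lambda$ plays the role of the conversion weight (a step size), and $\nabla_{I}\,\mathcal{L}_{total}$ is the gradient of the total loss with respect to the pixel values.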

The converted input image in FIG. 2 is the image obtained by applying Equation 3 to the input image. That is, each pixel of the input image is converted into a pixel of the converted input image through Equation 3. The converted input image is then used as the input image of the trained neural network 100.

As the above process is repeated, the style and content of the converted input image move closer to the style of the style image and the content of the content image, and the total loss function decreases with each iteration. The number of times the converted input image is reused as the input image is determined by a preset number of iterations, a preset time, a preset loss function value, or the like. The image conversion device outputs the finally converted image as the output image.
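The repeated conversion with its three stopping conditions (iteration count, elapsed time, loss value) can be sketched as a plain gradient-descent loop. Everything named here is hypothetical: `optimize_image`, its parameters, and especially `toy_loss_and_grad`, which replaces the network-based style and content losses with a simple pixel-wise MSE so that the example runs without a trained network.

```python
import time

import numpy as np


def optimize_image(image, loss_and_grad, step_size=0.1,
                   max_iters=500, max_seconds=5.0, loss_target=1e-6):
    """Repeat the pixel update until any stopping condition is met.

    loss_and_grad(image) must return (loss, gradient w.r.t. the pixels).
    """
    start = time.monotonic()
    current = image.copy()
    loss, grad = loss_and_grad(current)
    iters = 0
    while (iters < max_iters
           and time.monotonic() - start < max_seconds
           and loss > loss_target):
        # Gradient-descent step on the pixel values.
        current = current - step_size * grad
        # The converted image is fed back in as the next input image.
        loss, grad = loss_and_grad(current)
        iters += 1
    return current, loss, iters


# Toy stand-in for the total loss: MSE between the image and a fixed target.
target = np.full((4, 4), 0.5)


def toy_loss_and_grad(img):
    diff = img - target
    return float(np.mean(diff ** 2)), 2.0 * diff / diff.size


result, final_loss, n_iters = optimize_image(
    np.zeros((4, 4)), toy_loss_and_grad, step_size=4.0)
```

In a real setting the gradient would come from backpropagating the total loss through the fixed, pre-trained network to the pixels; only the image changes, never the network parameters.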

FIG. 3 is a diagram for explaining the operation process of an image conversion device that uses an additional loss function and a mixed image according to an embodiment of the present invention.

Referring to FIG. 3, an image conversion device (not shown) according to an embodiment of the present invention may use a mixed image as the input image, in addition to a white noise image or the content image. A mixed image is an image in which each pixel value of the input image is mixed with the corresponding pixel value of the style image. Generation of the mixed image can be expressed as Equation 4.

In Equation 4, the first two symbols denote the mixed image and the mixing ratio, I is the input image, and S is the style image. The mixing ratio is determined experimentally and may vary between embodiments.
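One plausible reading of the pixel-wise mixing is a convex combination of the two images. The sketch below assumes that form; the function name `blend_images`, the parameter `ratio`, and the choice to weight the input image by `ratio` and the style image by `1 - ratio` are assumptions, since Equation 4 is not reproduced in this text.

```python
import numpy as np


def blend_images(input_image: np.ndarray, style_image: np.ndarray,
                 ratio: float) -> np.ndarray:
    """Mix corresponding pixels: ratio * input + (1 - ratio) * style.

    Both images are assumed to have the same shape.
    """
    if input_image.shape != style_image.shape:
        raise ValueError("images must have the same shape")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return ratio * input_image + (1.0 - ratio) * style_image


# Input pixels all 0, style pixels all 10, 25% input and 75% style.
mixed = blend_images(np.zeros((2, 2)), np.full((2, 2), 10.0), ratio=0.25)
```

If the style image has a different resolution from the input image, it would have to be resized first; the source does not say how that case is handled.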

Mixing the input image with the style image before it is input to the trained neural network 100 reduces the time required to apply the style of the style image to the input image. Moreover, when the input image is the content image, the time required to apply the style of the style image and the content of the content image to the input image can be reduced even further.

The process by which the image conversion device calculates the style loss function and the content loss function using the mixed image is the same as that described with reference to FIG. 2, so a detailed description is omitted.

Meanwhile, to reduce the image conversion time, the image conversion device according to an embodiment of the present invention uses the relative entropy between the style image and the input image as an additional loss function. Relative entropy here indicates how closely the pixel distributions of the two images resemble each other.

Specifically, the calculation unit 110 calculates the relative entropy between the plurality of feature maps for the style image and the plurality of feature maps for the input image, both extracted from the output-stage convolution layer of the trained neural network 100, and the conversion unit 120 converts the style and content of the input image additionally based on the relative entropy. When the output-stage convolution layer is the layer farthest from the layer that receives the input image, the feature maps extracted from it contain detailed information about the image. Accordingly, by additionally using the relative entropy calculated from the output-stage convolution layer, the image conversion device can reflect the style features of the style image in the input image more quickly.

The relative entropy between the style image and the input image can be expressed as Equation 5.

In Equation 5, the three symbols denote, respectively, the Kullback-Leibler divergence, the plurality of feature maps extracted from the output-stage layer for the input image, and the plurality of feature maps extracted from the output-stage layer for the style image. Since the process of calculating the Kullback-Leibler divergence is obvious to those skilled in the art, its description is omitted.
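The relative entropy between the two sets of output-stage feature maps can be sketched as follows. The description does not state how feature maps are turned into probability distributions or which argument comes first, so this sketch assumes that all activations are flattened and normalized with a softmax, and that the input image is the first argument of the divergence. The function names are hypothetical.

```python
import math

def softmax(values):
    """Turn arbitrary real activations into a probability distribution."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def relative_entropy(input_maps, style_maps):
    """Assumed reading of Equation 5: flatten, normalize, then compare.

    Each argument is a list of feature maps, each map a flat list of floats.
    """
    flat_input = [v for fmap in input_maps for v in fmap]
    flat_style = [v for fmap in style_maps for v in fmap]
    return kl_divergence(softmax(flat_input), softmax(flat_style))
```

The value is zero when both sets of feature maps are identical and grows as their normalized activation distributions diverge, which is what lets it act as an additional loss term.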

The calculation unit 110 determines a total loss function based on the style loss function, the content loss function, and the relative entropy. The total loss function may be the sum of the style loss function multiplied by a weight, the content loss function multiplied by a weight, and the relative entropy multiplied by a weight. The ratio of the three weights can be adjusted as appropriate.
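As a minimal sketch of this weighted sum, the function below combines the three terms. The parameter names and the default weights of 1.0 are assumptions chosen for illustration; the description only says that the ratio of the weights is adjustable.

```python
def total_loss(style_loss, content_loss, rel_entropy,
               w_style=1.0, w_content=1.0, w_entropy=1.0):
    """Weighted sum of the style loss, content loss, and relative entropy.

    The three weights are hypothetical tuning knobs, not values from the source.
    """
    return (w_style * style_loss
            + w_content * content_loss
            + w_entropy * rel_entropy)
```

Raising `w_style` relative to `w_content` pushes the result toward the style image, while raising `w_content` preserves more of the content image.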

By applying the total loss function to the input image, the conversion unit 120 converts the input image into a converted input image that contains the style of the style image and the content of the content image.
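One common way to apply a loss function to an image is to move each pixel value a small step in the direction that lowers the loss. The sketch below assumes plain gradient descent with a central-difference gradient on a flat list of pixels; the description does not name the optimizer, and `gradient_step`, `lr`, and `h` are hypothetical. A practical system would more likely use automatic differentiation.

```python
def gradient_step(pixels, loss_fn, lr=0.1, h=1e-5):
    """One descent step on a flat pixel list using central differences.

    loss_fn maps a pixel list to a scalar loss (for example a total loss).
    """
    grad = []
    for i in range(len(pixels)):
        # Perturb only pixel i up and down to estimate d(loss)/d(pixel i).
        up = pixels[:i] + [pixels[i] + h] + pixels[i + 1:]
        down = pixels[:i] + [pixels[i] - h] + pixels[i + 1:]
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * h))
    # Move every pixel against its gradient component.
    return [p - lr * g for p, g in zip(pixels, grad)]
```

Repeating this step with the total loss as `loss_fn` gradually changes the pixel values of the input image.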

FIG. 4 is a flowchart illustrating a method of operating an image conversion device according to an embodiment of the present invention.

In the following, steps S400 to S404 may be performed in chronological order, but are not limited thereto; the three steps may be performed in a different order or in parallel. Likewise, steps S406 and S408 may be performed in a different order or in parallel.

The image conversion device inputs a style image to the trained neural network and obtains a plurality of feature maps for the style image (S400). The trained neural network includes a plurality of convolution layers and is trained to extract a plurality of feature maps of an image for each convolution layer.
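The idea of collecting a plurality of feature maps from every convolution layer can be illustrated with a toy network. This sketch makes several simplifying assumptions that are not in the description: the image is a 1-D signal, every kernel is shared across input channels, ReLU is the activation, and the kernels are supplied by hand instead of being trained. All function names are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation of one channel with one kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def conv_layer(channels, kernels):
    """Produce one feature map per kernel: sum over input channels, then ReLU."""
    outputs = []
    for kernel in kernels:
        responses = [conv1d(ch, kernel) for ch in channels]
        summed = [sum(vals) for vals in zip(*responses)]
        outputs.append([max(0.0, v) for v in summed])
    return outputs

def extract_feature_maps(image, layers):
    """Return the feature maps produced by every layer, shallowest first.

    layers is a list of kernel lists, one kernel list per convolution layer.
    """
    per_layer, current = [], [image]
    for kernels in layers:
        current = conv_layer(current, kernels)
        per_layer.append(current)
    return per_layer
```

The same `extract_feature_maps` call would be made once each for the style image, the input image, and the content image, so that the losses can compare feature maps layer by layer.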

The image conversion device inputs an input image to the trained neural network and obtains a plurality of feature maps for the input image (S402).

The image conversion device according to an embodiment of the present invention may use, as the input image, a mixed image in which each pixel value of the input image is mixed with the corresponding pixel value of the style image.
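The pixel mixing can be sketched as a per-pixel blend of two images of the same size. The description does not give the mixing rule or ratio, so a linear blend with a hypothetical ratio `alpha` is assumed here, and the images are modeled as nested lists of grayscale values.

```python
def mix_images(initial, style, alpha=0.5):
    """Blend two same-sized grayscale images pixel by pixel.

    alpha is an assumed mixing ratio: 0.0 keeps the initial image,
    1.0 returns the style image.
    """
    if len(initial) != len(style) or any(
            len(a) != len(b) for a, b in zip(initial, style)):
        raise ValueError("images must have the same dimensions")
    return [[(1.0 - alpha) * a + alpha * b for a, b in zip(row_i, row_s)]
            for row_i, row_s in zip(initial, style)]
```

Starting the conversion from such a blend means the first iteration already carries some of the style image's pixel statistics.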

The image conversion device inputs the content image to the trained neural network and obtains a plurality of feature maps for the content image (S404).

The image conversion device calculates a style loss function based on the plurality of feature maps for the style image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from the plurality of convolution layers (S406). Here, the style loss function is the sum, over the convolution layers, of the mean squared error between the correlation for the style image and the correlation for the input image calculated at each layer.
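The style loss can be sketched as below. The correlation between feature maps is taken here to be a Gram matrix (inner products of flattened feature maps), which is the usual choice in neural style transfer but is an assumption about this passage. Feature maps are modeled as flat lists of floats, and the function names are hypothetical.

```python
def gram_matrix(feature_maps):
    """Correlation between every pair of flattened feature maps of one layer."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in feature_maps]
            for fi in feature_maps]

def mse(a, b):
    """Mean squared error between two equally shaped matrices."""
    diffs = [(x - y) ** 2
             for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def style_loss(style_layers, input_layers):
    """Sum over layers of the MSE between style and input correlations.

    Each argument holds one list of feature maps per convolution layer.
    """
    return sum(mse(gram_matrix(s), gram_matrix(x))
               for s, x in zip(style_layers, input_layers))
```

Because the Gram matrix discards spatial position, this loss compares which features occur together, not where they occur, which is why it captures style and not layout.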

The image conversion device calculates a content loss function based on the plurality of feature maps for the content image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from one convolution layer (S408). Here, the content loss function is the sum of the mean squared errors between the feature maps for the content image and the corresponding feature maps for the input image, among the plurality of feature maps extracted from the one convolution layer.
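A minimal sketch of the content loss follows, assuming the feature maps of the chosen layer are given as flat lists of floats and that the mean squared error is taken per feature map before summing. The function name is hypothetical.

```python
def content_loss(content_maps, input_maps):
    """Sum of per-feature-map mean squared errors for one chosen layer."""
    total = 0.0
    for c_map, x_map in zip(content_maps, input_maps):
        squared = [(c - x) ** 2 for c, x in zip(c_map, x_map)]
        total += sum(squared) / len(squared)
    return total
```

Unlike the style loss, this comparison is position by position, so it penalizes changes to where structures appear in the image.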

The image conversion device converts the style and content of the input image using the style loss function and the content loss function (S410).

The method of the image conversion device according to an embodiment of the present invention further includes calculating the relative entropy between the plurality of feature maps for the style image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from the output-stage convolution layer. In this case, the step in which the image conversion device converts the style and content of the input image is performed based on the style loss function, the content loss function, and the relative entropy.

The image conversion device inputs the converted input image to the trained neural network again (S412). The number of times the converted input image is reused as the input image is determined by a preset number of iterations, a preset time, a preset loss function value, or the like. The image conversion device outputs the finally converted image as the output image.
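The feedback loop of S412 and its three kinds of stopping condition can be sketched as follows. The function name, the parameter names, and the default limits are assumptions; `step` stands for one conversion pass (S400 to S410) and `loss_fn` for the loss value being monitored.

```python
import time

def iterate_until_done(image, step, loss_fn,
                       max_iters=100, max_seconds=5.0, loss_target=1e-3):
    """Feed the converted image back in until any preset limit is reached."""
    start = time.monotonic()
    iterations = 0
    while iterations < max_iters:  # preset number of iterations
        if loss_fn(image) <= loss_target:
            break  # preset loss function value reached
        if time.monotonic() - start >= max_seconds:
            break  # preset time budget used up
        image = step(image)  # the converted image becomes the next input
        iterations += 1
    return image, iterations
```

The function returns both the final image and the number of passes performed, so a caller can tell which limit ended the loop.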

Although FIG. 4 describes steps S400 to S412 as being executed sequentially, this merely illustrates the technical idea of an embodiment of the present invention. In other words, a person of ordinary skill in the art to which an embodiment of the present invention pertains could, without departing from its essential characteristics, apply various modifications and variations, such as changing the order shown in FIG. 4 or executing one or more of steps S400 to S412 in parallel; FIG. 4 is therefore not limited to a time-series order.

Meanwhile, the processes shown in FIG. 4 can be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes all types of recording devices that store data readable by a computer system. That is, such a computer-readable recording medium may be a non-transitory medium such as ROM, RAM, CD-ROM, magnetic tape, a floppy disk, or an optical data storage device, and may further include a transitory medium such as a carrier wave (for example, transmission over the Internet) and a data transmission medium. In addition, the computer-readable recording medium may be distributed across computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the present embodiment pertains can make various modifications and variations without departing from its essential characteristics. Therefore, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within the equivalent scope should be construed as falling within the scope of rights of the present embodiment.

100: trained neural network
110: calculation unit
120: conversion unit

Claims

An image conversion device for applying the style of a style image and the content of a content image to an input image, the device comprising:
a neural network that includes a plurality of convolution layers and is trained to extract a plurality of feature maps of an image for each convolution layer, wherein the trained neural network extracts a plurality of feature maps for the input image in response to input of the input image, the input image being an image in which pixel values of an initial image are mixed with the pixel values of the style image corresponding to those pixel values, extracts a plurality of feature maps for the style image in response to input of the style image, and extracts a plurality of feature maps for the content image in response to input of the content image;
a calculation unit that calculates a style loss function based on the plurality of feature maps for the style image and the plurality of feature maps for the input image extracted from the plurality of convolution layers, and calculates a content loss function based on the plurality of feature maps for the content image and the plurality of feature maps for the input image extracted from one convolution layer selected from the plurality of convolution layers; and
a conversion unit that adjusts pixel values of the input image using the style loss function and the content loss function.

The image conversion device according to claim 1, wherein
the calculation unit further calculates a correlation between the plurality of feature maps output from one convolution layer,
the style loss function is the sum of the mean squared errors (MSE), calculated for each convolution layer, between the correlation for the style image and the correlation for the input image, and
the content loss function is the sum of the mean squared errors between the plurality of feature maps for the content image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from the one convolution layer.

The image conversion device according to claim 1, wherein
the calculation unit further calculates relative entropy between the plurality of feature maps for the style image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from an output-stage convolution layer, and
the conversion unit converts the style and content of the input image additionally based on the relative entropy.

The image conversion device according to claim 3, wherein
the relative entropy is a Kullback-Leibler divergence value between the plurality of feature maps for the style image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from the output-stage convolution layer.

(Deleted)

A method of operating an image conversion device that applies the style of a style image and the content of a content image to an input image, the method comprising:
generating, as the input image, a mixed image in which pixel values of an initial image are mixed with the pixel values of the style image corresponding to those pixel values;
inputting the style image to a neural network that includes a plurality of convolution layers and is trained to extract a plurality of feature maps of an image for each convolution layer, and obtaining a plurality of feature maps for the style image;
inputting the input image to the trained neural network and obtaining a plurality of feature maps for the input image;
inputting the content image to the trained neural network and obtaining a plurality of feature maps for the content image;
calculating a style loss function based on the plurality of feature maps for the style image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from the plurality of convolution layers;
calculating a content loss function based on the plurality of feature maps for the content image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from one convolution layer selected from the plurality of convolution layers; and
adjusting pixel values of the input image using the style loss function and the content loss function.

The method according to claim 6, wherein
the style loss function is the sum of the mean squared errors, calculated for each convolution layer, between the correlation for the style image and the correlation for the input image, and
the content loss function is the sum of the mean squared errors between the plurality of feature maps for the content image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from the one convolution layer.

The method according to claim 6, further comprising
calculating relative entropy between the plurality of feature maps for the style image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from an output-stage convolution layer,
wherein the converting converts the style and content of the input image additionally based on the relative entropy.

The method according to claim 8, wherein
the relative entropy is a Kullback-Leibler divergence value between the plurality of feature maps for the style image and the plurality of feature maps for the input image, among the plurality of feature maps extracted from the output-stage convolution layer.

(Deleted)

The method according to claim 6, further comprising
inputting the input image whose style and content have been converted to the trained neural network when a condition set by at least one of a preset number of iterations, a preset time, and a preset loss function value is not satisfied.

A computer-readable recording medium on which is recorded a program for causing a computer to execute the method of any one of claims 6 to 9 and 11.