KR102229572B1

KR102229572B1 - Apparatus and method for image style transfer

Info

Publication number: KR102229572B1
Application number: KR1020190155838A
Authority: KR
Inventors: 최현철
Original assignee: 영남대학교 산학협력단 (Yeungnam University Industry-Academic Cooperation Foundation)
Priority date: 2019-11-28
Filing date: 2019-11-28
Publication date: 2021-03-18

Abstract

Disclosed are an image style conversion apparatus and method. According to an embodiment of the present invention, an image style conversion model training apparatus comprises: a model learning unit that trains an image style conversion model for converting the style of a content image based on the content image, a style image, and one or more style parameters; and a loss calculation unit that calculates a conversion loss based on the content image, a style-converted image of the content image, and the style image. Accordingly, distortion in the converted content image can be prevented.

Description

Apparatus and method for converting image style {APPARATUS AND METHOD FOR IMAGE STYLE TRANSFER}

The disclosed embodiments relate to image style conversion technology.

With advances in image editing and effect-insertion technology, individuals can now use various devices such as PCs, tablets, and smartphones to edit images of their choice and apply the effects they want, creating images of their own.

Recently, techniques that give a user's image a new feel by overlaying the style of another image onto it have attracted attention. However, even when the strength of the style conversion effect is set to 0, these techniques output a biased, style-converted image instead of the original image.

In addition, when existing style conversion techniques apply several style effects at different strengths, parts of the original image may come out black, and when the strength of the style effect is finely adjusted, the style of the original image is not converted in proportion to that adjustment.

Korean Registered Patent Publication No. 10-1882111 (registered July 19, 2018)

The disclosed embodiments are intended to convert the style of an image with various effects and strengths.

An image style conversion model training apparatus according to an embodiment includes: a model learning unit that trains an image style conversion model, which converts the style of a content image based on the content image, a style image, and one or more style parameters; and a loss calculation unit that calculates a conversion loss based on the content image, a style-converted image of the content image, and the style image. The value of each of the one or more style parameters is randomly selected from values within a preset range every time the image style conversion model has been trained a preset number of times, and the model learning unit trains the image style conversion model based on the conversion loss.

When the values of the one or more style parameters are all the minimum value within the preset range, the model learning unit may train the image style conversion model so that it uses the content image instead of the style image.

The image style conversion model may include: one or more encoders that generate a plurality of feature vectors from each of the content image and the style image; one or more conversion units that convert at least some of the plurality of feature vectors of the content image based on the one or more style parameters and the plurality of feature vectors of the style image; and one or more decoders that generate the style-converted image based on the converted feature vectors and the unconverted feature vectors among the plurality of feature vectors of the content image.

The values of the one or more style parameters may be selected independently, and each of the one or more conversion units may convert at least some of the plurality of feature vectors of the content image based on the value of its corresponding style parameter among the one or more style parameters.

Each of the one or more conversion units may receive the feature vector of the content image and the feature vector of the style image generated by the corresponding encoder among the one or more encoders.

The loss calculation unit may calculate a first loss based on a content loss and a style loss, calculate a second loss based on the content image and a reconstructed content image, and calculate the conversion loss based on the first loss and the second loss.

An image style conversion model execution apparatus according to an embodiment includes: an input unit that acquires a content image, a style image, and set values for one or more style parameters; and a style-converted image generation unit that generates a style-converted image of the content image using a pre-trained image style conversion model that takes the content image, the style image, and the set values for the one or more style parameters as input data. The set values for the one or more style parameters are selected within a preset range.

When the set values for the one or more style parameters are all the minimum value within the preset range, the style-converted image generation unit may control the image style conversion model to use the content image instead of the style image.

The set values for the one or more style parameters may be selected independently, and each of the one or more conversion units may convert at least some of the plurality of feature vectors of the content image based on the set value of its corresponding style parameter among the one or more style parameters.
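The execution-side behavior described above can be sketched as a thin wrapper around a pre-trained model. This is a minimal sketch under stated assumptions: the preset range is taken to be [0, 1], images are plain nested lists, and `run_style_conversion` and the `model` callable are hypothetical names, not an interface defined in this patent.

```python
from typing import Callable, Sequence

Image = list[list[float]]  # hypothetical representation: rows of pixel values
Model = Callable[[Image, Image, Sequence[float]], Image]  # hypothetical pre-trained model

def run_style_conversion(model: Model, content: Image, style: Image,
                         settings: Sequence[float],
                         low: float = 0.0, high: float = 1.0) -> Image:
    """Check the style-parameter settings, then call the pre-trained model.

    When every setting equals the minimum of the preset range, the content
    image is passed in place of the style image, so no foreign style is used.
    """
    for value in settings:
        if not low <= value <= high:
            raise ValueError(f"style parameter {value} lies outside [{low}, {high}]")
    style_source = content if all(v == low for v in settings) else style
    return model(content, style_source, settings)

# Usage with a stand-in model that returns whatever it received as the style source.
echo_style = lambda content, style, settings: style
print(run_style_conversion(echo_style, [[0.1]], [[0.9]], [0.0, 0.0]))  # [[0.1]]
print(run_style_conversion(echo_style, [[0.1]], [[0.9]], [0.0, 0.4]))  # [[0.9]]
```

Passing the content image as the style source, rather than branching around the model, keeps a single code path through the network for every setting.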

A method of training an image style conversion model according to an embodiment includes: inputting input data, including a content image, a style image, and a value for each of one or more style parameters, into an image style conversion model; calculating a conversion loss based on the content image, the style image, and the style-converted image generated by the image style conversion model from the input data; and training the training parameters of the image style conversion model based on the conversion loss. The inputting, calculating, and training steps are repeated until a preset condition is satisfied, and in each repetition the value of each of the one or more style parameters is randomly selected from values within a preset range.

In the training step, when the values of the one or more style parameters are all the minimum value within the preset range, the image style conversion model may be trained to use the content image instead of the style image.

The image style conversion model may include: one or more encoders that generate a plurality of feature vectors from each of the content image and the style image; one or more conversion units that convert at least some of the plurality of feature vectors of the content image based on the one or more style parameters and the plurality of feature vectors of the style image; and one or more decoders that generate a style-converted image of the content image based on the converted feature vectors and the unconverted feature vectors among the plurality of feature vectors of the content image.

In the calculating step, a first loss may be calculated based on a content loss and a style loss, a second loss may be calculated based on the content image and a reconstructed content image, and the conversion loss may be calculated based on the first loss and the second loss.

A method of executing an image style conversion model according to an embodiment includes: acquiring a content image, a style image, and set values for one or more style parameters; and generating a style-converted image of the content image using a pre-trained image style conversion model that takes the content image, the style image, and the set values for the one or more style parameters as input data. The set values for the one or more style parameters are selected within a preset range.

In the generating step, when the set values for the one or more style parameters are all the minimum value within the preset range, the image style conversion model may be controlled to use the content image instead of the style image.

According to the disclosed embodiments, training the image style conversion model with randomly sampled style strengths enables stochastic regression learning without increasing computational complexity or memory usage during training.

In addition, according to the disclosed embodiments, when the style strength applied during actual style conversion of a content image is 0, the image style conversion model uses the content image instead of the style image, which prevents distortion in the converted content image.

In addition, according to the disclosed embodiments, feeding feature vectors obtained from different encoders into multiple conversion units and applying the style parameters to each of them independently makes it possible to apply style conversion effects of differing strengths to the content image.

FIG. 1 is a block diagram illustrating an apparatus for training an image style conversion model according to an embodiment.
FIG. 2 is a block diagram illustrating an image style conversion model according to an embodiment.
FIG. 3 is a diagram illustrating in detail the structures of an image style conversion model and a loss calculation unit according to an embodiment.
FIG. 4 is a block diagram illustrating an apparatus for executing an image style conversion model according to an embodiment.
FIG. 5 is a diagram illustrating a method of selecting style parameters according to an embodiment.
FIG. 6 is a diagram illustrating the operation of an image style conversion model according to an embodiment.
FIG. 7 is a flowchart illustrating a method of training an image style conversion model according to an embodiment.
FIG. 8 is a flowchart illustrating a method of executing an image style conversion model according to an embodiment.
FIGS. 9 to 12 are diagrams showing image style conversion results according to an embodiment.
FIG. 13 is a diagram showing a screen of an image style conversion service according to an embodiment.
FIG. 14 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

Hereinafter, specific embodiments will be described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, it is only an example, and the disclosed embodiments are not limited thereto.

In describing the embodiments, detailed descriptions of related known technologies are omitted where they would unnecessarily obscure the subject matter of the disclosed embodiments. The terms used below are defined in consideration of their functions in the disclosed embodiments and may vary according to the intention or custom of users or operators; they should therefore be interpreted based on the contents of this entire specification. The terms used in the detailed description are only for describing the embodiments and should not be construed as limiting. Unless explicitly stated otherwise, singular forms include plural forms. In this description, expressions such as "comprising" or "having" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof.

In the following embodiments, machine learning or deep learning refers to a technical methodology in which a specific model is trained on previously stored data so that, for newly input data as well, the model outputs result data reflecting what it has learned. To this end, the model may include an artificial neural network structure such as a convolutional neural network (CNN) or a multi-layer perceptron.

Image style transfer refers to an operation that changes properties of one image, such as its brightness, saturation, and contrast, by partially reflecting the style of another image in it. This makes it possible, for example, to apply the painting style of a chosen artist's work to a photograph of the real world and generate a style-converted image.

Hereinafter, among the images input for image style conversion, the image whose style is to be changed is referred to as the content image, and the image that provides the style to be reflected in the content image is referred to as the style image. The content image in which the style of the style image has been reflected is referred to as the style-converted image. A variable value that determines how strongly the style of the style image is reflected in the content image is referred to as a style parameter. When there are multiple style parameters, each may relate to a different style element (e.g., brightness, saturation, or contrast).

Training the image style conversion model 200 refers to the process of minimizing a loss calculated from the input data. The training parameters learned in this process are the parameters updated in the direction that minimizes the loss, namely weights and biases.

The content information of the content image refers to the information in the content image that should not be changed during style conversion, and the content information of the style image refers to the information in the style image that should not be reflected in the content image during style conversion.

Likewise, the style information of the content image refers to the information in the content image that should be changed during style conversion, and the style information of the style image refers to the information in the style image that should be reflected in the content image during style conversion.

FIG. 1 is a block diagram illustrating an apparatus 100 for training an image style conversion model according to an embodiment. Referring to FIG. 1, the image style conversion model training apparatus 100 according to an embodiment includes a model learning unit 110 and a loss calculation unit 120.

The model learning unit 110 trains an image style conversion model 200 that converts the style of a content image based on the content image, a style image, and one or more style parameters.

In the following embodiments, the image style conversion model 200 optimizes its weights and biases while being trained by the model learning unit 110, so that it outputs a style-converted image in which the style of the style image is reflected in the input content image with a strength corresponding to the values of the input style parameters. This is described in detail below with reference to FIGS. 2 and 3.

According to an embodiment, when the values of the one or more style parameters are all the minimum value within the preset range, the model learning unit 110 may train the image style conversion model 200 so that it uses the content image instead of the style image.

For example, for one or more style parameters that each take a value between 0 and 1 inclusive, if all the style parameter values are 0, the image style conversion model 200 may use information extracted from the content image, regardless of the input style image, in the computation that converts the style of the content image. This is described in detail below with reference to FIG. 6.
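Why substituting the content image avoids distortion can be seen with a small example. The patent does not specify the operation inside the model, so this sketch assumes, purely for illustration, a statistics-matching conversion in the style of adaptive instance normalization (AdaIN), which rescales content features to the mean and standard deviation of the style features. When the content features also serve as the style source, their statistics already match and the conversion returns its input essentially unchanged. Strength blending is not modeled here, and the function names are hypothetical.

```python
import statistics
from typing import Sequence

def adain(content_feat: Sequence[float], style_feat: Sequence[float],
          eps: float = 1e-8) -> list[float]:
    """Adaptive instance normalization on one 1-D feature channel.

    Normalizes the content features to zero mean and unit standard deviation,
    then rescales and shifts them to the style features' statistics.
    """
    mc, sc = statistics.fmean(content_feat), statistics.pstdev(content_feat)
    ms, ss = statistics.fmean(style_feat), statistics.pstdev(style_feat)
    return [ss * (x - mc) / (sc + eps) + ms for x in content_feat]

def convert_channel(content_feat: Sequence[float], style_feat: Sequence[float],
                    style_params: Sequence[float], min_value: float = 0.0) -> list[float]:
    """Use the content features as the style source when every parameter is at its minimum."""
    at_minimum = all(p == min_value for p in style_params)
    source = content_feat if at_minimum else style_feat
    return adain(content_feat, source)

content = [1.0, 2.0, 3.0, 4.0]
style = [10.0, 30.0, 50.0, 70.0]
print([round(v, 6) for v in convert_channel(content, style, [0.0, 0.0])])  # [1.0, 2.0, 3.0, 4.0]
```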

Here, the value of each of the one or more style parameters is randomly selected from values within the preset range each time the image style conversion model 200 has been trained a preset number of times. This is described in detail below with reference to FIG. 5.

According to an embodiment, the preset number of times may be the batch size, that is, the number of input data items included in one batch.
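One reading of this schedule, sketched below under the assumption of a uniform draw over [0, 1]: counting individual training samples, fresh values for all style parameters are drawn each time `batch_size` samples have been processed (that is, once per batch) and held fixed within the batch. `style_parameter_schedule` is a hypothetical helper, not part of the disclosure.

```python
import random

def style_parameter_schedule(num_samples: int, batch_size: int, num_params: int,
                             low: float = 0.0, high: float = 1.0,
                             seed: int = 0) -> list[tuple[float, ...]]:
    """Return the style-parameter values in effect for each training sample.

    A new, independent value per parameter is drawn uniformly from [low, high]
    every `batch_size` samples and reused for the rest of that batch.
    """
    rng = random.Random(seed)
    current: tuple[float, ...] = ()
    schedule = []
    for i in range(num_samples):
        if i % batch_size == 0:
            current = tuple(rng.uniform(low, high) for _ in range(num_params))
        schedule.append(current)
    return schedule

sched = style_parameter_schedule(num_samples=8, batch_size=4, num_params=2)
# sched[0:4] share one draw of two values; sched[4:8] share another.
```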

According to an embodiment, the model learning unit 110 may use the error backpropagation algorithm to train the image style conversion model 200 iteratively. Error backpropagation trains the image style conversion model 200 by propagating the conversion loss calculated by the loss calculation unit 120 in the direction opposite to the one in which the input data is processed during image style conversion.
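To make "propagating the loss backward" concrete, here is a generic toy example, not the patent's network: a one-hidden-unit ReLU network whose weights and biases are updated by backpropagation with hand-derived gradients and plain gradient descent. The data set, initial values, and learning rate are arbitrary illustrative choices.

```python
def train_tiny_network(data, lr=0.02, epochs=300):
    """Fit y_hat = w2 * relu(w1 * x + b1) + b2 by backpropagation and gradient descent.

    Returns the final (w1, b1, w2, b2) and the average squared error of each epoch.
    """
    w1, b1, w2, b2 = 0.5, 0.1, 0.5, 0.0  # arbitrary initial weights and biases
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            # Forward pass: the direction in which input data is processed.
            z = w1 * x + b1
            h = max(z, 0.0)
            y_hat = w2 * h + b2
            total += (y_hat - y) ** 2
            # Backward pass: apply the chain rule from the loss back to each parameter.
            d_yhat = 2.0 * (y_hat - y)
            d_w2, d_b2 = d_yhat * h, d_yhat
            d_z = d_yhat * w2 if z > 0 else 0.0
            d_w1, d_b1 = d_z * x, d_z
            # Gradient-descent update of the weights and biases.
            w1, b1 = w1 - lr * d_w1, b1 - lr * d_b1
            w2, b2 = w2 - lr * d_w2, b2 - lr * d_b2
        history.append(total / len(data))
    return (w1, b1, w2, b2), history

data = [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5)]
params, history = train_tiny_network(data)
print(history[-1] < history[0])  # True
```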

According to an embodiment, the values of the one or more style parameters may be selected independently of one another.

Specifically, once the image style conversion model 200 has been trained the preset number of times, the model learning unit 110 may, before the next round of training, randomly sample the value of each of the one or more style parameters separately from values within the preset range. To do so, the model learning unit 110 may randomly select a value from the domain of a preset function f whose range is the preset range, and take the value of f at that point as the value of the style parameter.
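A minimal sketch of this sampling step, assuming the domain of f is [0, 1] and using the smoothstep curve as a purely hypothetical choice of f (the patent does not say which function is used). Each parameter gets its own draw, so the values are independent.

```python
import random
from typing import Callable, Optional

def sample_style_parameters(num_params: int, f: Callable[[float], float],
                            domain: tuple[float, float] = (0.0, 1.0),
                            rng: Optional[random.Random] = None) -> list[float]:
    """Draw an independent domain value for each parameter and map it through f."""
    if rng is None:
        rng = random.Random()
    lo, hi = domain
    return [f(rng.uniform(lo, hi)) for _ in range(num_params)]

def smoothstep(u: float) -> float:
    """Hypothetical f: maps [0, 1] onto [0, 1], with outputs concentrated near 0 and 1."""
    return u * u * (3.0 - 2.0 * u)

params = sample_style_parameters(3, smoothstep, rng=random.Random(42))
```

The shape of f controls how often weak versus strong style strengths are seen during training; an identity f reduces this to a plain uniform draw.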

The loss calculation unit 120 calculates a conversion loss based on the content image, the style-converted image of the content image, and the style image.

Specifically, the conversion loss is the error of the style-converted image, produced using the one or more style parameter values, relative to the ground truth. Here, the ground truth may refer to the content information in the original content image that should remain unchanged and the style information in the original style image that should be reflected in the style-converted image.

According to an embodiment, the loss calculation unit 120 may calculate a first loss based on a content loss and a style loss, calculate a second loss based on the content image and a reconstructed content image, and calculate the conversion loss based on the first loss and the second loss.

According to an embodiment, the loss calculation unit 120 may calculate the conversion loss using Equation 1 below.

[Equation 1]

$$\mathcal{L} = \mathcal{L}_1 + \lambda_2 \mathcal{L}_2$$

Here, $\mathcal{L}$ is the conversion loss, $\mathcal{L}_1$ is the first loss, $\mathcal{L}_2$ is the second loss, and $\lambda_2$ is the second loss weight.

Specifically, the loss calculation unit 120 may calculate the first loss using Equation 2 below.

[Equation 2]

$$\mathcal{L}_1 = \lambda_c \mathcal{L}_c(I_{cs}, I_c) + \lambda_s \mathcal{L}_s(I_{cs}, I_s, I_c, \alpha)$$

Here, $\mathcal{L}_c$ is the content loss, $\lambda_c$ is the content loss weight, $\mathcal{L}_s$ is the style loss, $\lambda_s$ is the style loss weight, $I_{cs}$ is the style-converted image, $I_c$ is the content image, $I_s$ is the style image, and $\alpha$ is the style parameter.
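As a worked example of how the pieces combine: the first loss is a weighted sum of the content loss and the style loss, and the conversion loss adds the weighted second loss to the first loss. The numbers and weights below are arbitrary placeholders, not values given in the patent.

```python
def first_loss(content_loss: float, style_loss: float,
               content_weight: float, style_weight: float) -> float:
    """First loss: weighted sum of the content loss and the style loss."""
    return content_weight * content_loss + style_weight * style_loss

def conversion_loss(first: float, second: float, second_weight: float) -> float:
    """Conversion loss: first loss plus the weighted second (reconstruction) loss."""
    return first + second_weight * second

# Placeholder numbers, not values from the patent.
l_first = first_loss(content_loss=0.8, style_loss=2.0, content_weight=1.0, style_weight=10.0)
total = conversion_loss(l_first, second=0.05, second_weight=1.0)
print(round(total, 6))  # 20.85
```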

More specifically, the loss calculation unit 120 may calculate the content loss using Equation 3 below, and the style loss using Equations 4 and 5 below.

[Equation 3]

$$\mathcal{L}_c = \left\lVert \phi_c(I_{cs}) - \phi_c(I_c) \right\rVert_2$$

Here, $\mathcal{L}_c$ is the content loss, $\lVert \cdot \rVert_2$ is the L2 norm, $\phi_c(I_{cs})$ is the feature vector corresponding to the content information of the style-converted image $I_{cs}$, and $\phi_c(I_c)$ is the feature vector corresponding to the content information of the content image $I_c$.
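The content loss of Equation 3 reduces to a Euclidean distance between two feature vectors. The sketch below assumes the features have already been extracted (the network that extracts them is not specified here) and uses the plain, unsquared L2 norm; some implementations square it or average over elements instead.

```python
import math
from typing import Sequence

def l2_norm(vec: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))

def content_loss(feat_converted: Sequence[float], feat_content: Sequence[float]) -> float:
    """L2 distance between the content features of the style-converted image
    and those of the content image."""
    if len(feat_converted) != len(feat_content):
        raise ValueError("feature vectors must have the same length")
    return l2_norm([a - b for a, b in zip(feat_converted, feat_content)])

print(content_loss([3.0, 4.0], [0.0, 0.0]))  # 5.0
```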

[Equation 4]

$$\mathcal{L}_s = \left\lVert \phi_s(I_{cs}) - \phi_s^{\alpha} \right\rVert_2$$

[Equation 5]

The ground-truth style feature $\phi_s^{\alpha}$ for a style parameter value $\alpha$ is determined from $\phi_s(I_s)$ and $\phi_s(I_c)$ according to $\alpha$.

Here, $\mathcal{L}_s$ is the style loss, $\phi_s(I_{cs})$ is the feature vector corresponding to the style information of the style-converted image $I_{cs}$, $\phi_s^{\alpha}$ is the feature vector corresponding to the style information in the ground truth when the style parameter is $\alpha$, $\phi_s(I_s)$ is the feature vector corresponding to the style information of the style image $I_s$, and $\phi_s(I_c)$ is the feature vector corresponding to the style information of the content image $I_c$.
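The text fixes the ingredients of the ground-truth style feature for a given style parameter (the style features of the style image and of the content image) but not the exact formula. The sketch below assumes a linear interpolation between the two, which equals the content image's own style at a parameter value of 0 and the style image's style at 1. Both the interpolation and the unsquared L2 distance are assumptions made for illustration.

```python
import math
from typing import Sequence

def style_target(style_feat: Sequence[float], content_feat: Sequence[float],
                 alpha: float) -> list[float]:
    """Assumed ground-truth style feature for strength alpha: a linear blend that is
    the content image's style at alpha = 0 and the style image's style at alpha = 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(style_feat) != len(content_feat):
        raise ValueError("feature vectors must have the same length")
    return [alpha * s + (1.0 - alpha) * c for s, c in zip(style_feat, content_feat)]

def style_loss(feat_converted: Sequence[float], style_feat: Sequence[float],
               content_feat: Sequence[float], alpha: float) -> float:
    """L2 distance between the converted image's style features and the assumed target."""
    target = style_target(style_feat, content_feat, alpha)
    if len(feat_converted) != len(target):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - t) ** 2 for a, t in zip(feat_converted, target)))

print(style_target([2.0, 4.0], [0.0, 0.0], 0.5))           # [1.0, 2.0]
print(style_loss([1.0, 2.0], [2.0, 4.0], [0.0, 0.0], 0.5))  # 0.0
```

Under this assumption, a parameter value of 0 makes the target equal to the content image's own style, which is consistent with the zero-strength behavior described earlier.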

Specifically, the loss calculation unit 120 may calculate the second loss using Equation 6 below.

[Equation 6]

$$\mathcal{L}_2 = \left\lVert I_{rc} - I_c \right\rVert_1$$

Here, $\mathcal{L}_2$ is the second loss, $\lVert \cdot \rVert_1$ is the L1 norm, $I_{rc}$ is the reconstructed content image, and $I_c$ is the content image. The reconstructed content image is the image output when the content image is encoded by the image style conversion model 200 and then decoded again without reflecting any information from the style image.
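The second loss compares the content image with its reconstruction (encoded and decoded with no style information applied) under the L1 norm. A minimal sketch on flattened pixel lists; summing absolute differences gives the L1 norm, and using a mean instead would only rescale it.

```python
from typing import Sequence

def reconstruction_loss(reconstructed: Sequence[float], content: Sequence[float]) -> float:
    """L1 distance between the reconstructed content image and the content image,
    both given as flattened pixel lists."""
    if len(reconstructed) != len(content):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(r - c) for r, c in zip(reconstructed, content))

print(reconstruction_loss([1.0, 2.0], [0.0, 4.0]))  # 3.0
```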

The model learning unit 110 then trains the image style conversion model 200 based on the conversion loss.

According to an embodiment, the model learning unit 110 and the loss calculation unit 120 may be implemented with an artificial neural network structure including one or more convolutional neural networks (CNNs).

FIG. 2 is a block diagram illustrating the image style conversion model 200 according to an embodiment. Referring to FIG. 2, the image style conversion model 200 according to an embodiment may include one or more encoders 210, one or more conversion units 220, and one or more decoders 230.

The encoder 210 may generate a plurality of feature vectors from each of the content image and the style image.

Specifically, the feature vectors may include RGB value information for each of the pixels constituting the content image and the style image, and position information indicating where each pixel is located within the content image or the style image.

According to an embodiment, the encoder 210 may receive the content image and the style image and extract from each of them feature vectors that include the pixel RGB value information and pixel position information described above.
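The per-pixel information named here (an RGB value and a position) can be laid out as one raw vector per pixel. This sketch only assembles those raw vectors; an actual encoder would typically be a learned CNN, and the scaling of RGB to [0, 1] and of coordinates by image size are assumptions.

```python
from typing import Sequence

Pixel = tuple[int, int, int]  # 8-bit (r, g, b)

def pixel_feature_vectors(image: Sequence[Sequence[Pixel]]) -> list[tuple[float, ...]]:
    """Return one (r, g, b, x, y) vector per pixel in row-major order.

    RGB is scaled to [0, 1]; x and y are divided by the largest column/row
    index so that they also lie in [0, 1] (both scalings are assumptions).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x_scale, y_scale = max(width - 1, 1), max(height - 1, 1)
    feats = []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            feats.append((r / 255, g / 255, b / 255, x / x_scale, y / y_scale))
    return feats

img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
print(pixel_feature_vectors(img)[1])  # (0.0, 1.0, 0.0, 1.0, 0.0)
```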

The conversion unit 220 may convert at least some of the plurality of feature vectors of the content image based on one or more style parameters and on the plurality of feature vectors of the style image.

According to an embodiment, the conversion unit 220 may replace the information in the content image's feature vectors that corresponds to the style of the content image with the information in the style image's feature vectors that corresponds to the style of the style image.

According to an embodiment, the conversion unit 220 may convert at least some of the plurality of feature vectors of the content image based on the value of its corresponding style parameter among the one or more style parameters. This is described in detail below with reference to FIG. 3.

According to an embodiment, each conversion unit 220 may receive the feature vector of the content image and the feature vector of the style image generated by its corresponding encoder among the one or more encoders. This is described in detail below with reference to FIG. 3.

The decoder 230 may generate the style conversion image based on the converted feature vectors and the unconverted feature vectors among the plurality of feature vectors of the content image.

According to an embodiment, the decoder 230 may generate the style conversion image by decoding both the feature vectors of the content image that were converted by the style conversion and those that were not.
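The encoder, conversion unit and decoder roles above can be made concrete with a small sketch of a single conversion unit. The patent does not state how "style information" is swapped, so the sketch assumes an AdaIN-style rule (adaptive instance normalization: each content channel is re-normalized to per-channel target statistics) and blends those target statistics linearly with a style parameter `alpha`. The feature layout (a list of channels, each a flat list of values) and the names `transform_unit` and `_mean_std` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical layout: a feature map is a list of channels, each a flat list of values.
FeatureMap = List[List[float]]


def _mean_std(values: List[float], eps: float = 1e-5) -> Tuple[float, float]:
    """Per-channel mean and standard deviation (eps keeps the std non-zero)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)


def transform_unit(f_c: FeatureMap, f_s: FeatureMap, alpha: float) -> FeatureMap:
    """Assumed conversion rule: AdaIN statistics swap blended by alpha.

    alpha = 0 keeps the content statistics; alpha = 1 adopts the style statistics.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the preset range [0, 1]")
    if len(f_c) != len(f_s):
        raise ValueError("content and style features need the same channel count")
    out: FeatureMap = []
    for c_chan, s_chan in zip(f_c, f_s):
        mu_c, sd_c = _mean_std(c_chan)
        mu_s, sd_s = _mean_std(s_chan)
        # Target statistics move linearly from the content's to the style's.
        mu_t = (1.0 - alpha) * mu_c + alpha * mu_s
        sd_t = (1.0 - alpha) * sd_c + alpha * sd_s
        # Normalize with the content statistics, then re-scale to the target ones.
        out.append([(v - mu_c) / sd_c * sd_t + mu_t for v in c_chan])
    return out


# Usage: alpha = 0.5 moves each content channel halfway toward the style statistics.
content_feat = [[1.0, 2.0, 3.0, 4.0]]
style_feat = [[10.0, 20.0, 30.0, 40.0]]
half = transform_unit(content_feat, style_feat, 0.5)
```

Because the blend starts from the content's own statistics, `alpha = 0` returns the content features unchanged in this sketch; the patent instead handles the minimum-parameter case by substituting the content image for the style image, as described with reference to FIG. 6.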

FIG. 3 is a diagram 300 illustrating in detail the structures of the image style conversion model 200 and the loss calculation unit 120 according to an embodiment. Referring to FIG. 3, the image style conversion model according to an embodiment may include one or more encoders 210, one or more conversion units 220, one or more decoders 230, and the loss calculation unit 120.

The one or more encoders 210 receive the content image and the style image. For example, the encoders E₀, E₁, E₂ and E₃, each of a different size, encode the input content image to extract feature vectors f_c.0, f_c.1, f_c.2 and f_c.3 of different sizes, and encode the input style image to extract feature vectors f_s.0, f_s.1, f_s.2 and f_s.3 of different sizes.

Each conversion unit T₀, T₁, T₂ and T₃ constituting the one or more conversion units 220 receives, from among the content feature vectors f_c.0, f_c.1, f_c.2 and f_c.3, the feature vector extracted by the encoder whose size corresponds to its own, and, from among the style feature vectors f_s.0, f_s.1, f_s.2 and f_s.3, the feature vector extracted by that same corresponding encoder. Each conversion unit then changes at least part of the information in its content feature vector based on the value of its corresponding style parameter and on the information in the corresponding style feature vector.

Each decoder D₁, D₂, D₃ and D₄ constituting the one or more decoders 230 is disposed between the conversion units T₀, T₁, T₂ and T₃ described above, and receives the feature vector converted by the conversion unit of corresponding size together with the content feature vector of corresponding size. Each decoder then decodes the received feature vectors to generate the style conversion image and the reconstructed content image.
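The FIG. 3 routing, in which conversion unit T_k receives the content and style features from the encoder of matching size together with its own style parameter, can be sketched as below. This is a toy under stated assumptions: `encode` is a hypothetical stand-in for E_k that simply average-pools each channel by a factor of 2**k (the real encoders are learned CNN stages), the conversion rule is an assumed AdaIN-style blend, and the decoders D₁ to D₄ are omitted, so the function returns only the converted feature of each level.

```python
import math
from typing import Dict, List, Sequence

FeatureMap = List[List[float]]  # hypothetical layout: channels x flattened positions


def encode(x: FeatureMap, level: int) -> FeatureMap:
    """Toy stand-in for encoder E_level: average-pool each channel by 2**level."""
    k = 2 ** level
    return [[sum(ch[i:i + k]) / k for i in range(0, len(ch) - k + 1, k)] for ch in x]


def convert(f_c: FeatureMap, f_s: FeatureMap, alpha: float, eps: float = 1e-5) -> FeatureMap:
    """Assumed AdaIN-style conversion blended by alpha (not specified by the patent)."""
    out = []
    for c, s in zip(f_c, f_s):
        mu_c, mu_s = sum(c) / len(c), sum(s) / len(s)
        sd_c = math.sqrt(sum((v - mu_c) ** 2 for v in c) / len(c) + eps)
        sd_s = math.sqrt(sum((v - mu_s) ** 2 for v in s) / len(s) + eps)
        mu_t = (1.0 - alpha) * mu_c + alpha * mu_s
        sd_t = (1.0 - alpha) * sd_c + alpha * sd_s
        out.append([(v - mu_c) / sd_c * sd_t + mu_t for v in c])
    return out


def multi_level_convert(content: FeatureMap, style: FeatureMap,
                        alphas: Sequence[float]) -> Dict[int, FeatureMap]:
    """Route level-k content/style features and alphas[k] to conversion unit T_k."""
    result = {}
    for level, alpha in enumerate(alphas):  # one independent parameter per unit
        f_c = encode(content, level)  # plays the role of f_c.k
        f_s = encode(style, level)    # plays the role of f_s.k
        result[level] = convert(f_c, f_s, alpha)
    return result


content = [[float(i) for i in range(8)]]
style = [[float(10 * i % 7) for i in range(8)]]
levels = multi_level_convert(content, style, [1.0, 0.0, 0.0, 0.0])
print([len(levels[k][0]) for k in range(4)])  # feature length per level: [8, 4, 2, 1]
```

Only T₀ is given a non-zero parameter here, so the coarser levels pass through with their content statistics intact.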

The loss calculation unit 120 calculates the conversion loss based on the first loss and the second loss, and feeds the calculated conversion loss back to each decoder D₁, D₂, D₃ and D₄, so that the image style conversion model 200 is trained in the direction that minimizes the conversion loss.
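The loss structure described for the loss calculation unit 120 (a first loss built from a content loss and a style loss, a second loss comparing the content image with the reconstructed content image, and a conversion loss combining the two) can be sketched as follows. The specific forms are assumptions, since the formulas are not reproduced in this text: mean squared error for the content and reconstruction terms, per-channel mean/std matching for the style term, and the hypothetical weights `style_weight` and `recon_weight`. In practice these terms would be computed on encoder features inside an automatic-differentiation framework so the result can be back-propagated to the decoders.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # hypothetical layout: channels x flattened positions


def mse(a: FeatureMap, b: FeatureMap) -> float:
    """Mean squared error over all channels and positions."""
    diffs = [(x - y) ** 2 for ca, cb in zip(a, b) for x, y in zip(ca, cb)]
    return sum(diffs) / len(diffs)


def _stats(ch: List[float]) -> Tuple[float, float]:
    mean = sum(ch) / len(ch)
    return mean, math.sqrt(sum((v - mean) ** 2 for v in ch) / len(ch))


def style_loss(out_feats: Sequence[FeatureMap], style_feats: Sequence[FeatureMap]) -> float:
    """Assumed style loss: squared gaps in per-channel mean and std, summed over levels."""
    total = 0.0
    for f_o, f_s in zip(out_feats, style_feats):
        for co, cs in zip(f_o, f_s):
            (mo, so), (ms, ss) = _stats(co), _stats(cs)
            total += (mo - ms) ** 2 + (so - ss) ** 2
    return total


def conversion_loss(out_feat: FeatureMap, target_feat: FeatureMap,
                    out_feats: Sequence[FeatureMap], style_feats: Sequence[FeatureMap],
                    content_img: FeatureMap, recon_img: FeatureMap,
                    style_weight: float = 10.0, recon_weight: float = 1.0) -> float:
    """Conversion loss = first loss + recon_weight * second loss (weights hypothetical)."""
    first = mse(out_feat, target_feat) + style_weight * style_loss(out_feats, style_feats)
    second = mse(content_img, recon_img)  # content image vs. reconstructed content image
    return first + recon_weight * second
```

The second term is what ties the reconstructed content image to the original, which is the property the minimum-parameter behavior described below relies on.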

FIG. 4 is a block diagram illustrating an apparatus 400 for executing an image style conversion model according to an embodiment. Referring to FIG. 4, the image style conversion model execution apparatus 400 according to an embodiment includes an input unit 410 and a style conversion image generation unit 420.

The input unit 410 obtains a content image, a style image, and setting values for one or more style parameters.

Here, the setting values for the one or more style parameters are values selected within a preset range. For example, a user of the image style conversion model execution apparatus 400 may enter, through the input unit 410, setting values for the one or more style parameters within the preset range.

According to an embodiment, the setting values for the one or more style parameters may be selected independently. For example, when there are a first style parameter and a second style parameter, the user of the apparatus 400 may enter the two separately through the input unit 410, each within the preset range, thereby selecting them independently of each other.

The style conversion image generation unit 420 generates a style conversion image for the content image by using the pre-trained image style conversion model 200, which takes the content image, the style image, and the setting values for the one or more style parameters as input data.

Here, the image style conversion model 200 may include one or more encoders 210, one or more conversion units 220, and one or more decoders 230, as described with reference to FIG. 2. Since the encoder 210, the conversion unit 220 and the decoder 230 each perform the same or similar functions as in the embodiment described above with reference to FIG. 2, a more detailed description is omitted.

According to an embodiment, when the setting values for the one or more style parameters are all the minimum value of the preset range, the style conversion image generation unit 420 may control the image style conversion model 200 to use the content image instead of the style image. This embodiment is the same as or similar to that described for the training of the image style conversion model 200 by the model learning unit 110 with reference to FIG. 1, so a detailed description is omitted.
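The input handling of the execution apparatus 400 (settings chosen independently within a preset range, and the content image substituted for the style image when every setting is at the minimum) can be sketched as a small pre-processing function. The range [0, 1] follows the FIG. 5 example; the function name `prepare_inputs` and the choice to reject out-of-range values rather than clamp them are assumptions.

```python
from typing import List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")  # any image representation

PRESET_MIN, PRESET_MAX = 0.0, 1.0  # preset range, as in the FIG. 5 example


def prepare_inputs(content: Image, style: Image,
                   settings: Sequence[float]) -> Tuple[Image, Image, List[float]]:
    """Validate independent style settings and apply the all-minimum substitution."""
    if not settings:
        raise ValueError("at least one style parameter setting is required")
    for i, value in enumerate(settings):
        if not PRESET_MIN <= value <= PRESET_MAX:
            raise ValueError(f"setting {i} = {value} is outside [{PRESET_MIN}, {PRESET_MAX}]")
    if all(value == PRESET_MIN for value in settings):
        # Every setting at the minimum: feed the content image in place of the style image.
        style = content
    return content, style, list(settings)


# Usage with placeholder strings standing in for images:
print(prepare_inputs("content.png", "style.png", [0.0, 0.0, 0.0, 0.0])[1])  # content.png
print(prepare_inputs("content.png", "style.png", [0.0, 0.3, 0.0, 0.0])[1])  # style.png
```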

FIG. 5 is a diagram 500 for describing a method of selecting a style parameter according to an embodiment.

Referring to FIG. 5, a style parameter (also called a control parameter) is randomly selected within a preset range (for example, from 0 to 1 inclusive). Once a style parameter is selected, the corresponding style strength is determined, and the image style conversion model 200 converts the style of the input content image based on the input style image according to that style strength.

For example, when the style parameter is 1, the style strength takes the maximum value within the controllable range of style strengths of the image style conversion model 200; when the style parameter is 0, the style strength takes the minimum value within that range.
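The text fixes only the endpoints: a parameter of 1 gives the maximum of the model's controllable strength range and 0 gives the minimum. A linear mapping is one simple way to satisfy this; the sketch below assumes it, and the bounds `strength_min` and `strength_max` are hypothetical placeholders for the controllable range.

```python
def style_strength(param: float, strength_min: float = 0.0, strength_max: float = 1.0) -> float:
    """Map a style parameter in [0, 1] linearly onto [strength_min, strength_max] (assumed)."""
    if not 0.0 <= param <= 1.0:
        raise ValueError("style parameter must lie in the preset range [0, 1]")
    # Written as a convex combination so both endpoints are reproduced exactly.
    return (1.0 - param) * strength_min + param * strength_max


print(style_strength(0.0, 0.2, 0.8))  # minimum of the controllable range
print(style_strength(1.0, 0.2, 0.8))  # maximum of the controllable range
```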

FIG. 6 is a diagram 600 for describing the operation of the image style conversion model according to an embodiment.

Referring to FIG. 6, if at least one of the values of the one or more style parameters is not the minimum value of the preset range, the image style conversion model 200 trained by the model learning unit 110 converts the style of the content image based on the content image and the style image 610 input to the model.

However, if the values of the one or more style parameters are all the minimum value of the preset range, the image style conversion model 200 uses the content image 620 in place of the input style image, so that the content image is supplied for both inputs.

In this case, information in the content image is replaced with the identical information from the same content image, so the information of the original content image is effectively left unchanged. Consequently, when the values of the one or more style parameters are all the minimum of the preset range, the image style conversion model 200 outputs the reconstructed content image as the style conversion image.
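Why the substitution yields the reconstructed content image can be checked numerically. Assuming, as an illustration, that a conversion unit re-normalizes content features to statistics blended toward the style features (an AdaIN-style rule that the patent does not itself specify), feeding the content features in as the style features leaves the output equal to the input for every parameter value, because the target statistics are then the content's own.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # hypothetical layout: channels x flattened positions


def convert(f_c: FeatureMap, f_s: FeatureMap, alpha: float, eps: float = 1e-5) -> FeatureMap:
    """Assumed AdaIN-style conversion blended by alpha (not specified by the patent)."""
    out = []
    for c, s in zip(f_c, f_s):
        mu_c, mu_s = sum(c) / len(c), sum(s) / len(s)
        sd_c = math.sqrt(sum((v - mu_c) ** 2 for v in c) / len(c) + eps)
        sd_s = math.sqrt(sum((v - mu_s) ** 2 for v in s) / len(s) + eps)
        mu_t = (1.0 - alpha) * mu_c + alpha * mu_s
        sd_t = (1.0 - alpha) * sd_c + alpha * sd_s
        out.append([(v - mu_c) / sd_c * sd_t + mu_t for v in c])
    return out


def max_abs_diff(a: FeatureMap, b: FeatureMap) -> float:
    """Largest element-wise absolute difference between two feature maps."""
    return max(abs(x - y) for ca, cb in zip(a, b) for x, y in zip(ca, cb))


content = [[0.5, 1.5, -2.0, 4.0], [3.0, 3.0, 1.0, 7.0]]
for alpha in (0.0, 0.25, 0.5, 1.0):
    # Style input replaced by the content input itself.
    out = convert(content, content, alpha)
    print(alpha, max_abs_diff(out, content) < 1e-9)  # True for every alpha
```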

FIG. 7 is a flowchart illustrating a method of training an image style conversion model according to an embodiment. The method illustrated in FIG. 7 may be performed, for example, by the image style conversion model learning apparatus 100 described above.

Referring to FIG. 7, the image style conversion model learning apparatus 100 first inputs, into the image style conversion model 200, input data including a content image, a style image, and a value for each of one or more style parameters (710).

Next, the apparatus 100 calculates a conversion loss based on the content image, the style image, and the style conversion image generated by the image style conversion model 200 from the input data (720).

According to an embodiment, the apparatus 100 may calculate a first loss based on a content loss and a style loss, calculate a second loss based on the content image and the reconstructed content image, and calculate the conversion loss based on the first loss and the second loss.

Next, the apparatus 100 trains the learnable parameters of the image style conversion model 200 based on the conversion loss (730).

According to an embodiment, when the values of the one or more style parameters in the input data are all the minimum value of the preset range, the apparatus 100 may train the image style conversion model 200 so that it uses the content image instead of the style image. This embodiment is the same as or similar to that described for the training of the image style conversion model 200 by the model learning unit 110 with reference to FIG. 1, so a detailed description is omitted.

Next, the apparatus 100 determines whether a preset training end condition is satisfied (740); if it is not, steps 710 to 730 are repeated until it is.

Here, the value of each of the one or more style parameters is randomly selected from the preset range on every repetition.
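Steps 710 to 740, with the style parameters redrawn from the preset range on every repetition, can be outlined as the loop below. It is a skeleton under assumptions: `train_step` is a hypothetical callable that performs steps 710 to 730 (forward pass, conversion loss, parameter update) and returns the loss; the end condition of step 740 is taken to be a fixed iteration budget; and `p_all_min`, the probability of drawing the all-minimum setting so the model also learns the content-for-style substitution, is an invented knob, since a continuous uniform draw would almost never hit the minimum exactly.

```python
import random
from typing import Callable, List, Sequence, Tuple


def train(train_step: Callable[[Sequence[float]], float], num_params: int,
          max_iters: int, low: float = 0.0, high: float = 1.0,
          p_all_min: float = 0.1, seed: int = 0) -> List[Tuple[List[float], float]]:
    """Repeat steps 710-730 until the (assumed) end condition of step 740 is met."""
    rng = random.Random(seed)
    history = []
    for _ in range(max_iters):  # step 740: end condition assumed to be an iteration budget
        if rng.random() < p_all_min:
            # All-minimum case; train_step is expected to substitute the content image
            # for the style image when it sees this setting.
            alphas = [low] * num_params
        else:
            # Each parameter is drawn independently from the preset range every repetition.
            alphas = [rng.uniform(low, high) for _ in range(num_params)]
        loss = train_step(alphas)  # steps 710-730
        history.append((alphas, loss))
    return history


# Usage with a stand-in step that only reports the sampled settings' sum as a "loss".
log = train(lambda a: sum(a), num_params=4, max_iters=5)
for alphas, loss in log:
    print([round(a, 2) for a in alphas], round(loss, 2))
```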

Although the illustrated flowchart divides the method into a plurality of steps, at least some of the steps may be performed in a different order, combined with other steps, omitted, divided into sub-steps, or supplemented with one or more steps not shown.

FIG. 8 is a flowchart illustrating a method of executing an image style conversion model according to an embodiment. The method illustrated in FIG. 8 may be performed, for example, by the image style conversion model execution apparatus 400 described above.

Referring to FIG. 8, the image style conversion model execution apparatus 400 first obtains a content image, a style image, and setting values for one or more style parameters (810).

Here, the setting values for the one or more style parameters are values selected within a preset range.

Next, the apparatus 400 generates a style conversion image for the content image using the pre-trained image style conversion model 200, which takes the content image, the style image, and the setting values for the one or more style parameters as input data (820).

According to an embodiment, when the setting values for the one or more style parameters are all the minimum value of the preset range, the apparatus 400 may control the image style conversion model 200 to use the content image instead of the style image. This embodiment is the same as or similar to that described for the training of the image style conversion model 200 by the model learning unit 110 with reference to FIG. 1, so a detailed description is omitted.

FIGS. 9 to 12 are diagrams illustrating image style conversion results according to an embodiment. The style conversion images shown in FIGS. 9 to 12 may be generated, for example, by the image style conversion model 200 described above.

Referring to FIGS. 9 to 12, when a content image and a style image are input to the image style conversion model 200 to generate a style conversion image for the content image, the model 200 generates different style conversion images as the setting values of the one or more style parameters, which correspond one-to-one to the conversion units T₀, T₁, T₂ and T₃, are varied between 0 and 1.

Specifically, the values from 0.0 to 1.0 shown at the top of FIGS. 9 to 12 are the setting values of the style parameters corresponding one-to-one to the conversion units T₀, T₁, T₂ and T₃. When one style parameter takes the value shown at the top, all other style parameters are assumed to be set to 0.0.
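The evaluation protocol of FIGS. 9 to 12 (one conversion unit's parameter swept from 0.0 to 1.0 while all others stay at 0.0) can be generated as a list of setting vectors. The number of sweep steps is not stated in this text, so `num_steps` is an assumed parameter, and `sweep_settings` is a hypothetical helper name.

```python
from typing import List


def sweep_settings(num_units: int = 4, num_steps: int = 6) -> List[List[float]]:
    """One row per (unit, value): unit k takes the value, all other units stay at 0.0."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2 to include both 0.0 and 1.0")
    values = [i / (num_steps - 1) for i in range(num_steps)]  # 0.0 ... 1.0 inclusive
    grid = []
    for unit in range(num_units):  # unit index k corresponds to conversion unit T_k
        for value in values:
            setting = [0.0] * num_units
            setting[unit] = value
            grid.append(setting)
    return grid


rows = sweep_settings()
print(len(rows))  # 24 rows: 4 conversion units x 6 values
print(rows[5])    # [1.0, 0.0, 0.0, 0.0]: T_0 at full strength
```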

In FIGS. 9 to 12, for example, the setting value of the style parameter corresponding to conversion unit T₀ relates to the color conversion of the style conversion image: the closer this value is to 1.0, the closer the colors of the style conversion image generated by the image style conversion model 200 are to those of the style image.

Likewise, the setting value of the style parameter corresponding to conversion unit T₃ relates to the contrast conversion of the style conversion image: the closer this value is to 1.0, the closer the contrast of the style conversion image generated by the image style conversion model 200 is to that of the style image.

FIG. 13 is a diagram illustrating a screen of an image style conversion service according to an embodiment.

Referring to FIG. 13, the content image to be style-converted is entered into the image input box on the left side of the screen, and the style image is entered into the image input box on the right side of the screen.

In addition, the setting values of the one or more style parameters are adjusted independently by moving the slider handles on the bars at the top center of the screen between 0 and 1.

When the content image and the style image have been entered into their input boxes and the sliders at the top center of the screen are adjusted, style conversion is performed and the resulting style conversion image is displayed at the bottom center of the screen.

FIG. 14 is a block diagram illustrating a computing environment 10 that includes a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have functions and capabilities other than those described below, and additional components other than those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In an embodiment, the computing device 12 may be the image style conversion model learning apparatus 100, or it may be the image style conversion model execution apparatus 400.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiments mentioned above. For example, the processor 14 may execute one or more programs stored in the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions which, when executed by the processor 14, may be configured to cause the computing device 12 to perform operations according to an exemplary embodiment.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store the desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various other components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22, which provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to other components of the computing device 12 through the input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (a mouse, trackpad, etc.), a keyboard, a touch input device (a touchpad, touch screen, etc.), a voice or sound input device, various kinds of sensor devices and/or imaging devices, and/or output devices such as a display device, a printer, a speaker and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as one of its components, or it may be connected to the computing device 12 as a separate device distinct from it.

Meanwhile, an embodiment of the present invention may include a program for performing the methods described in this specification on a computer, and a computer-readable recording medium containing the program. The computer-readable recording medium may include program instructions, local data files, local data structures and the like, alone or in combination. The medium may be specially designed and configured for the present invention, or may be one commonly available in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks and magnetic tape; optical recording media such as CD-ROMs and DVDs; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM and flash memory. Examples of the program include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although representative embodiments of the present invention have been described in detail above, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims set forth below and their equivalents.

10: computing environment
12: computing device
14: processor
16: computer-readable storage medium
18: communication bus
20: program
22: input/output interface
24: input/output device
26: network communication interface
100: image style conversion model learning apparatus
110: model learning unit
120: loss calculation unit
200: image style conversion model
210: encoder
220: conversion unit
230: decoder
400: image style conversion model execution apparatus
410: input unit
420: style conversion image generation unit

Claims

A model learning unit that trains an image style conversion model for converting a style of the content image based on a content image, a style image, and one or more style parameters; And
A loss calculator configured to calculate a conversion loss based on the content image, a style conversion image for the content image, and the style image,
wherein the value of each of the one or more style parameters is randomly selected from values within a preset range each time the image style conversion model is repeatedly trained a preset number of times,
the model learning unit trains the image style conversion model based on the conversion loss, and
the image style conversion model comprises:
At least one encoder for generating a plurality of feature vectors from each of the content image and the style image;
At least one conversion unit for converting at least a portion of a plurality of feature vectors of the content image based on the one or more style parameters and a plurality of feature vectors of the style image; And
And one or more decoders configured to generate the style transformed image based on the transformed feature vector and a feature vector that has not been transformed from among a plurality of feature vectors of the content image.

The apparatus according to claim 1,
wherein, when the values of the one or more style parameters are all minimum values within the preset range, the model learning unit trains the image style conversion model so that the image style conversion model uses the content image instead of the style image.

delete

The apparatus according to claim 1,
wherein the values of the one or more style parameters are selected independently of one another, and
each of the one or more conversion units converts at least some of the plurality of feature vectors of the content image based on the value of the style parameter corresponding to that conversion unit among the one or more style parameters.

The apparatus according to claim 1,
wherein each of the one or more conversion units receives a feature vector of the content image and a feature vector of the style image generated by a corresponding encoder among the one or more encoders.

The apparatus according to claim 1,
wherein the loss calculation unit calculates a first loss based on a content loss and a style loss, calculates a second loss based on the content image and a reconstructed content image, and calculates the conversion loss based on the first loss and the second loss.

An image style conversion model execution apparatus, comprising:
An input unit configured to obtain a content image, a style image, and setting values for one or more style parameters; and
A style conversion image generation unit configured to generate a style conversion image for the content image by using a pre-trained image style conversion model that takes the content image, the style image, and the setting values for the one or more style parameters as input data, wherein:
The setting values for the one or more style parameters are values selected within a preset range; and
The image style conversion model comprises:
One or more encoders configured to generate a plurality of feature vectors from each of the content image and the style image;
One or more conversion units configured to convert at least some of the plurality of feature vectors of the content image based on the one or more style parameters and the plurality of feature vectors of the style image; and
One or more decoders configured to generate the style conversion image based on the converted feature vectors and the unconverted feature vectors among the plurality of feature vectors of the content image.
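At inference time, the execution apparatus only needs to check that each setting value lies in the preset range before passing it to the pre-trained model. The sketch below assumes a range of [0.0, 1.0]. The helper `run_style_conversion` is hypothetical, and `toy_model` is a placeholder for the trained network that simply blends pixels by the mean setting.

```python
from typing import Callable, Dict, List

Model = Callable[[List[float], List[float], Dict[str, float]], List[float]]


def run_style_conversion(model: Model, content_img: List[float], style_img: List[float],
                         settings: Dict[str, float], low: float = 0.0,
                         high: float = 1.0) -> List[float]:
    """Check that each setting lies in the preset range, then call the pre-trained model."""
    for name, value in settings.items():
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the preset range [{low}, {high}]")
    return model(content_img, style_img, settings)


def toy_model(content: List[float], style: List[float],
              settings: Dict[str, float]) -> List[float]:
    """Hypothetical stand-in for the trained network: global blend by the mean setting."""
    a = sum(settings.values()) / len(settings)
    return [(1 - a) * c + a * s for c, s in zip(content, style)]


result = run_style_conversion(toy_model, [0.0, 1.0], [1.0, 1.0], {"level1": 0.5, "level2": 0.5})
```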

The apparatus of claim 7,
The style conversion image generation unit, when all of the setting values for the one or more style parameters are the minimum values within the preset range, controls the image style conversion model to use the content image instead of the style image.

(Deleted)

The apparatus of claim 7,
The setting values for the one or more style parameters are each selected independently, and
Each of the one or more conversion units converts at least some of the plurality of feature vectors of the content image based on the setting value of its corresponding style parameter among the one or more style parameters.

The apparatus of claim 7,
Each of the one or more conversion units receives the feature vector of the content image and the feature vector of the style image generated by a corresponding encoder among the one or more encoders.

A method of training an image style conversion model, the method comprising:
Inputting input data including a content image, a style image, and a value of each of one or more style parameters into an image style conversion model;
Calculating a conversion loss based on a style conversion image generated by the image style conversion model from the input data, the content image, and the style image; and
Training learning parameters of the image style conversion model based on the conversion loss, wherein:
The inputting, calculating, and training steps are repeated until a preset condition is satisfied;
Each value of the one or more style parameters is randomly selected from values within a preset range at each repetition; and
The image style conversion model comprises:
One or more encoders configured to generate a plurality of feature vectors from each of the content image and the style image;
One or more conversion units configured to convert at least some of the plurality of feature vectors of the content image based on the one or more style parameters and the plurality of feature vectors of the style image; and
One or more decoders configured to generate the style conversion image for the content image based on the converted feature vectors and the unconverted feature vectors among the plurality of feature vectors of the content image.
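The repeated training procedure, with style parameter values redrawn on every pass, can be sketched as a simple loop. The following are all assumptions made for illustration:

- The `step` callable, which would perform the inputting, loss calculation, and parameter update for one iteration.
- The uniform distribution used to draw the values.
- The stopping rule, which treats a loss threshold or an iteration budget as the "preset condition".

```python
import random
from typing import Callable, List, Optional


def train(step: Callable[[List[float]], float], num_params: int,
          low: float = 0.0, high: float = 1.0, max_iters: int = 10_000,
          target_loss: float = 1e-3, seed: Optional[int] = None) -> int:
    """Repeat input -> loss -> update, redrawing every style parameter on each pass.

    `step` is a hypothetical callable that feeds one batch plus the drawn parameter
    values to the model, computes the conversion loss, updates the learnable
    parameters, and returns the loss. Returns the number of iterations run.
    """
    rng = random.Random(seed)
    for iteration in range(1, max_iters + 1):
        # Fresh, independent draw from the preset range at every repetition.
        alphas = [rng.uniform(low, high) for _ in range(num_params)]
        loss = step(alphas)
        if loss < target_loss:  # assumed "preset condition": loss threshold
            return iteration
    return max_iters  # assumed fallback condition: iteration budget exhausted
```

Redrawing the values each time presumably exposes the model to the whole range of style strengths during training, rather than to a few fixed settings.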

The method of claim 12,
The training includes training the image style conversion model to use the content image instead of the style image when all values of the one or more style parameters are the minimum values within the preset range.

(Deleted)

The method of claim 12,
The values of the one or more style parameters are each selected independently, and
Each of the one or more conversion units converts at least some of the plurality of feature vectors of the content image based on the value of its corresponding style parameter among the one or more style parameters.

The method of claim 12,
Each of the one or more conversion units receives the feature vector of the content image and the feature vector of the style image generated by a corresponding encoder among the one or more encoders.

The method of claim 12,
The calculating includes calculating a first loss based on a content loss and a style loss, calculating a second loss based on the content image and a restored content image, and calculating the conversion loss based on the first loss and the second loss.

A method of executing an image style conversion model, the method comprising:
Obtaining a content image, a style image, and setting values for one or more style parameters; and
Generating a style conversion image for the content image using a pre-trained image style conversion model that takes the content image, the style image, and the setting values for the one or more style parameters as input data, wherein:
The setting values for the one or more style parameters are values selected within a preset range; and
The image style conversion model comprises:
One or more encoders configured to generate a plurality of feature vectors from each of the content image and the style image;
One or more conversion units configured to convert at least some of the plurality of feature vectors of the content image based on the one or more style parameters and the plurality of feature vectors of the style image; and
One or more decoders configured to generate the style conversion image for the content image based on the converted feature vectors and the unconverted feature vectors among the plurality of feature vectors of the content image.

The method of claim 18,
In the generating step, when all of the setting values for the one or more style parameters are the minimum values within the preset range, the image style conversion model is controlled to use the content image instead of the style image.

(Deleted)

The method of claim 18,
The setting values for the one or more style parameters are each selected independently, and
Each of the one or more conversion units converts at least some of the plurality of feature vectors of the content image based on the setting value of its corresponding style parameter among the one or more style parameters.

The method of claim 18,
Each of the one or more conversion units receives the feature vector of the content image and the feature vector of the style image generated by a corresponding encoder among the one or more encoders.