KR102507460B1

KR102507460B1 - Cartoon background automatic generation method and apparatus

Info

Publication number: KR102507460B1
Application number: KR1020210043823A
Authority: KR
Inventors: 황인준; 이지은
Original assignee: 고려대학교 산학협력단
Priority date: 2021-04-05
Filing date: 2021-04-05
Publication date: 2023-03-07
Also published as: KR20220138112A

Abstract

A method and apparatus for automatically generating a cartoon background are disclosed. The method includes: (a) receiving a selection of an unnecessary region of an original image as a mask and generating a mask image; (b) applying the original image and the mask image to a trained first neural network-based restoration model to generate a restored image in which the masked region has been removed from the original image; and (c) applying the restored image to a trained second neural network-based style transfer model to transfer it into an artist's cartoon style that reflects the outline features and abstraction features of a target style image, thereby generating a cartoon background image.

Description

Cartoon background automatic generation method and apparatus

The present invention relates to a method and apparatus for automatically generating a cartoon background using a generative adversarial network.

With the development of the Internet, cartoons that were once available only offline can now be easily accessed on PCs and smartphones. A webtoon, a compound of "web" and "cartoon", is a cartoon serialized and distributed over the Internet. Because artists must handle many kinds of work in large volume (story planning, storyboarding, sketching, inking, coloring, background work, and so on), most of them suffer from deteriorating mental and physical health.

Creating cartoon background images tends to be closer to repetitive manual labor than to creative work. Conventionally, backgrounds have often been made by applying a painterly effect to an original photograph, or by tracing the photograph, in an editing program such as Photoshop or Clip Studio.

However, these conventional programs cannot apply the artist's own drawing style.

An object of the present invention is to provide a method and apparatus for automatically generating a cartoon background using a generative adversarial network.

Another object of the present invention is to provide such a method and apparatus that can automatically generate, from a photograph or the like, a cartoon background reflecting the artist's drawing style, thereby reducing the time spent on repetitive labor and letting artists concentrate on creative work.

According to one aspect of the present invention, a method for automatically generating a cartoon background using a generative adversarial network is provided.

According to one embodiment of the present invention, a method for automatically generating a cartoon background may be provided, the method including: (a) receiving a selection of an unnecessary region of an original image as a mask and generating a mask image; (b) applying the original image and the mask image to a trained first neural network-based restoration model to generate a restored image in which the masked region has been removed from the original image; and (c) applying the restored image to a trained second neural network-based style transfer model to transfer it into an artist's cartoon style that reflects the outline features and abstraction features of a target style image, thereby generating a cartoon background image.

The target style image is a cartoon image by a specific artist. The second neural network-based style transfer model may take the outline strength value and the abstraction strength value set through a style transfer strength UI as hyperparameters, and generate the cartoon background image with the outline strength and abstraction strength changed accordingly.

The second neural network-based style transfer model is trained using an equation composed of the following terms: an adversarial loss function between the cartoon background image and the target style image; hyperparameters representing the outline strength value and the abstraction strength value; an adversarial loss function for outlines, derived from outline information extracted from each of the cartoon background image and the target style image; an adversarial loss function for the abstract representation, derived from abstraction information extracted from each of the cartoon background image and the target style image; and a loss function for invariance between the cartoon background image and the target style image.

The style transfer strength UI is provided as a two-dimensional graph UI having an outline strength axis and an abstraction strength axis. The outline strength value and the abstraction strength value change according to the point selected on the graph, and the cartoon background image obtained by transferring the restored image into the cartoon style with the changed values may be displayed on the output screen in real time.
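The mapping from a selected point on the two-dimensional graph UI to the two strength values could be sketched as follows. This is only an illustration: the function name `point_to_strengths`, the choice of the horizontal axis for outline strength and the vertical axis for abstraction strength, and the default value range of 0.0 to 1.0 are all assumptions that do not come from the specification.

```python
def point_to_strengths(x, y, width, height, min_value=0.0, max_value=1.0):
    """Map a selected point on a width x height graph area to
    (outline_strength, abstraction_strength).

    Assumptions (hypothetical, not from the specification):
    - the horizontal axis is outline strength, the vertical axis is abstraction strength;
    - screen coordinates start at the top-left corner, so y grows downward.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Normalize to [0, 1] and clamp points that fall outside the graph area.
    fx = min(max(x / width, 0.0), 1.0)
    # Flip the vertical axis so the top of the graph is the maximum value.
    fy = 1.0 - min(max(y / height, 0.0), 1.0)
    span = max_value - min_value
    return min_value + fx * span, min_value + fy * span


# Example: a selection in the middle of a 100 x 100 graph area.
outline_strength, abstraction_strength = point_to_strengths(50, 50, 100, 100)
```

Each time the selected point changes, the two returned values would be passed to the style transfer model so that the displayed cartoon background image can be refreshed.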

According to another aspect of the present invention, an apparatus for automatically generating a cartoon background using a generative adversarial network is provided.

According to one embodiment of the present invention, an apparatus for automatically generating a cartoon background may be provided, the apparatus including: a mask generation unit that receives a selection of an unnecessary region of an original image as a mask and generates a mask image; a restoration unit that applies the original image and the mask image to a trained first neural network-based restoration model to generate a restored image in which the masked region has been removed from the original image; and a style transfer unit that applies the restored image to a trained second neural network-based style transfer model to transfer it into a cartoon style reflecting the outline features and abstraction features of a target style image, thereby generating a cartoon image.

By providing the method and apparatus for automatically generating a cartoon background using a generative adversarial network according to an embodiment of the present invention, a photograph or the like can be automatically converted into a cartoon background reflecting the artist's drawing style, which reduces the time spent on repetitive labor and lets artists concentrate on creative work.

FIG. 1 is a flowchart illustrating a method for automatically generating a cartoon background according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an original image and a mask image according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a restored image according to an embodiment of the present invention.
FIG. 4 is a diagram showing the detailed structure of the generation module of the first neural network-based restoration model according to an embodiment of the present invention.
FIG. 5 is a diagram showing the detailed structure of the discrimination module of the first neural network-based restoration model according to an embodiment of the present invention.
FIG. 6 is a diagram showing the detailed structure of the second neural network-based style transfer module according to an embodiment of the present invention.
FIGS. 7 to 9 are diagrams illustrating the style transfer strength UI according to an embodiment of the present invention.
FIGS. 10 to 14 are diagrams illustrating interface output screens for the process of generating a cartoon image from an original image according to an embodiment of the present invention.
FIG. 15 is a block diagram schematically showing the internal configuration of the apparatus for automatically generating a cartoon background according to an embodiment of the present invention.
FIG. 16 is a diagram illustrating a cartoon background image generated from an original image according to an embodiment of the present invention.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "consists of" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be absent, or additional components or steps may be included. In addition, terms such as "... unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method for automatically generating a cartoon background according to an embodiment of the present invention; FIG. 2 is a diagram illustrating an original image and a mask image; FIG. 3 is a diagram illustrating a restored image; FIG. 4 is a diagram showing the detailed structure of the generation module of the first neural network-based restoration model; FIG. 5 is a diagram showing the detailed structure of the discrimination module of the first neural network-based restoration model; FIG. 6 is a diagram showing the detailed structure of the second neural network-based style transfer module; FIGS. 7 to 9 are diagrams illustrating the style transfer strength UI; and FIGS. 10 to 14 are diagrams illustrating interface output screens for the process of generating a cartoon image from an original image, each according to an embodiment of the present invention.

In step 110, the apparatus 100 for automatically generating a cartoon background receives an original image and generates a mask image by designating an unnecessary region in the original image.

Here, the original image may be a photograph. To turn such a photograph into a cartoon background, unnecessary regions captured along with the background must first be removed, and the result must then be altered to suit the artist's drawing style.

To this end, the apparatus receives the original photograph to be converted into a cartoon background and generates a mask image by designating the unnecessary region (that is, an object region such as a person) in that photograph. The mask image contains only information about the designated unnecessary region of the original image, and may be used to set the pixel values of that region to 0.

The apparatus may also generate an input image in which the unnecessary region of the original image is covered by the mask. FIG. 2(a) shows an example original image, FIG. 2(b) an example mask image, and FIG. 2(c) an example input image.
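The relationship between the original image, the mask image, and the input image described above can be sketched in a few lines. This is a minimal illustration, assuming a grayscale image stored as a nested list and a binary mask in which 1 marks the unnecessary region; the helper names `rectangle_mask` and `apply_mask` are hypothetical, and the specification does not restrict the mask to a rectangle.

```python
def rectangle_mask(height, width, top, left, bottom, right):
    """Build a binary mask with 1 inside the half-open box
    [top, bottom) x [left, right) and 0 elsewhere (hypothetical helper)."""
    return [
        [1 if top <= y < bottom and left <= x < right else 0 for x in range(width)]
        for y in range(height)
    ]


def apply_mask(image, mask):
    """Return a copy of `image` whose pixels are set to 0 wherever `mask` is 1."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [0 if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# A 3 x 3 "original image" and a mask covering its lower-right 2 x 2 corner.
original = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
mask = rectangle_mask(3, 3, top=1, left=1, bottom=3, right=3)
input_image = apply_mask(original, mask)
```

The pair `(input_image, mask)` corresponds to what is passed on to the restoration model in the next step.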

In step 115, the apparatus 100 applies the input image and the mask image to the first neural network-based restoration model to generate a restored image from which the masked region has been removed.

FIG. 3(a) shows an example original image, FIG. 3(b) an example input image in which the unnecessary region of the original image has been designated, and FIG. 3(c) an example restored image in which the unnecessary region has been removed and filled in from the surrounding pixels of the original image.

The first neural network-based restoration model, which removes the unnecessary region from the original image and generates a restored image filled in from the surrounding pixels, is described in detail below.

The restoration model may be a generative adversarial network (GAN) based model.

The restoration model includes a generation module and a discrimination module.

The generation module is a two-stage network consisting of a first generation module and a second generation module, and may have an encoder-decoder structure.

The detailed structure of the generation module is shown in FIG. 4.

The first generation module receives the mask image together with the input image obtained by applying the mask image to the original image, coarsely restores the image, and outputs a first-stage restored image.

As shown in FIG. 4, the first generation module has an encoder-decoder structure built from Gated Convolution and Dilated Gated Convolution layers.

The second generation module takes the first-stage restored image output by the first generation module and performs fine restoration using the surrounding region, outputting a second-stage restored image. To this end, the second generation module includes a Contextual Attention module, through which it draws on the surrounding region.

The generated image finally output by the two-stage generation module is judged by the discrimination module to be real or fake data. The detailed structure of the discrimination module is shown in FIG. 5.

The discrimination module has a PatchGAN structure. Because it judges real versus fake per patch of a given size rather than over the whole generated image, it trains faster than conventional whole-image discrimination and can yield high-resolution images.
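The idea of judging an image patch by patch rather than as a whole can be illustrated with a toy example. This sketch is not a PatchGAN network: an actual PatchGAN discriminator is a convolutional network whose output map covers overlapping receptive fields, whereas the code below simply cuts the image into non-overlapping blocks. The scoring callable `mean_patch` stands in for a learned discriminator and is purely hypothetical.

```python
def iter_patches(image, patch_size):
    """Yield non-overlapping patch_size x patch_size blocks of a 2-D list,
    scanning left to right, top to bottom."""
    height, width = len(image), len(image[0])
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            yield [row[left:left + patch_size] for row in image[top:top + patch_size]]


def patchwise_scores(image, patch_size, score_patch):
    """Apply `score_patch` to every patch and return one score per patch,
    instead of a single score for the whole image."""
    return [score_patch(patch) for patch in iter_patches(image, patch_size)]


def mean_patch(patch):
    """Hypothetical stand-in for a learned per-patch real/fake score."""
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
scores = patchwise_scores(image, 2, mean_patch)
```

Because each patch receives its own score, a locally implausible region can be penalized even when the rest of the image looks convincing.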

In addition, the discrimination module according to an embodiment of the present invention uses Spectral Normalization, which further stabilizes GAN training.
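Spectral normalization divides a layer's weight matrix by its largest singular value, which is usually estimated by power iteration. The following is a minimal pure-Python sketch of that general technique, not the patented implementation; the function names, the fixed start vector, and the iteration count are assumptions made for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def spectral_norm(weight, iterations=50):
    """Estimate the largest singular value of `weight` by power iteration.

    A fixed all-ones start vector is used here for reproducibility;
    practical implementations typically start from a random vector.
    """
    weight_t = transpose(weight)
    v = normalize([1.0] * len(weight[0]))
    for _ in range(iterations):
        u = normalize(matvec(weight, v))
        v = normalize(matvec(weight_t, u))
    return math.sqrt(sum(x * x for x in matvec(weight, v)))


def spectrally_normalize(weight, iterations=50):
    """Return weight / sigma, so the result has largest singular value close to 1."""
    sigma = spectral_norm(weight, iterations)
    if sigma == 0:
        return [row[:] for row in weight]
    return [[x / sigma for x in row] for row in weight]


weight = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(weight)
normalized = spectrally_normalize(weight)
```

Bounding the largest singular value limits how sharply the discriminator's output can change with its input, which is the usual explanation for the stabilizing effect.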

The restoration model is trained through loss functions: Equation 1 is the loss function used to train the generation module, and Equation 2 is the loss function used to train the discrimination module.

Here, the hyperparameters (hyper-parameters) of the loss function can be set by the user. The example values given for them are merely one example, and the parameter values may be set in various ways depending on how the method is applied.

One of the loss terms can be expressed as in Equation 3.

In Equation 3, the image being evaluated is the image generated by the restoration model in which the unmasked region has been replaced with the original image.
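The composite image described here, in which generated pixels are kept only inside the masked region and the original pixels are used everywhere else, could be formed as in the sketch below. The function name `composite` and the convention that a mask value of 1 marks the masked (restored) region are assumptions for illustration.

```python
def composite(generated, original, mask):
    """Keep generated pixels where mask is 1 (the masked region) and
    original pixels where mask is 0 (the unmasked region)."""
    return [
        [g if m else o for g, o, m in zip(gen_row, orig_row, mask_row)]
        for gen_row, orig_row, mask_row in zip(generated, original, mask)
    ]


original = [[10, 20], [30, 40]]
generated = [[11, 19], [33, 44]]
mask = [[0, 0], [0, 1]]  # only the bottom-right pixel was masked
result = composite(generated, original, mask)
```

Using the composite ensures that the loss term is affected only by what the model produced inside the masked region.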

Another term applies an L1 loss function, which drives the image generated by the restoration model to match the original image, as given in Equation 4.

Equation 4 uses both the first-stage generated image and the second-stage generated image.
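An L1 loss over both generation stages could look like the following sketch. Since Equation 4 is not reproduced in this text, the unweighted sum of the two stage losses and the normalization by pixel count are assumptions, as are the function names.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equally sized 2-D lists."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def two_stage_l1(original, stage1, stage2):
    """Assumed combination: L1 to the original for the first-stage image
    plus L1 to the original for the second-stage image."""
    return l1_loss(original, stage1) + l1_loss(original, stage2)


original = [[0, 0], [0, 0]]
stage1 = [[1, 1], [1, 1]]   # coarse restoration, further from the original
stage2 = [[0, 2], [0, 0]]   # fine restoration, closer to the original
loss = two_stage_l1(original, stage1, stage2)
```

Penalizing both stages gives the coarse network a direct training signal instead of relying only on gradients that pass back through the refinement network.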

In addition, a further term can be derived as in Equation 5.

This term also applies an L1 loss function, but it is computed after the images have been projected into feature space using a VGG-16 network pre-trained on ImageNet. Equation 5 uses the feature map of the p-th layer, for which the pool1, pool2, and pool3 layers may be used, together with the number of elements in that feature map.

A further term can be derived using Equation 6.

This term is a function that reduces the style difference between two images, where style is defined as the correlation between different feature maps. Feature maps of the images are extracted using the pre-trained VGG-16 network, and a Gram matrix may be used to compute the correlation between the feature maps.
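A Gram matrix holds the inner product of every pair of feature-map channels, so it captures which channels tend to activate together regardless of where in the image they do so. The sketch below assumes the feature maps have already been extracted and flattened to one list per channel; the division by the number of elements and the use of an absolute-difference sum in `style_difference` are assumptions, since Equation 6 is not reproduced in this text.

```python
def gram_matrix(feature_maps):
    """feature_maps: list of C channels, each a flat list of N activations.
    Returns the C x C matrix of channel inner products, divided by N."""
    n = len(feature_maps[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in feature_maps]
        for fi in feature_maps
    ]


def style_difference(features_a, features_b):
    """Sum of absolute differences between the two Gram matrices
    (an assumed distance, for illustration only)."""
    gram_a, gram_b = gram_matrix(features_a), gram_matrix(features_b)
    return sum(
        abs(x - y)
        for row_a, row_b in zip(gram_a, gram_b)
        for x, y in zip(row_a, row_b)
    )


features = [[1.0, 2.0], [3.0, 4.0]]  # two channels, two activations each
gram = gram_matrix(features)
```

In practice the channel lists would come from the pooling layers of the pre-trained network rather than being written by hand.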

The image restored by the restoration model is shown in FIG. 3(c). As FIG. 3 shows, an unnecessary region of the original image is designated and set as a mask, and the masked portion is filled in using the surrounding pixels of the original image, so that the unnecessary region is removed and a restored image suitable as the basis for a cartoon background is generated.

The restored image is still a restored photograph; because it does not reflect the artist's drawing style, it is not suitable for direct use as a cartoon background.

Therefore, in step 120, the apparatus 100 applies the restored image to the second neural network-based style transfer model to generate a cartoon image transferred into the cartoon style, reflecting the outlines and abstract representation of the target style image.

The second neural network-based style transfer model is a model built to transfer the restored image into a cartoon style based on the outlines and abstract representation of the target style image. It is described in more detail below with reference to FIG. 6.

The second neural network-based style transfer model is a GAN-based model. Accordingly, as shown in FIG. 6, it includes a generation module and a discrimination module.

The generation module receives the restored image produced by the first neural network-based restoration model and generates a cartoon-style image reflecting the outlines and abstract representation of the target style image. To this end, the generation module has a U-Net structure and may consist of convolution layers, Leaky ReLU, and bilinear-resize layers.

The discrimination module distinguishes between the cartoon image generated by the generation module and the target style image, and thereby serves to make the styles of the two images similar. It has a PatchGAN structure and uses Spectral Normalization, which further stabilizes GAN training.

By clustering target style images by artist and training on them, the second neural network-based style transfer model can reflect each artist's drawing style.

The second neural network-based style transfer model according to an embodiment of the present invention is configured to learn outlines and abstract representation, which are representative features of cartoons.

Unlike ordinary images or photographs, cartoons have distinct pen lines, which delineate objects. Accordingly, the second neural network-based style transfer model may extract outline information from the target style image and be trained to reproduce the pen lines of the artist's cartoons. For example, the outline information may be extracted with a Canny edge detector, which is briefly described below.

First, the apparatus 100 reduces noise in the image using a Gaussian filter and then finds edges by differentiating the image in the horizontal and vertical directions. Here, an edge may be defined as a part of the image where the intensity changes abruptly. Because regions that are not actually edges may also be detected, the apparatus compares the edge strength of the current pixel with the edge strengths of its neighbors in the positive and negative gradient directions and removes weak edges that are not local maxima. The apparatus then finds definite edges by applying thresholds, so that only the outlines are extracted from the image.

Although this embodiment is described on the assumption that the second neural network-based style transfer model uses a Canny edge detector, this is only for ease of understanding and explanation; outline features need not be extracted with a Canny edge detector alone, and any known edge detector may be applied.
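Two of the steps described above, differentiating the image horizontally and vertically and then thresholding the resulting edge strength, can be sketched in pure Python. This is a deliberately reduced illustration and not a full Canny detector: Gaussian smoothing, non-maximum suppression, and hysteresis are omitted, and the use of 3 x 3 Sobel kernels and a single threshold is an assumption.

```python
import math

# 3 x 3 Sobel kernels for the horizontal and vertical derivatives (assumed choice).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(image):
    """Return the per-pixel gradient magnitude of a 2-D list.
    Border pixels are left at 0 because the kernel does not fit there."""
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            magnitude[y][x] = math.hypot(gx, gy)
    return magnitude


def threshold_edges(magnitude, threshold):
    """Mark pixels whose edge strength reaches the threshold with 1."""
    return [[1 if m >= threshold else 0 for m in row] for row in magnitude]


# A 5 x 5 image with a vertical step between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
magnitude = gradient_magnitude(image)
edges = threshold_edges(magnitude, 20)
```

The two-pixel-wide response around the step is the kind of result that non-maximum suppression in a complete detector would thin to a single-pixel outline.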

Another characteristic of cartoons is that objects are depicted more abstractly than they really are: fine textures disappear and surfaces are rendered smoothly. Accordingly, the second neural network-based style transfer model may be pre-trained by extracting abstraction features from the target style image.

The abstraction features may be extracted using, for example, a Deep Guided Filter. A deep guided filter smooths the image while preventing the loss of information near outlines that would distort the image.

As described above, the second neural network-based style transfer model may be trained with a loss function that takes outline and abstraction features into account. Its loss function is given in Equation 7.

Here, one term is the adversarial loss function between the cartoon image and the target style image, which can be derived as in Equation 8.

In Equation 8, one image is the target style image (that is, the artist's cartoon image) and the other is the cartoon image generated by the second neural network-based style transfer model. That is, the model uses the input restored image to generate an image resembling the target style image in an attempt to fool the discrimination module, while the discrimination module tries to tell the generated cartoon image and the target style image apart.

Because the second neural network-based style transfer model is a GAN-based deep learning model in which the generation module and the discrimination module learn by competing adversarially, it can be trained so that the generated cartoon image becomes similar to the target style image.

The hyperparameters represent the outline strength value and the abstraction strength value. Adjusting them makes it possible to control the style, as shown in FIGS. 7 to 9.

One hyperparameter controls the outline strength: as its value increases the outlines become darker, and as it decreases the outlines become lighter. The other controls the abstraction strength: as its value increases, structures become more abstract and textures are reduced, and as it decreases, textures are retained.

In this way, when the restored image is converted into a cartoon style reflecting the outline and abstraction features of the target style image, the outline strength value and the abstraction strength value can be adjusted, which increases the freedom of editing.
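How the two strength values might enter the overall loss can be sketched as below. Equation 7 is not reproduced in this text, so the weighted-sum form, the unweighted adversarial and invariance terms, and every name in the code are assumptions made only to show the role of the two hyperparameters.

```python
def style_transfer_loss(adversarial, outline_adversarial, abstraction_adversarial,
                        invariance, outline_strength, abstraction_strength):
    """Assumed combination of the four loss terms named in the text:
    the two strength hyperparameters scale the outline and abstraction
    terms, and the remaining terms enter unweighted."""
    return (adversarial
            + outline_strength * outline_adversarial
            + abstraction_strength * abstraction_adversarial
            + invariance)


# Raising outline_strength makes the outline term count for more in training.
weak_outline = style_transfer_loss(1.0, 2.0, 4.0, 0.5,
                                   outline_strength=0.5, abstraction_strength=0.25)
strong_outline = style_transfer_loss(1.0, 2.0, 4.0, 0.5,
                                     outline_strength=1.0, abstraction_strength=0.25)
```

Under this assumed form, a larger strength value pushes the generator harder toward matching the corresponding feature of the target style image, which is consistent with the darker outlines and stronger abstraction described above.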

The outline loss term denotes the adversarial loss function for the outlines of the cartoon image and the target style image, and can be expressed as in Equation 9.

Here, the outline extraction operator denotes the outline information (features) extracted from an image. In other words, the outline loss is a loss function that is trained adversarially so that the outline features extracted from the cartoon image become similar to the outline features extracted from the target style image.
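The description does not state how the outline features are extracted. Purely as an illustration, the sketch below uses a central-difference gradient magnitude as a stand-in outline extractor; this choice, the function name `outline_features`, and the list-of-rows image format are all assumptions rather than the method of the present invention.

```python
def outline_features(image):
    """Gradient magnitude per pixel (stand-in outline extractor).

    `image` is a list of rows of grayscale floats. Border pixels are left
    at 0.0 because central differences need both neighbours.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

In an adversarial setup of the kind described, outline maps of this sort computed from the cartoon image and from the target style image would be what the discrimination module compares.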

The abstraction loss term denotes the adversarial loss function for the abstract representations of the cartoon image and the target style image, and can be expressed as in Equation 10.

Here, the abstraction extraction operator denotes the abstraction features extracted using a deep guided filter. The abstraction loss is a loss function that is trained adversarially so that the abstraction features extracted from the cartoon image become similar to the abstraction features extracted from the target style image.

The remaining loss term can be expressed as in Equation 11.

This term is used so that the image transitioned into the target style and the reconstructed cartoon image retain semantic invariance.
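One plausible reading of how the strength hyperparameters interact with the three loss terms is a weighted sum. The sketch below assumes that form; the exact combination is not recoverable from this text, and the function and parameter names are hypothetical.

```python
def total_generator_loss(outline_loss, abstraction_loss, invariance_loss,
                         outline_strength, abstraction_strength):
    """Assumed combination: each adversarial term is scaled by its strength
    hyperparameter, and the invariance term is added unscaled."""
    return (outline_strength * outline_loss
            + abstraction_strength * abstraction_loss
            + invariance_loss)
```

Under this assumption, raising the outline strength makes mismatched outlines more costly during training, which is consistent with the darker outlines described for larger values.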

As described above, the second neural network-based style transition model can generate a cartoon background image by transitioning the reconstructed image into a cartoon style that reflects the outline features and abstraction features of the target style image.

FIGS. 10 to 14 illustrate interface output screens for the process of generating a cartoon image from an original image.

As shown in FIG. 10, selecting Load Original Image loads the original image to be converted into a cartoon image from among a plurality of images and displays it in one area of the output screen.

FIG. 11 shows the interface for generating a mask image from the original image. When the mask generation button is pressed, a set of user-selectable tools is displayed, and the user can select any one of the tools to designate an unnecessary area.

When the mask generation button is pressed again after the unnecessary area has been designated, a mask image containing only the unnecessary area is generated as shown in FIG. 12, and the generated mask image can be displayed in another area of the output screen.

When the image restoration button is pressed as shown in FIG. 13, the input image and the mask image are applied to the first neural network-based reconstruction model. The mask is removed, and the resulting image (the reconstructed image), in which the mask region is naturally restored based on the surrounding pixels of the input image, can be output in one area of the output screen.

As illustrated in FIG. 7, when the style transition button is pressed, a bar for selecting an artist-specific cartoon style image is displayed. Once a target style image is selected, a style transition strength UI for setting the style strength is provided in the form of a two-dimensional graph. One axis of the two-dimensional graph represents outline strength, and the other axis represents abstraction strength. The outline strength value and the abstraction strength value can be determined by the point selected on the two-dimensional graph.
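How a selected point could be turned into the two strength values can be sketched as follows. The assignment of the horizontal axis to outline strength and the vertical axis to abstraction strength, the value range, and the name `strengths_from_point` are assumptions made for illustration only.

```python
def strengths_from_point(x, y, width, height, low=0.0, high=1.0):
    """Map a selected pixel (x, y) on a width x height graph to
    (outline_strength, abstraction_strength).

    Assumes x grows to the right and y grows downward, as in most
    screen coordinate systems, so the vertical axis is flipped.
    """
    if width <= 1 or height <= 1:
        raise ValueError("graph must be at least 2 pixels in each dimension")
    fx = min(max(x / (width - 1), 0.0), 1.0)   # clamp to the graph area
    fy = min(max(y / (height - 1), 0.0), 1.0)
    outline_strength = low + fx * (high - low)
    abstraction_strength = low + (1.0 - fy) * (high - low)
    return outline_strength, abstraction_strength
```

With this mapping, the bottom-left corner of the graph gives the lowest value for both strengths and the top-right corner gives the highest.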

Once the outline strength value and the abstraction strength value are determined in this way, the reconstructed image is applied to the second neural network-based style transition model, which generates a cartoon image transitioned into a cartoon style reflecting the outline and abstraction features of the target style image. The cartoon image can be displayed in one area of the output screen.

Through the style transition strength UI, the outline strength value and the abstraction strength value can be adjusted in various ways to change the outlines and the abstract rendering of the cartoon image.

Specifically, FIG. 7 shows an example in which a cartoon image generated with medium outline strength and abstraction strength values is displayed in one area of the output screen. FIG. 8 shows an example in which a cartoon image generated with a lower outline strength value and a higher abstraction strength value is displayed. FIG. 9 shows an example in which a cartoon image generated with a higher outline strength value and a lower abstraction strength value is displayed.

In other words, the second neural network-based style transition model according to an embodiment of the present invention has the advantage that the outline strength value and the abstraction strength value can be finely adjusted so that the cartoon background is generated in the style the artist wants.

To save the cartoon image transitioned into the desired style, the user can press the Save Image button on the interface, as shown in FIG. 14, to save the final cartoon background image.

FIG. 15 is a block diagram schematically showing the internal configuration of an apparatus for automatically generating a cartoon background according to an embodiment of the present invention, and FIG. 16 illustrates a cartoon background image generated from an original image according to an embodiment of the present invention.

Referring to FIG. 15, the apparatus 100 for automatically generating a cartoon background according to an embodiment of the present invention includes a mask generation unit 1510, a restoration unit 1515, a style transition unit 1520, a memory 1525, and a processor 1530.

The mask generation unit 1510 is a means for designating an unnecessary object area in the original image as a mask region and then generating a mask image containing only that mask region.

The restoration unit 1515 includes the first neural network-based reconstruction model. It is a means for applying the original image and the mask image to the first neural network-based reconstruction model to generate a reconstructed image in which the mask region is removed from the original image and then restored using the surrounding pixels of the original image.

The detailed structure and functions of the first neural network-based reconstruction model are the same as described above, so a redundant description is omitted.

The style transition unit 1520 is a means for generating a cartoon background image by transitioning the reconstructed image into the artist's cartoon style. The style transition unit 1520 includes the second neural network-based style transition model. The style transition unit 1520 can apply the reconstructed image to the trained second neural network-based style transition model and transition it into a cartoon style reflecting the outline features and abstraction features of the artist's cartoon style image (i.e., the target style image), thereby generating the cartoon background image. In doing so, the second neural network-based style transition model receives the outline strength value and the abstraction strength value as hyperparameters and additionally reflects them, so that the cartoon background image generated from the reconstructed image reflects the outline features and abstraction features of the artist's cartoon style image.

The second neural network-based style transition model is the same as described above, so a redundant description is omitted.

The memory 1525 is a means for storing instructions for performing the method of automatically generating a cartoon background according to an embodiment of the present invention.

The processor 1530 is a means for controlling the internal components of the apparatus 100 for automatically generating a cartoon background according to an embodiment of the present invention (e.g., the mask generation unit 1510, the restoration unit 1515, the style transition unit 1520, and the memory 1525).
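The flow between the units of FIG. 15 can be sketched as a small pipeline. The class name, the rectangular region format, and the two placeholder functions (`mean_fill` and `identity_style`) are hypothetical; they only stand in for the trained first and second neural network-based models, whose internals are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]
Region = Tuple[int, int, int, int]  # (top, left, bottom, right), assumed format


@dataclass
class CartoonBackgroundPipeline:
    restore: Callable[[Image, Image], Image]          # stands in for unit 1515
    stylize: Callable[[Image, float, float], Image]   # stands in for unit 1520

    def make_mask(self, image: Image, region: Region) -> Image:
        """Mask generation (unit 1510): 1.0 inside the unnecessary region, else 0.0."""
        top, left, bottom, right = region
        return [[1.0 if top <= y < bottom and left <= x < right else 0.0
                 for x in range(len(image[0]))]
                for y in range(len(image))]

    def run(self, image, region, outline_strength, abstraction_strength):
        mask = self.make_mask(image, region)
        restored = self.restore(image, mask)
        return self.stylize(restored, outline_strength, abstraction_strength)


def mean_fill(image, mask):
    """Placeholder restorer: fills masked pixels with the mean of unmasked ones."""
    kept = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if not m]
    mean = sum(kept) / len(kept)
    return [[mean if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def identity_style(image, outline_strength, abstraction_strength):
    """Placeholder stylizer: returns the image unchanged."""
    return image
```

Keeping the two models behind plain callables mirrors the separation between the restoration unit and the style transition unit, so either could be swapped for a lighter model without touching the rest of the flow.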

According to an embodiment of the present invention, the apparatus for automatically generating a cartoon background includes the first neural network-based reconstruction model and the second neural network-based style transition model. Both are based on deep learning models, which have many parameters and require a large amount of computation, so they take a long time to compute and occupy a large amount of storage space.

Moreover, because the apparatus combines the first neural network-based reconstruction model and the second neural network-based style transition model, the computation time and the storage requirements grow further. Accordingly, according to another embodiment of the present invention, the models can be made lightweight so that they can also be used on laptops and in mobile environments.

Methods such as model distillation, channel pruning, and quantization can be used for model compression.

Model distillation transfers the knowledge of a large, well-trained network (the teacher network) to a small network (the student network) that will actually be used. Here, the large network is the trained model comprising the first neural network-based reconstruction model and the second neural network-based style transition model, and the small network may be the model to be applied in the cartoon background generation program. That is, model distillation includes the difference between the classification results of the large network and the small network in the loss function, so that the two networks become similar.
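The idea of adding the teacher-student difference to the loss can be sketched as below. The mean absolute difference, the weighting factor `alpha`, and the function name are assumptions chosen for illustration; the description does not specify the form of the difference term.

```python
def distillation_loss(student_out, teacher_out, task_loss, alpha=0.5):
    """Student loss = its own task loss plus a penalty for deviating
    from the teacher's output (assumed mean absolute difference)."""
    if len(student_out) != len(teacher_out):
        raise ValueError("outputs must have the same length")
    gap = sum(abs(s - t) for s, t in zip(student_out, teacher_out)) / len(student_out)
    return task_loss + alpha * gap
```

When the student reproduces the teacher's output exactly, the penalty vanishes and only the task loss remains.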

In a deep learning model, the number of channels is directly tied to the number of parameters and the amount of computation, so a considerable amount of resources can be wasted. Channel pruning can therefore make the model lighter by removing channels that have almost no effect on the result.
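A minimal sketch of channel pruning follows. It assumes that a channel's influence can be approximated by the L1 norm of its weights, which is a common heuristic but is not stated in the description; `prune_channels` and `keep_ratio` are hypothetical names.

```python
def prune_channels(filters, keep_ratio):
    """Return the indices of the channels to keep, in their original order.

    `filters` holds one flat list of weights per channel. Channels with the
    smallest L1 norm are treated as having the least effect on the result.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    scores = [sum(abs(w) for w in channel) for channel in filters]
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])
```

In practice a pruned model is usually fine-tuned afterwards to recover any lost quality; that step is outside this sketch.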

Quantization is a technique for converting input values expressed in precise, fine-grained units into values in simpler units. It is widely used in artificial intelligence fields such as deep learning to reduce the amount of computation while improving power efficiency. An everyday example is representing the continuous color spectrum of the real world with discrete colors: a black-and-white image can be represented with 1 bit per pixel and a color image with 24 bits per pixel, which reduces its size.
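As an illustration of the principle, the sketch below applies uniform quantization to a list of floating-point values, mapping them to integer codes and back. The uniform min-max scheme, the 8-bit default, and the function names are assumptions; the description does not specify which quantization scheme is applied to the models.

```python
def quantize(values, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] (uniform, min-max)."""
    low, high = min(values), max(values)
    levels = (1 << num_bits) - 1
    scale = (high - low) / levels if high > low else 1.0
    codes = [round((v - low) / scale) for v in values]
    return codes, scale, low


def dequantize(codes, scale, low):
    """Recover approximate floats from the integer codes."""
    return [low + c * scale for c in codes]
```

Each value is then stored in 8 bits instead of a 32-bit float, at the cost of a rounding error of at most half a quantization step.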

By applying the lightweight model, the apparatus 100 for automatically generating a cartoon background can compute faster and use less storage space, and can also be used on laptops without a GPU or in mobile environments.

FIG. 16 illustrates a cartoon background image generated by the apparatus 100 for automatically generating a cartoon background according to an embodiment of the present invention, by applying the reconstructed image to the second neural network-based style transition model and transitioning it into a cartoon style that reflects the outline features and abstraction features of the artist's cartoon style image.

As described above, unnecessary object areas are removed from a photograph or the like, and the restored image is then transitioned into a cartoon style reflecting the artist's cartoon style (outline features and abstraction features) to automatically generate a cartoon background image, which has the advantage of reducing the artist's working time.

The apparatus and method according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered from an illustrative rather than a limiting point of view. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: apparatus for automatically generating a cartoon background
1510: mask generation unit
1515: restoration unit
1520: style transition unit
1525: memory
1530: processor

Claims

A method of automatically generating a cartoon background, the method comprising:
(a) generating a mask image after an unnecessary area of an original image is selected as a mask;
(b) generating a reconstructed image in which the mask region is removed from the original image by applying the original image and the mask image to a first neural network-based reconstruction model; and
(c) generating a cartoon background image by applying the reconstructed image to a trained second neural network-based style transition model so that the reconstructed image is transitioned into an author's cartoon style reflecting the outline features and abstraction features of a target style image,
wherein the second neural network-based style transition model generates the cartoon background image by reflecting, as hyperparameters, an outline strength value and an abstraction strength value set through a style transition strength UI, thereby changing the outline strength and the abstraction strength of the target style image,
wherein the style transition strength UI is provided as a two-dimensional graph UI having an outline strength axis and an abstraction strength axis,
wherein the outline strength value and the abstraction strength value change according to the point selected on the two-dimensional graph UI, and
wherein the cartoon background image into which the reconstructed image is transitioned, taking the changed outline strength value and abstraction strength value into account, is displayed on an output screen in real time.

The method of claim 1,
wherein the target style image is a cartoon image of a specific artist.

(deleted)

The method of claim 1,
wherein the second neural network-based style transition model is trained using the following equation.

Here, the equation includes a loss function for invariance between the cartoon background image and the target style image.

(deleted)

A computer-readable recording medium on which program code for performing the method of any one of claims 1, 2, or 4 is recorded.

An apparatus for automatically generating a cartoon background, the apparatus comprising:
a mask generation unit for generating a mask image after an unnecessary area of an original image is selected as a mask;
a restoration unit for generating a reconstructed image in which the mask region is removed from the original image by applying the original image and the mask image to a first neural network-based reconstruction model; and
a style transition unit for generating a cartoon image by applying the reconstructed image to a trained second neural network-based style transition model so that the reconstructed image is transitioned into a cartoon style reflecting the outline features and abstraction features of a target style image,
wherein the second neural network-based style transition model generates a cartoon background image by reflecting, as hyperparameters, an outline strength value and an abstraction strength value set through a style transition strength UI, thereby changing the outline strength and the abstraction strength of the target style image,
wherein the style transition strength UI is provided as a two-dimensional graph UI having an outline strength axis and an abstraction strength axis,
wherein the outline strength value and the abstraction strength value change according to the point selected on the two-dimensional graph UI, and
wherein the cartoon background image into which the reconstructed image is transitioned, taking the changed outline strength value and abstraction strength value into account, is displayed on an output screen in real time.

(deleted)