KR20240027509A

KR20240027509A - Electronic device and method for generating image with enhanced depth perceived by viewer

Info

Publication number: KR20240027509A
Application number: KR1020220151991A
Authority: KR
Inventors: 신하늘; 송영찬; 강수민; 이태미
Original assignee: 삼성전자주식회사
Priority date: 2022-08-23
Filing date: 2022-11-14
Publication date: 2024-03-04

Abstract

According to an embodiment, an electronic device for generating an image with enhanced depth perceived by a viewer includes a memory configured to store one or more instructions, and one or more processors. By executing the one or more instructions, the one or more processors receive an input image, receive a control parameter for adjusting the level of at least one pictorial depth cue included in an output image, and, using a deep neural network, generate from the input image an output image that includes the at least one pictorial depth cue at the level adjusted by the control parameter.

Description

Electronic device and method for generating image with enhanced depth perceived by viewer

Various embodiments relate to an electronic device and a method for generating an image with enhanced depth perceived by a viewer.

Depth perception is the visual ability to perceive depth. Depth perception can arise from a variety of depth cues, which are classified into monocular cues and binocular cues. Monocular cues allow depth to be perceived using only one eye, and include occlusion, relative size, and light and shadow. Binocular cues allow depth to be perceived through both eyes working together, and include convergence and binocular disparity. Pictorial depth cues are the information in a 2D image from which depth can be perceived. Any information that allows a viewer to perceive depth in a 2D image can be a pictorial depth cue.

Some methods use depth maps of an image to generate an image with enhanced depth perceived by a viewer. A depth map generated from a single image has low accuracy, and a low-accuracy depth map may provide the viewer with incorrect depth information.

According to an embodiment, an electronic device for generating an image with enhanced depth perceived by a viewer includes: a memory configured to store one or more instructions; and one or more processors. By executing the one or more instructions, the one or more processors receive an input image, receive a control parameter for adjusting the level of at least one pictorial depth cue included in an output image, and, using a deep neural network, generate from the input image an output image including the at least one pictorial depth cue at the level adjusted by the control parameter. The deep neural network includes: a first neural network configured to receive the input image and output a feature vector for the input image; a second neural network configured to receive the input image and output a feature map by reducing the spatial resolution of the input image; a third neural network configured to receive the feature map output from the second neural network and output the output image by increasing the spatial resolution of that feature map; and at least one controllable conversion module configured to convert a feature map based on the control parameter. The second neural network may be configured so that a feature map of a hidden layer of the second neural network is combined with the feature vector output from the first neural network. The at least one controllable conversion module may be configured to receive a feature map of a hidden layer of the first neural network, convert the received feature map based on the control parameter, and output the converted feature map. The third neural network may be configured so that a feature map of a hidden layer of the third neural network is combined with the converted feature map of the hidden layer of the first neural network.

According to an embodiment, a method for generating an image with enhanced depth perceived by a viewer includes: receiving an input image; receiving a control parameter for adjusting the level of at least one pictorial depth cue included in an output image; and generating, from the input image and using a deep neural network, an output image including the at least one pictorial depth cue at the level adjusted by the control parameter. The deep neural network includes: a first neural network configured to receive the input image and output a feature vector for the input image; a second neural network configured to receive the input image and output a feature map by reducing the spatial resolution of the input image; a third neural network configured to receive the feature map output from the second neural network and output the output image by increasing the spatial resolution of that feature map; and at least one controllable conversion module configured to convert a feature map based on the control parameter. The second neural network may be configured so that a feature map of a hidden layer of the second neural network is combined with the feature vector output from the first neural network. The at least one controllable conversion module may be configured to receive a feature map of a hidden layer of the first neural network, convert the received feature map based on the control parameter, and output the converted feature map. The third neural network may be configured so that a feature map of a hidden layer of the third neural network is combined with the converted feature map of the hidden layer of the first neural network.

FIG. 1 is a diagram illustrating an image with enhanced depth perceived by a viewer, according to an embodiment.
FIG. 2 is a diagram illustrating an electronic device for generating an image with enhanced depth perceived by a viewer, according to an embodiment.
FIG. 3 is a diagram illustrating a deep neural network according to an embodiment.
FIG. 4 is a diagram illustrating training of a first neural network according to an embodiment.
FIG. 5 is a diagram illustrating training of a deep neural network according to an embodiment.
FIG. 6 is a diagram illustrating a first neural network, a second neural network, a third neural network, and a controllable conversion module according to an embodiment.
FIG. 7 is a diagram illustrating an encoder module according to an embodiment.
FIG. 8 is a diagram illustrating a controllable conversion module according to an embodiment.
FIG. 9 is a diagram illustrating an element-wise operation module according to an embodiment.
FIG. 10 is a diagram illustrating a decoder module according to an embodiment.
FIG. 11 is a diagram illustrating a controllable conversion module, an encoder module, and a decoder module according to an embodiment.
FIG. 12 is a diagram illustrating the flow of feature maps from an input image to an output image according to an embodiment.
FIG. 13 is a diagram illustrating output images that include an amount of pictorial depth cue adjusted by a control parameter, according to an embodiment.
FIG. 14 is a diagram illustrating a method for generating an image with enhanced depth perceived by a viewer, according to an embodiment.
FIG. 15 is a diagram illustrating a method for generating an image with enhanced depth perceived by a viewer, according to an embodiment.

The terms used in the present invention were selected, as far as possible, from general terms that are currently in wide use, with the functions in the present invention taken into account. Their meaning may nevertheless vary with the intention of those skilled in the art, with precedent, or with the emergence of new technology. In certain cases a term has been arbitrarily selected by the applicant, in which case its meaning is described in detail in the corresponding part of the description. The terms used in the present invention should therefore be defined based on their meaning and on the overall content of the present invention, not simply on their names.

Throughout the specification, when a part is said to "include" an element, this means that it may further include other elements, not that it excludes them, unless specifically stated otherwise. In addition, a term such as "module" used in the specification refers to a unit that processes at least one function or operation, and may be implemented in hardware, in software, or in a combination of hardware and software.

Throughout the specification, singular expressions include plural expressions unless the context clearly indicates otherwise. Likewise, plural expressions include singular expressions unless the context clearly indicates otherwise.

Functions related to artificial intelligence according to the present disclosure are operated through a processor and a memory. The processor may consist of one or more processors. The one or more processors may be a general-purpose processor such as a CPU, an AP, or a DSP (Digital Signal Processor), a graphics-dedicated processor such as a GPU or a VPU (Vision Processing Unit), or an artificial-intelligence-dedicated processor such as an NPU. The one or more processors control the processing of input data according to predefined operation rules or an artificial intelligence model stored in the memory. Alternatively, when the one or more processors are artificial-intelligence-dedicated processors, they may be designed with a hardware structure specialized for processing a specific artificial intelligence model.

The predefined operation rules or the artificial intelligence model are created through training. Here, being created through training means that a basic artificial intelligence model is trained by a learning algorithm on a large amount of training data, so that predefined operation rules or an artificial intelligence model set to perform a desired characteristic (or purpose) is created. Such training may be performed on the device on which the artificial intelligence according to the present disclosure runs, or through a separate server and/or system. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, but are not limited to these.

An artificial intelligence model may consist of a plurality of neural network layers. Each of the neural network layers has a plurality of weight values and performs a neural network operation on the operation result of the previous layer and the weight values. The weight values of the neural network layers may be optimized by the training result of the artificial intelligence model. For example, the weight values may be updated so that a loss value or a cost value obtained from the artificial intelligence model is reduced or minimized during training. An artificial neural network may include a deep neural network (DNN), for example a CNN (Convolutional Neural Network), a DNN (Deep Neural Network), an RNN (Recurrent Neural Network), an RBM (Restricted Boltzmann Machine), a DBN (Deep Belief Network), a BRDNN (Bidirectional Recurrent Deep Neural Network), or a deep Q-network, but is not limited to these examples.

FIG. 1 is a diagram illustrating an image with enhanced depth perceived by a viewer, according to an embodiment.

Pictorial depth cues may be defined as any information in two-dimensional visual representations from which a viewer can infer three-dimensional spatial relationships. Pictorial depth cues may include blur, contrast, or sharpness. They may also include linear perspective, aerial perspective, texture gradient, familiar size, or light and shadow. Various other information in two-dimensional visual representations can also serve as a pictorial depth cue.

When a viewer is provided with information in a two-dimensional visual representation from which three-dimensional spatial relationships can be inferred, the viewer can perceive depth in the image. When the viewer is provided with an image that contains a higher level of such information, the depth that the viewer perceives in the image can be enhanced. In other words, providing the viewer with an image that contains a higher level of pictorial depth cues can enhance the depth the viewer perceives in the image.

For example, a higher level of pictorial depth cue may be implemented by a higher level of blur, a higher level of contrast, or a higher level of sharpness. For Gaussian blur, a higher level of blur may be implemented by increasing the standard deviation. Alternatively, a higher level of blur may be implemented by increasing the blur metric. The blur metric is a level of blur calculated by detecting an edge in a depth map and measuring the width between the start and the end of the edge. A higher level of contrast may be implemented by increasing the contrast ratio. A higher level of sharpness may be implemented by increasing the sharpness degree. The sharpness degree may be expressed by Equation 1.

In Equation 1, SD denotes the sharpness degree and P(x,y) denotes the pixel value at (x,y).
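As an illustration of the statement that a higher level of Gaussian blur corresponds to a larger standard deviation, the following is a minimal NumPy sketch. It is not taken from the patent: the helper names `gaussian_kernel_1d`, `gaussian_blur`, and `edge_steepness`, the 3-sigma kernel radius, and the edge padding are all assumptions made for this example, and `edge_steepness` is only a crude stand-in, not the blur metric or the sharpness degree of Equation 1.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel; the 3-sigma radius is an assumption."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2D grayscale image (edge padding assumed)."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def edge_steepness(image: np.ndarray) -> float:
    """Hypothetical helper: largest horizontal step between neighboring pixels."""
    return float(np.abs(np.diff(image, axis=1)).max())


# A vertical step edge: a larger standard deviation spreads the edge more.
step = np.zeros((32, 32))
step[:, 16:] = 1.0
mild = gaussian_blur(step, sigma=1.1)
strong = gaussian_blur(step, sigma=3.0)
print(edge_steepness(mild), edge_steepness(strong))
```

In this sketch the edge in `strong` is spread over more pixels than the edge in `mild`, which is the sense in which a larger standard deviation gives a higher level of blur.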

The upper images 111 and 112 of FIG. 1 include contrast as a pictorial depth cue. The right image 112 includes a higher level of the pictorial depth cue than the left image 111; that is, the right image 112 has higher contrast than the left image 111. Because the right image 112 includes a higher level of the pictorial depth cue, the viewer can perceive depth more clearly in the right image 112 than in the left image 111.

The lower images 121 and 122 of FIG. 1 include blur as a pictorial depth cue. The right image 122 includes a higher level of the pictorial depth cue than the left image 121; that is, the background of the right image 122 is blurrier than that of the left image 121. Because the right image 122 includes a higher level of the pictorial depth cue, the viewer can perceive depth more clearly in the right image 122 than in the left image 121.

FIG. 2 is a diagram illustrating an electronic device 200 for generating an image with enhanced depth perceived by a viewer, according to an embodiment.

The electronic device 200 according to an embodiment includes at least one processor 210 and at least one memory 220.

The electronic device 200 may be configured to generate an image with enhanced depth perceived by a viewer. The memory 220 may be configured to store one or more instructions for generating the image with enhanced depth perceived by the viewer, and the processor 210 may be configured to generate that image by executing the one or more instructions.

In an embodiment, the processor 210 may be configured to receive an input image, receive a control parameter for adjusting the level of at least one pictorial depth cue included in an output image, and, using a deep neural network, generate from the input image an output image including the at least one pictorial depth cue at the level adjusted by the control parameter.

In an embodiment, the processor 210 may be configured to receive an input image and, using a deep neural network, generate from the input image an output image including a pictorial depth cue.

Because the processor 210 is configured to adjust the level of the pictorial depth cue included in the output image using the deep neural network, it need not use a depth map to generate the output image. This prevents incorrect depth information from being added to the output image because of errors that arise when a depth map is processed. In addition, because the level of the pictorial depth cue included in the output image can be adjusted by the control parameter, the viewer can be provided with an output image having a desired level of the pictorial depth cue.

The input image or the control parameter may be loaded from the memory 220 or input by a user. The electronic device 200 may further include a user interface (not shown) for receiving the input image or the control parameter from the user.

Hereinafter, embodiments of a method for generating an image with enhanced depth perceived by a viewer are described with reference to the electronic device 200, the processor 210, and the memory 220 shown in FIG. 2.

FIG. 3 is a diagram illustrating a deep neural network 300 according to an embodiment.

Using the deep neural network 300, the processor 210 may generate, from an input image A, an output image A' including at least one pictorial depth cue at a level adjusted by a control parameter σ. For example, the processor 210 may generate an output image A' in which the level of blur is adjusted. As another example, the processor 210 may generate an output image A' in which the level of sharpness, the level of contrast, and the level of blur are adjusted.

The deep neural network 300 may include a first neural network 310, a second neural network 320, a third neural network 330, and at least one controllable conversion module 340.

The first neural network 310 may receive the input image A and output a feature vector for the input image A. That is, the input of the first neural network 310 may be the input image A, and its output may be the feature vector for the input image A.

The first neural network 310 may be a convolutional neural network. The first neural network 310 may include convolutional layers, pooling layers, and fully connected layers.

The first neural network 310 may be based on existing convolutional neural network models. For example, the first neural network 310 may be a convolutional neural network based on VGG (Visual Geometry Group), GoogLeNet, PReLU-Net (Parametric ReLU-Net), or ResNet (Residual Neural Network).

The second neural network 320 may receive the input image A and output a feature map by reducing the spatial resolution of the input image A. That is, the input of the second neural network 320 may be the input image A, and its output may be a feature map for the input image A.

The second neural network 320 may receive the feature vector output from the first neural network 310. The feature map of at least one of the hidden layers of the second neural network 320 may be combined with the feature vector received from the first neural network 310.

The controllable conversion module 340 may receive a feature map of a hidden layer of the first neural network 310 and the control parameter σ. The controllable conversion module 340 may convert the feature map received from the first neural network 310 based on the control parameter σ and output the result. That is, the input of the controllable conversion module 340 may be the feature map of the hidden layer of the first neural network 310, and its output may be the converted feature map of that hidden layer.

The third neural network 330 may receive the feature map output from the second neural network 320 and output the output image A' by increasing the spatial resolution of that feature map. That is, the input of the third neural network 330 may be the output of the second neural network 320, and its output may be the output image A'.

The third neural network 330 may receive, from the controllable conversion module 340, the converted feature map of the hidden layer of the first neural network 310. The feature map of at least one of the hidden layers of the third neural network 330 may be combined with the received feature map.
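The data flow among the first neural network 310, the second neural network 320, the controllable conversion module 340, and the third neural network 330 can be sketched as a toy NumPy pipeline. This is only a shape-level illustration under stated assumptions: there are no learned weights, and the average pooling, nearest-neighbor upsampling, addition used as the "combining" operation, and multiplication by σ used as the "conversion" are all placeholders chosen for this example, since the passage above does not specify them. All function names are hypothetical.

```python
import numpy as np


def avg_pool2(x: np.ndarray) -> np.ndarray:
    """Halve spatial resolution of a (C, H, W) map; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2(x: np.ndarray) -> np.ndarray:
    """Double spatial resolution of a (C, H, W) map by pixel repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def first_network(image: np.ndarray):
    """Stand-in for network 310: returns a hidden feature map and a feature vector."""
    hidden_map = avg_pool2(image)                 # (C, H/2, W/2)
    feature_vector = hidden_map.mean(axis=(1, 2))  # (C,)
    return hidden_map, feature_vector


def second_network(image: np.ndarray, feature_vector: np.ndarray) -> np.ndarray:
    """Stand-in for network 320: reduces resolution, combining a hidden map
    with the feature vector (addition is an assumed combining operation)."""
    hidden = avg_pool2(image) + feature_vector[:, None, None]
    return avg_pool2(hidden)                      # (C, H/4, W/4)


def controllable_conversion(hidden_map: np.ndarray, sigma: float) -> np.ndarray:
    """Stand-in for module 340: converts the hidden map based on sigma
    (plain scaling is an assumption, not the module's actual operation)."""
    return sigma * hidden_map


def third_network(encoded: np.ndarray, converted_map: np.ndarray) -> np.ndarray:
    """Stand-in for network 330: increases resolution, combining a hidden map
    with the converted map from module 340."""
    hidden = upsample2(encoded) + converted_map   # (C, H/2, W/2)
    return upsample2(hidden)                      # (C, H, W)


def generate(image: np.ndarray, sigma: float) -> np.ndarray:
    """Input image A and control parameter sigma in, output image A' out."""
    hidden_map, feature_vector = first_network(image)
    encoded = second_network(image, feature_vector)
    converted = controllable_conversion(hidden_map, sigma)
    return third_network(encoded, converted)


rng = np.random.default_rng(0)
image_a = rng.random((3, 8, 8))  # H and W divisible by 4 in this toy setup
output_a = generate(image_a, sigma=1.5)
print(output_a.shape)
```

The point of the sketch is the wiring: the control parameter enters only through the conversion of the first network's hidden feature map, and that converted map is combined with a hidden map on the upsampling path.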

도 4는 일 실시 예에 따른 제1 뉴럴 네트워크(410)의 학습을 설명하기 위한 도면이다.FIG. 4 is a diagram for explaining learning of the first neural network 410 according to an embodiment.

제1 뉴럴 네트워크(410)의 학습에는 비지도 학습(unsupervised learning)이 이용될 수 있다. 제1 뉴럴 네트워크(410)는 동일한(또는 상대적으로 비슷한) 레벨의 화상 깊이 단서를 포함하는 학습 이미지들과 다른(또는 상대적으로 비유사한) 레벨의 화상 깊이 단서를 포함하는 학습 이미지들을 분리하도록 학습될 수 있다.Unsupervised learning may be used to learn the first neural network 410. The first neural network 410 will be trained to separate training images containing image depth cues of the same (or relatively similar) level from training images containing image depth cues of a different (or relatively dissimilar) level. You can.

일 실시 예에서 제1 뉴럴 네트워크(410)의 학습에는 제1 학습 이미지(T1), 제2 학습 이미지(T2), 및 제3 학습 이미지(T3)가 이용될 수 있다.In one embodiment, the first learning image T1, the second learning image T2, and the third learning image T3 may be used to train the first neural network 410.

제2 학습 이미지(T2)는 제1 학습 이미지(T1)와 동일한 이미지이고, 제3 학습 이미지(T3)는 제1 학습 이미지(T1)에 화상 깊이 단서가 부가된 이미지일 수 있다. 이 경우, 제1 뉴럴 네트워크(410)는 제1 학습 이미지(T1)에 대한 제1 특징 벡터(FV1)와 제2 학습 이미지(T2)에 대한 제2 특징 벡터(FV2)의 차이의 크기가 제1 학습 이미지(T1)에 대한 제1 특징 벡터(FV1)와 제3 학습 이미지(T3)에 대한 제3 특징 벡터(FV3)의 차이의 크기보다 작도록 학습될 수 있다.The second learning image T2 may be the same image as the first learning image T1, and the third learning image T3 may be an image to which an image depth clue is added to the first learning image T1. In this case, the first neural network 410 determines the size of the difference between the first feature vector (FV1) for the first training image (T1) and the second feature vector (FV2) for the second training image (T2). 1 It can be learned to be smaller than the size of the difference between the first feature vector (FV1) for the learning image (T1) and the third feature vector (FV3) for the third learning image (T3).

Alternatively, the second training image (T2) may be an image whose pictorial depth cue level differs relatively little from that of the first training image (T1), and the third training image (T3) may be an image whose pictorial depth cue level differs relatively greatly from that of the first training image (T1). In this case, the first neural network 410 may be trained so that the magnitude of the difference between the first feature vector (FV1) and the second feature vector (FV2) is smaller than the magnitude of the difference between the first feature vector (FV1) and the third feature vector (FV3). The L2 norm may be used to calculate the magnitudes of the vectors, but the disclosure is not limited thereto.

For example, when the second training image (T2) is identical to the first training image (T1) and the third training image (T3) is the first training image (T1) with its background blurred, the first neural network 410 may be trained so that the magnitude of the difference between the first feature vector (FV1) and the second feature vector (FV2) is smaller than the magnitude of the difference between the first feature vector (FV1) and the third feature vector (FV3).

For example, when the Gaussian blur of the first training image (T1) has a standard deviation of 1.1, the second training image (T2) is the first training image (T1) with the standard deviation increased to 1.8, and the third training image (T3) is the first training image (T1) with the standard deviation increased to 3, the first neural network 410 may be trained so that the magnitude of the difference between the first feature vector (FV1) and the second feature vector (FV2) is smaller than the magnitude of the difference between the first feature vector (FV1) and the third feature vector (FV3).
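The ordering constraint on the feature vectors described above can be sketched as a triplet-style objective. This is only an illustrative sketch: the hinge form, the `margin` value, and the helper names `l2_distance`, `triplet_hinge_loss`, and `order_by_level` are assumptions and do not appear in the description, which only requires the first distance to be smaller than the second.

```python
import math

def l2_distance(u, v):
    """L2 norm of the difference between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_hinge_loss(fv1, fv2, fv3, margin=1.0):
    """Zero once ||fv1 - fv2|| is smaller than ||fv1 - fv3|| by at least `margin`.

    The hinge form and `margin` are assumptions made for this sketch.
    """
    return max(0.0, l2_distance(fv1, fv2) - l2_distance(fv1, fv3) + margin)

def order_by_level(anchor_level, level_a, level_b):
    """Return (closer, farther) cue levels relative to the anchor level."""
    if abs(level_a - anchor_level) <= abs(level_b - anchor_level):
        return level_a, level_b
    return level_b, level_a

# Blur example from the text: T1 has standard deviation 1.1,
# and the two other images have 1.8 and 3.
t2_level, t3_level = order_by_level(1.1, 3.0, 1.8)

# Toy feature vectors: FV2 is close to FV1, FV3 is far from FV1.
fv1, fv2, fv3 = [0.0, 0.0], [0.3, 0.4], [3.0, 4.0]
loss_ok = triplet_hinge_loss(fv1, fv2, fv3)   # ordering satisfied
loss_bad = triplet_hinge_loss(fv1, fv3, fv2)  # ordering violated
```

With the toy vectors, the loss vanishes when the ordering holds and is positive when the roles of the second and third vectors are swapped, which is the behavior the training is meant to encourage.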

The first neural network 410 trained according to an embodiment may infer a first feature vector, a second feature vector, and a third feature vector for a first input image, a second input image, and a third input image, respectively. When the second input image is identical to the first input image and the third input image is the first input image with a pictorial depth cue added, the magnitude of the difference between the inferred first and second feature vectors is smaller than the magnitude of the difference between the inferred first and third feature vectors. Likewise, when the second input image is an image whose pictorial depth cue level differs relatively little from that of the first input image and the third input image is an image whose pictorial depth cue level differs relatively greatly from that of the first input image, the magnitude of the difference between the inferred first and second feature vectors is smaller than the magnitude of the difference between the inferred first and third feature vectors.

As the first neural network 410 is trained to separate training images containing pictorial depth cues of the same (or relatively similar) level from training images containing pictorial depth cues of different (or relatively dissimilar) levels, information about the pictorial depth cues may be projected onto the feature map of the hidden layer and the feature vector of the output layer of the first neural network 410. That is, information about the pictorial depth cues can be projected onto the feature map and the feature vector of the first neural network 410 without directly inferring the depth of the image, for example as a depth map.

FIG. 5 is a diagram illustrating training of the deep neural network 500 according to an embodiment.

The deep neural network 500 may be trained to output images having different levels of pictorial depth cues depending on a control parameter. A form of learning intermediate between supervised learning and unsupervised learning may be used to train the deep neural network 500; that is, semi-supervised learning may be used. Under semi-supervised learning, ground truth labels for some data points may be used to train the deep neural network 500. Specifically, ground truth images for some control parameter values may be used.

A ground truth image used to train the deep neural network 500 may be the same image as the training image, or an image obtained by adding a pictorial depth cue to the training image. The type, number, and level of the pictorial depth cues added to the training image are not limited. For example, the ground truth image may be the training image with blur added, or the training image with both blur and contrast added. For example, the ground truth image may be the training image whose background Gaussian blur has a standard deviation of 3, or the training image whose background Gaussian blur has a standard deviation of 1.2.

A pre-trained first neural network 510 may be used to train the deep neural network 500. In the training of the deep neural network 500, the second neural network 520, the third neural network 530, and the controllable transformation module 540 may be the training targets.

A control parameter (σ), a training image (T4), a first ground truth image (T5), and a second ground truth image (T6) may be used to train the deep neural network 500 according to an embodiment. The control parameter (σ) may be a single variable.

The first ground truth image (T5) may be identical to the training image (T4). The second ground truth image (T6) may be an image obtained by adding a pictorial depth cue to the training image (T4). For example, the first ground truth image (T5) may be the same as the training image (T4), and the second ground truth image (T6) may be the training image (T4) with its background blurred. Alternatively, the first ground truth image (T5) may be an image whose pictorial depth cue level differs relatively little from that of the training image (T4), and the second ground truth image (T6) may be an image whose pictorial depth cue level differs relatively greatly from that of the training image (T4). For example, the training image (T4), the first ground truth image (T5), and the second ground truth image (T6) may differ only in contrast ratio, with a contrast ratio of 500 for the training image (T4), 800 for the first ground truth image (T5), and 1200 for the second ground truth image (T6).

When the control parameter (σ) has the value v0, the deep neural network 500 may be trained with the first ground truth image (T5) as the label for the input training image (T4). When the control parameter (σ) has the value v1, the deep neural network 500 may be trained with the second ground truth image (T6) as the label for the input training image (T4).
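The pairing of control parameter values with ground truth images can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the helper names `build_training_samples` and `l1_loss`, the choice of an L1 reconstruction loss, the toy pixel values, and the concrete values chosen for v0 and v1 are illustrative and are not specified at this point in the description.

```python
def build_training_samples(training_image, truth_by_sigma):
    """Return (input image, control parameter, target image) triples."""
    return [(training_image, sigma, truth)
            for sigma, truth in sorted(truth_by_sigma.items())]

def l1_loss(predicted, target):
    """Mean absolute error between two flat lists of pixel values.

    The loss type is an assumption; the description does not name one.
    """
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)

# Toy 4-pixel "images": T5 equals T4, T6 stands in for T4 with a cue added.
T4 = [0.2, 0.4, 0.6, 0.8]
T5 = list(T4)
T6 = [0.1, 0.3, 0.7, 0.9]
v0, v1 = 0.0, 1.0  # illustrative values for the two control parameter settings

samples = build_training_samples(T4, {v0: T5, v1: T6})
```

Each triple tells the network which target to reproduce for a given control parameter value, so the same input image is supervised toward T5 at v0 and toward T6 at v1.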

The deep neural network 500 trained according to an embodiment may infer output images having different levels of pictorial depth cues depending on the control parameter (σ). For ease of understanding, assume that the training image (T4) is used as the test image. In this case, when a control parameter (σ) with the value v0 is input to the deep neural network 500, the first ground truth image (T5) is inferred, and when a control parameter (σ) with the value v1 is input, the second ground truth image (T6) is inferred. Furthermore, when a control parameter (σ) with a value between v0 and v1 is input to the deep neural network 500, an image may be inferred whose pictorial depth cue level lies between that of the first ground truth image (T5) and that of the second ground truth image (T6).

For example, assume that the training image (T4) used to train the deep neural network 500 has a contrast ratio of 500, the first ground truth image (T5) is identical to the training image (T4), the second ground truth image (T6) is the training image (T4) with the contrast ratio increased to 1500, v0 is 0, and v1 is 1. Also, for ease of understanding, assume that the training image (T4) is used as the test image. In this case, when the control parameter (σ) is 0, the deep neural network 500 infers the first ground truth image (T5), which has a contrast ratio of 500. When the control parameter (σ) is 1, the deep neural network 500 infers the second ground truth image (T6), which has a contrast ratio of 1500. Furthermore, when the control parameter (σ) is between 0 and 1, the deep neural network 500 infers an image with a contrast ratio between 500 and 1500.

In the deep neural network 500 trained according to an embodiment, the levels of a plurality of pictorial depth cues included in the output image can be adjusted at once by adjusting the control parameter (σ), which is a single variable.

For example, assume that the training image (T4) used to train the deep neural network 500 has a contrast ratio of 500 and a Gaussian blur standard deviation of 0.5, the first ground truth image (T5) is identical to the training image (T4), the second ground truth image (T6) is the training image (T4) with the contrast ratio increased to 1500 and the Gaussian blur standard deviation increased to 1.5, v0 is 0, and v1 is 1. Also, for ease of understanding, assume that the training image (T4) is used as the test image. In this case, when the control parameter (σ) is 0, the deep neural network 500 infers the first ground truth image (T5), which has a contrast ratio of 500 and a Gaussian blur standard deviation of 0.5. When the control parameter (σ) is 1, the deep neural network 500 infers the second ground truth image (T6), which has a contrast ratio of 1500 and a Gaussian blur standard deviation of 1.5. Furthermore, when the control parameter (σ) is between 0 and 1, the deep neural network 500 infers an image with a contrast ratio between 500 and 1500 and a Gaussian blur standard deviation between 0.5 and 1.5.

In the embodiment described with reference to FIG. 5, the deep neural network 500 is trained for two control parameter (σ) values, v0 and v1, but the number of control parameter (σ) values used for training is not limited thereto. Likewise, although the deep neural network 500 is trained for control parameter (σ) values of 0 and 1, the values of the control parameter (σ) used for training are not limited thereto.

FIG. 6 is a diagram illustrating a first neural network 610, a second neural network 620, a third neural network 630, and a controllable transformation module 640 according to an embodiment.

The first neural network 610 may receive an input image (A) and output a feature vector (FV) through its layers. The feature vector (FV) of the first neural network 610 and the feature map (FM) of its hidden layer may include information related to pictorial depth cues.

The second neural network 620 may receive the input image (A) and output a feature map by reducing the spatial resolution of the input image (A). The second neural network 620 may include a sequence of encoder modules for reducing the spatial resolution of the input image (A).

The encoder module 622 may be configured to generate an output feature map from an input feature map based on downsampling. The encoder module 622 may also be configured to generate the output feature map by combining the input feature map with the feature vector (FV) of the first neural network 610.

The encoder module 622 may receive, as its input feature map, the output feature map of the preceding encoder module 621. The encoder module 622 may generate an intermediate feature map from the input feature map through at least one layer. The encoder module 622 may combine the intermediate feature map with the feature vector (FV) to generate a combined feature map, and may downsample the combined feature map to generate the output feature map.

As the intermediate feature map of the encoder module 622 is combined with the feature vector (FV) of the first neural network 610, the information about pictorial depth cues contained in the feature vector (FV) may be incorporated into the output feature map of the encoder module 622. In addition, as the combined feature map of the encoder module 622 is downsampled to generate the output feature map, an output feature map with a lower spatial resolution than the input feature map may be obtained.

The controllable transformation module 640 may receive the feature map (FM) of the hidden layer of the first neural network 610. The controllable transformation module 640 may transform the feature map (FM) based on the control parameter (σ) and pass the transformed feature map to the third neural network 630.

The third neural network 630 may output an output image (A') by increasing the spatial resolution of an input feature map. Here, the input feature map may be the output of the second neural network 620. The third neural network 630 may include a sequence of decoder modules for increasing the spatial resolution of the input feature map.

The decoder module 632 may be configured to generate an output feature map from an input feature map based on upsampling. The decoder module 632 may also be configured to generate the output feature map by combining in the transformed feature map of the first neural network 610 received from the controllable transformation module 640.

The decoder module 632 may receive, as its input feature map, the output feature map of the preceding decoder module 631. The decoder module 632 may upsample the input feature map. Transposed convolution may be used for the upsampling, but the disclosure is not limited thereto. The decoder module 632 may generate an intermediate feature map from the upsampled input feature map through at least one layer. A feature map of the encoder module 622 may be used to generate the intermediate feature map of the decoder module 632: the intermediate feature map may be generated by concatenating the upsampled input feature map of the decoder module 632 with the feature map of the encoder module 622. The decoder module 632 may combine the intermediate feature map with the transformed feature map of the first neural network 610 received from the controllable transformation module 640 to generate a combined feature map. The decoder module 632 may generate the output feature map from the combined feature map through at least one layer.

As the intermediate feature map of the decoder module 632 is combined with the feature map (FM) of the first neural network 610, the information about pictorial depth cues contained in the feature map (FM) may be incorporated into the output feature map of the decoder module 632. In addition, as the input feature map of the decoder module 632 is upsampled to generate the output feature map, an output feature map with a higher spatial resolution than the input feature map may be obtained.

As the feature vector (FV) and the feature map (FM) of the first neural network 610 are combined with feature maps of the second neural network 620 and the third neural network 630 whose spatial resolution is lower than that of the input image (A), the information about pictorial depth cues contained in the feature vector (FV) and the feature map (FM) can be effectively reflected in the output image (A').

FIG. 7 is a diagram illustrating an encoder module 700 according to an embodiment.

The encoder module 700 according to an embodiment generates an output feature map 714 from an input feature map 711 based on downsampling. The encoder module 700 may receive the output feature map of the preceding encoder module. The output feature map of the preceding encoder module may be used as the input feature map 711 of the encoder module 700.

The encoder module 700 may obtain an intermediate feature map 712 from the input feature map 711 through at least one layer. For example, the encoder module 700 may obtain the intermediate feature map 712 from the input feature map 711 through a convolutional layer.

The encoder module 700 may receive the feature vector (FV) output from the first neural network. The encoder module 700 may resize the feature vector (FV). Specifically, the encoder module 700 may resize the feature vector (FV) so that its length equals the number of channels of the intermediate feature map 712.

The encoder module 700 may generate a combined feature map 713 by combining the intermediate feature map 712 with the resized feature vector (FV'). The intermediate feature map 712 and the resized feature vector (FV') may be combined by element-wise multiplication. The encoder module 700 may downsample the combined feature map 713 to generate the output feature map 714. For example, convolution or pooling may be used for the downsampling, but the disclosure is not limited thereto.
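The combination and downsampling steps of the encoder module can be sketched on nested lists. This sketch makes several assumptions: the feature vector is taken as already resized to one value per channel (how the resizing is done is not stated), the element-wise multiplication is interpreted as broadcasting each vector entry over its channel, and 2x2 average pooling is picked as one of the permitted downsampling options. The function names are hypothetical.

```python
def scale_channels(feature_map, vector):
    """Multiply every element of channel c by vector[c] (broadcast over H x W)."""
    if len(feature_map) != len(vector):
        raise ValueError("vector length must equal the number of channels")
    return [[[x * s for x in row] for row in channel]
            for channel, s in zip(feature_map, vector)]

def avg_pool_2x2(feature_map):
    """Halve height and width by averaging non-overlapping 2x2 windows."""
    pooled = []
    for channel in feature_map:
        rows = []
        for i in range(0, len(channel) - 1, 2):
            row = []
            for j in range(0, len(channel[i]) - 1, 2):
                row.append((channel[i][j] + channel[i][j + 1]
                            + channel[i + 1][j] + channel[i + 1][j + 1]) / 4.0)
            rows.append(row)
        pooled.append(rows)
    return pooled

def encoder_step(intermediate_map, resized_fv):
    """Combine the intermediate map with the resized vector, then downsample."""
    combined = scale_channels(intermediate_map, resized_fv)
    return avg_pool_2x2(combined)

# Two channels of size 2 x 2 and a resized feature vector of length 2.
intermediate = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 4.0], [6.0, 8.0]],
]
fv_resized = [0.5, 2.0]
output = encoder_step(intermediate, fv_resized)  # 2 channels of size 1 x 1
```

Because the vector length matches the channel count, each entry acts as a per-channel gain, which is one way the depth-cue information in the feature vector could modulate the encoder features before the resolution is reduced.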

FIG. 8 is a diagram illustrating a controllable transformation module 800 according to an embodiment.

The controllable transformation module 800 may receive a feature map 811 of the hidden layer of the first neural network and the control parameter (σ).

The controllable transformation module 800 may obtain a first feature map 812 from the received feature map 811. For example, the received feature map 811 may be used as the first feature map 812 as it is. Alternatively, the controllable transformation module 800 may generate the first feature map 812 from the received feature map 811 through at least one layer, for example a convolutional layer.

The controllable transformation module 800 may generate a first weight vector 813 and a second weight vector 814 from the control parameter (σ). The first weight vector 813 may be expressed as in Equation 2, and the second weight vector 814 may be expressed as in Equation 3.

In Equations 2 and 3, σ denotes the control parameter, and the other two symbols denote a weight in vector form and a bias in vector form, respectively.

The sum of the squared norms of the first weight vector 813 and the second weight vector 814 may be constant. When the first weight vector 813 and the second weight vector 814 are implemented as in Equations 2 and 3, the sum of their squared norms is 1.

The controllable transformation module 800 may generate a second feature map 815 from the element-wise product of the first feature map 812 and the first weight vector 813, and a third feature map 816 from the element-wise product of the first feature map 812 and the second weight vector 814. As the first weight vector 813 and the second weight vector 814 are implemented as in Equations 2 and 3, the second feature map 815 may be generated by rotationally transforming the first feature map 812 with the first weight vector 813. Likewise, the third feature map 816 may be generated by rotationally transforming the first feature map 812 with the second weight vector 814. The degree to which the first feature map 812 is rotationally transformed can be adjusted by the control parameter (σ).

The controllable transformation module 800 may concatenate the second feature map 815 and the third feature map 816. The concatenated feature map 817 may be provided to the third neural network as the transformed feature map of the hidden layer of the first neural network.

As the first weight vector 813 and the second weight vector 814 are implemented as in Equations 2 and 3, the sum of their squared norms may remain constant at 1 even when the control parameter (σ) changes. Accordingly, regardless of the value of the control parameter (σ), the norm of the first feature map 812 and the norm of the concatenated feature map 817 may be equal. As a result, the stability of training and inference of the deep neural network with respect to a varying control parameter (σ) can be ensured.
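The norm-preserving behavior described above can be checked numerically. Because Equations 2 and 3 are not reproduced in this text, the sketch below assumes one form that is consistent with the stated properties (unit sum of squares, rotation, periodicity in σ): per-channel cosine and sine of an affine function of σ. This cosine and sine form, the parameter values `w` and `b`, and all function names are assumptions, not the equations of the disclosure.

```python
import math

def weight_vectors(sigma, w, b):
    """Assumed form: cos and sin of (w[c] * sigma + b[c]) for each channel c."""
    angles = [wc * sigma + bc for wc, bc in zip(w, b)]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]

def controllable_transform(first_map, sigma, w, b):
    """first_map is a list of channels, each a flat list of values.

    Returns the channel-wise concatenation of the two scaled copies.
    """
    g1, g2 = weight_vectors(sigma, w, b)
    second = [[x * g for x in ch] for ch, g in zip(first_map, g1)]
    third = [[x * g for x in ch] for ch, g in zip(first_map, g2)]
    return second + third

def frobenius_norm(feature_map):
    """Square root of the sum of squares of all elements."""
    return math.sqrt(sum(x * x for ch in feature_map for x in ch))

first_map = [[1.0, -2.0], [0.5, 3.0]]
w, b = [1.0, 2.0], [0.0, 0.3]  # hypothetical learned weight and bias vectors

norm_in = frobenius_norm(first_map)
norms_out = [frobenius_norm(controllable_transform(first_map, s, w, b))
             for s in (0.0, 0.4, 1.0)]
```

Under this assumed form, every entry of `norms_out` matches `norm_in` up to floating-point error, because the squared cosine and squared sine applied to each channel sum to 1 for any σ.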

As the control parameter (σ) is used as in Equations 2 and 3, the degree to which the first feature map 812 is rotationally transformed may vary periodically as the control parameter (σ) increases. That is, the concatenated feature map 817 may change periodically as the control parameter (σ) increases. In other words, the feature map of the hidden layer of the first neural network may be transformed periodically as the control parameter (σ) increases.

FIG. 9 is a diagram illustrating an element-wise operation module 900 according to an embodiment.

The element-wise operation module 900 according to an embodiment is configured to generate an output feature map 915 by transforming an input feature map 911 element by element based on a condition feature map 912. For example, the element-wise operation module 900 may be configured as a spatial feature transform (SFT) layer.

The element-wise operation module 900 may generate a scale feature map 913 from the condition feature map 912 through at least one layer. The element-wise operation module 900 may also generate a shift feature map 914 from the condition feature map 912 through at least one layer. Here, the at least one layer may include a convolutional layer or an activation layer.

The element-wise operation module 900 may generate the output feature map 915 by multiplying the input feature map 911 element-wise by the scale feature map 913 and then adding the shift feature map 914 element-wise.
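The scale-then-shift operation can be sketched on single-channel 2-D lists. In this sketch, `pointwise_layer` is a hypothetical stand-in for the "at least one layer" that derives the scale and shift maps from the condition feature map; its affine form and the weight and bias values are assumptions chosen only to produce concrete numbers.

```python
def elementwise_affine(input_map, scale_map, shift_map):
    """Return input * scale + shift, element by element (2-D lists)."""
    return [[x * s + t for x, s, t in zip(in_row, sc_row, sh_row)]
            for in_row, sc_row, sh_row in zip(input_map, scale_map, shift_map)]

def pointwise_layer(condition_map, weight, bias):
    """Hypothetical stand-in for the layers that derive scale or shift maps."""
    return [[weight * c + bias for c in row] for row in condition_map]

condition = [[0.0, 1.0], [2.0, 3.0]]   # condition feature map
inputs = [[1.0, 2.0], [3.0, 4.0]]      # input feature map

scale = pointwise_layer(condition, weight=0.5, bias=1.0)
shift = pointwise_layer(condition, weight=-1.0, bias=0.0)
output = elementwise_affine(inputs, scale, shift)
```

Since both maps are derived from the condition feature map, each spatial position of the input receives its own gain and offset, which is the defining property of a spatial feature transform style layer.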

The element-wise operation module 900 may be configured differently from what is shown in FIG. 9. For example, the element-wise operation module 900 may be configured so that the scale feature map 913 and the element-wise product are omitted. As another example, it may be configured so that the shift feature map 914 and the element-wise sum are omitted. As yet another example, it may be configured to include an element-wise product with an additional scale feature map or an element-wise sum with an additional shift feature map.

FIG. 10 is a diagram illustrating a decoder module 1000 according to an embodiment.

The decoder module 1000 according to an embodiment may generate an output feature map 1016 from an input feature map 1011 based on upsampling. The decoder module 1000 may receive the output feature map of the preceding decoder module. The output feature map of the preceding decoder module may be used by the decoder module 1000 as the input feature map 1011.

The decoder module 1000 may upsample the input feature map 1011. A transposed convolution may be used for the upsampling, but the upsampling is not limited thereto.

The decoder module 1000 may receive a feature map 1012 of an encoder module. The decoder module 1000 may generate an intermediate feature map 1014 by concatenating the upsampled input feature map 1013 and the feature map 1012 of the encoder module. Here, the feature map 1012 of the encoder module may be an intermediate feature map of the encoder module, but is not limited thereto.

The decoder module 1000 may receive, from a controllable transformation module, the transformed feature map FM' of the hidden layer of the first neural network. The intermediate feature map 1014 and the feature map FM' may be combined by element-wise operations. The decoder module 1000 may generate a scale feature map FM'' and a shift feature map FM''' by converting the size of the feature map FM' through at least one layer. The size of the scale feature map FM'' and of the shift feature map FM''' may be the same as the size of the intermediate feature map 1014. The decoder module 1000 may generate a combined feature map 1015 by performing an element-wise product of the intermediate feature map 1014 and the scale feature map FM'', and then performing an element-wise sum of the result and the shift feature map FM'''.

The decoder module 1000 may generate the output feature map 1016 from the combined feature map 1015 through at least one layer. For example, the decoder module 1000 may generate the output feature map 1016 from the combined feature map 1015 through a convolutional layer.
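The sequence of operations in the decoder module 1000 can be followed by tracking tensor shapes alone. The sketch below assumes, for illustration only, that upsampling doubles the height and width while keeping the channel count, and that the final convolution preserves the spatial size; the source does not fix these values, and the function name `decoder_shapes` is hypothetical.

```python
def decoder_shapes(input_shape, skip_shape, out_channels, up_factor=2):
    """Trace (channels, height, width) through one decoder module.

    Assumptions for illustration: upsampling multiplies height and width by
    `up_factor` and keeps the channel count, and the final convolution
    preserves the spatial size.
    """
    c, h, w = input_shape
    upsampled = (c, h * up_factor, w * up_factor)
    skip_c, skip_h, skip_w = skip_shape
    if (skip_h, skip_w) != upsampled[1:]:
        raise ValueError("encoder feature map must match the upsampled size")
    intermediate = (c + skip_c, skip_h, skip_w)  # concatenation along channels
    combined = intermediate                      # element-wise scale and shift keep the shape
    output = (out_channels, skip_h, skip_w)      # final convolution
    return {"upsampled": upsampled, "intermediate": intermediate,
            "combined": combined, "output": output}

for name, shape in decoder_shapes((64, 8, 8), (32, 16, 16), out_channels=32).items():
    print(f"{name:>12}: {shape}")
```

Because the scale and shift are applied element by element, FM'' and FM''' must have exactly the shape reported for `intermediate`, which is why the text requires their size to match that of the intermediate feature map 1014.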

FIG. 11 is a diagram illustrating a controllable transformation module 1110, an encoder module 1120, and a decoder module 1130 according to an embodiment.

The controllable transformation module 1110 may receive a feature map FM of a hidden layer of the first neural network. A first feature map 1112 may be generated from the feature map FM through convolutional layers and activation layers.

The controllable transformation module 1110 may receive the control parameter σ and generate a first weight vector 1113 and a second weight vector 1114. The controllable transformation module 1110 may generate a second feature map 1115 from the element-wise product of the first feature map 1112 and the first weight vector 1113, and may generate a third feature map 1116 from the element-wise product of the first feature map 1112 and the second weight vector 1114. The first feature map 1112 may thus be transformed by the first and second weight vectors 1113 and 1114, and the degree of transformation may be adjusted by the control parameter σ.

The controllable transformation module 1110 may generate an output feature map FM' by concatenating the second feature map 1115 and the third feature map 1116. The output feature map FM' may be the feature map of the hidden layer of the first neural network as transformed by the controllable transformation module 1110.
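The weighting and concatenation steps of the controllable transformation module 1110 can be sketched as follows. The sketch assumes that each weight vector holds one weight per channel, which the source does not state explicitly, and it takes the weight vectors as inputs rather than deriving them from σ. The function name `controllable_transform` is hypothetical.

```python
def controllable_transform(first_map, w1, w2):
    """Weight a feature map twice and concatenate the results.

    first_map: list of channels, each a flat list of values (feature map 1112).
    w1, w2:    one weight per channel (assumed layout of vectors 1113 and 1114).
    Returns a feature map with twice as many channels (FM').
    """
    if not (len(first_map) == len(w1) == len(w2)):
        raise ValueError("need one weight per channel")
    second = [[w * v for v in ch] for w, ch in zip(w1, first_map)]  # feature map 1115
    third = [[w * v for v in ch] for w, ch in zip(w2, first_map)]   # feature map 1116
    return second + third  # concatenation along the channel axis

fm = [[1.0, 2.0], [3.0, 4.0]]  # two channels, two values each
print(controllable_transform(fm, w1=[1.0, 0.5], w2=[0.0, 2.0]))
# [[1.0, 2.0], [1.5, 2.0], [0.0, 0.0], [6.0, 8.0]]
```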

The encoder module 1120 may receive, as an input feature map 1121, the output feature map of the preceding encoder module. The encoder module 1120 may generate an intermediate feature map 1122 from the input feature map 1121 through a convolutional layer, an activation layer, and batch normalization.

The encoder module 1120 may receive a feature vector FV output from the first neural network. The encoder module 1120 may convert the size of the feature vector FV, through convolutional layers and activation layers, to match the number of channels of the intermediate feature map 1122.

The encoder module 1120 may generate a combined feature map 1123 from the element-wise product of the intermediate feature map 1122 and the size-converted feature vector FV'. The combined feature map 1123 combines the intermediate feature map 1122, which is generated from the input feature map 1121, with the feature vector FV', which is generated from the feature vector FV. The combined feature map 1123 is therefore a combination of the input feature map 1121 and the feature vector FV.

The encoder module 1120 may generate an output feature map 1124 by downsampling the combined feature map 1123. A convolutional layer and an activation layer may be used for the downsampling.
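The combination and downsampling steps of the encoder module 1120 can be sketched as follows. Two simplifications are assumed: the size-converted feature vector FV' is supplied directly with one entry per channel, and plain stride-2 subsampling stands in for the learned convolutional downsampling that the text describes. The function names `modulate` and `subsample` are hypothetical.

```python
def modulate(feature_map, fv):
    """Multiply every value in channel c by fv[c].

    feature_map: list of channels, each a list of rows (intermediate map 1122).
    fv:          one value per channel (size-converted feature vector FV').
    """
    if len(feature_map) != len(fv):
        raise ValueError("feature vector must have one entry per channel")
    return [[[s * v for v in row] for row in ch] for s, ch in zip(fv, feature_map)]

def subsample(feature_map, stride=2):
    """Keep every `stride`-th row and column.

    This is a stand-in for the convolutional downsampling in the text,
    not the method the text describes.
    """
    return [[row[::stride] for row in ch[::stride]] for ch in feature_map]

intermediate = [[[1.0, 2.0], [3.0, 4.0]],
                [[5.0, 6.0], [7.0, 8.0]]]          # 2 channels, 2x2 each
combined = modulate(intermediate, fv=[2.0, 0.5])   # stands in for feature map 1123
print(subsample(combined))                         # [[[2.0]], [[2.5]]]
```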

The decoder module 1130 may receive, as an input feature map 1131, the output feature map of the preceding decoder module. The decoder module 1130 may upsample the input feature map 1131 through a transposed convolutional layer and an activation layer.

The decoder module 1130 may generate an intermediate feature map 1133 by concatenating the upsampled input feature map 1132 and a feature map of the encoder module 1120. Here, the intermediate feature map 1122 may be used as the feature map of the encoder module 1120.

The decoder module 1130 may receive, from the controllable transformation module 1110, the transformed feature map FM' of the hidden layer of the first neural network. The decoder module 1130 may generate, from the feature map FM' through convolutional layers and activation layers, feature maps FM'' and FM''' that have the same size as the intermediate feature map 1133.

The decoder module 1130 may generate a combined feature map 1134 by performing an element-wise product of the intermediate feature map 1133 and the feature map FM'', and then performing an element-wise sum of the result and the feature map FM'''. The combined feature map 1134 combines the intermediate feature map 1133, which is generated from the input feature map 1131, with the feature maps FM'' and FM''', which are generated from the feature map FM'. The combined feature map 1134 is therefore a combination of the input feature map 1131 and the feature map FM'.

The decoder module 1130 may generate an output feature map 1135 from the combined feature map 1134 through a convolutional layer, batch normalization, and an activation layer.

In the embodiment shown in FIG. 11, the activation function used in the activation layers may be a Gaussian Error Linear Unit (GELU) or a Rectified Linear Unit (ReLU), but is not limited thereto.

FIG. 12 is a diagram illustrating the flow of feature maps from an input image to an output image according to an embodiment.

In the first neural network 1210, a first feature map 1211, a second feature map 1212, a third feature map 1213, a fourth feature map 1214, and a fifth feature map 1215 may be generated in sequence from the input image A. Each of the first to fifth feature maps 1211 to 1215 may be the feature map of the corresponding hidden layer before pooling is performed. Finally, the first neural network 1210 may output a feature vector 1216 through a fully connected layer.

The first neural network 1210 may be trained to classify feature vectors according to image depth cues. Accordingly, the first feature map 1211, the second feature map 1212, the third feature map 1213, the fourth feature map 1214, the fifth feature map 1215, and the feature vector 1216 of the first neural network 1210 may contain information associated with image depth cues.

In the second neural network 1220, feature maps 1224, 1225, and 1226 may be generated in sequence from the input image A. The feature maps 1224, 1225, and 1226 may be downsampled by a first encoder module 1221, a second encoder module 1222, and a third encoder module 1223, respectively. Finally, the second neural network 1220 may output a feature map 1227, which is a downsampled representation of the input image A.

While performing the downsampling, the first encoder module 1221, the second encoder module 1222, and the third encoder module 1223 may combine the feature maps 1224, 1225, and 1226 with the feature vector 1216 of the first neural network 1210. Because the feature maps 1224, 1225, and 1226 are combined with the feature vector 1216, information about image depth cues may be added to the finally generated feature map 1227.

A first controllable transformation module 1241, a second controllable transformation module 1242, a third controllable transformation module 1243, and a fourth controllable transformation module 1244 may transform the fourth feature map 1214, the third feature map 1213, the second feature map 1212, and the first feature map 1211 of the first neural network 1210, respectively.

In the third neural network 1230, feature maps 1235, 1236, 1237, and 1238 may be generated in sequence from the feature map 1227 output from the second neural network 1220. The feature map 1235 may be generated by an element-wise operation module that performs an element-wise operation on the feature map 1227 output from the second neural network 1220 and the fourth feature map 1214 transformed by the first controllable transformation module 1241.

The feature maps 1235, 1236, and 1237 may be upsampled by a first decoder module 1232, a second decoder module 1233, and a third decoder module 1234, respectively. Finally, the third neural network 1230 may generate an output image A' from the upsampled feature map 1238.

The first decoder module 1232 may combine the feature map 1235 with the third feature map 1213 transformed by the second controllable transformation module 1242. The second decoder module 1233 may combine the feature map 1236 with the second feature map 1212 transformed by the third controllable transformation module 1243. The third decoder module 1234 may combine the feature map 1237 with the first feature map 1211 transformed by the fourth controllable transformation module 1244. Because the feature maps 1235, 1236, and 1237 are combined with the first feature map 1211, the second feature map 1212, and the third feature map 1213, information about image depth cues may be added to the finally generated output image A'.
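The routing described for FIG. 12 can be summarized as a small lookup table. The sketch below only restates which controllable transformation module transforms which feature map of the first neural network and where the result is consumed in the third neural network; the reference numerals come from the text, while the variable name `WIRING` is hypothetical.

```python
# (controllable transformation module, first-network feature map, consumer in the third network)
WIRING = [
    (1241, 1214, "element-wise operation module (produces feature map 1235)"),
    (1242, 1213, "first decoder module 1232"),
    (1243, 1212, "second decoder module 1233"),
    (1244, 1211, "third decoder module 1234"),
]

for ctm, fm, consumer in WIRING:
    print(f"transformation module {ctm}: feature map {fm} -> {consumer}")

# Deeper (later) feature maps of the first network are consumed first,
# so the feature-map numerals appear in descending order.
feature_maps = [fm for _, fm, _ in WIRING]
assert feature_maps == sorted(feature_maps, reverse=True)
```

The ordering pairs the coarsest stage of the third neural network with the deepest available feature map of the first neural network and the finest stage with the shallowest one.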

As described with reference to FIG. 8, the feature map of the hidden layer of the first neural network is transformed periodically as the control parameter σ increases. Likewise, the first feature map 1211, the second feature map 1212, and the third feature map 1213 may be transformed periodically as σ increases. Accordingly, the level of the image depth cue in the output image A' may change periodically as σ increases.

FIG. 12 shows the structures of the first neural network 1210, the second neural network 1220, and the third neural network 1230 in simplified form, and the structures of these networks are not limited to what is shown in FIG. 12.

FIG. 13 is a diagram illustrating output images that include an amount of image depth cue adjusted by the control parameter according to an embodiment.

Images with enhanced viewer-perceived depth were generated using a deep neural network trained in a semi-supervised manner with control parameter values σ of 0.0 and 1.0. At inference, σ was set to 0.0, 0.25, 0.5, 0.75, and 1.0, and FIG. 13 shows the output image generated in each case.

As the control parameter σ was varied, output images with different levels of image depth cues were generated. When σ was 0.0, an image identical to the input image was output. When σ was 1.0, the output image differed most from the input image in the level of the image depth cue. As σ was increased in sequence to 0.25, 0.5, 0.75, and 1.0, images with correspondingly increasing levels of the image depth cue were output.

FIG. 14 is a diagram illustrating a method for generating an image with enhanced depth perceived by a viewer according to an embodiment.

In step S1410, the processor 210 may receive an input image. The input image may be received from a user or loaded from the memory 220.

In step S1420, the processor 210 may receive a control parameter for adjusting the level of at least one image depth cue included in an output image. A viewer may select and input the control parameter in order to generate an output image with a desired level of the image depth cue. The processor 210 may receive the control parameter selected by the user.

In step S1430, the processor 210 may use a deep neural network to generate, from the input image, an output image that includes at least one image depth cue at a level adjusted by the control parameter. Output images containing different levels of the image depth cue may be generated depending on the value of the control parameter. For example, when the value of the control parameter is small, the level of the image depth cue included in the output image may be relatively low, and when the value is large, the level may be relatively high.

The output image may include different types of image depth cues, and the levels of these different types of cues may be adjusted together by adjusting the control parameter, which is a single variable. For example, the output image may include contrast and blur, and as the control parameter is changed from 0 to 1, both the level of contrast and the level of blur in the output image may increase.

FIG. 15 is a diagram illustrating a method for generating an image with enhanced depth perceived by a viewer according to an embodiment.

When the control parameter has a fixed value, the processor 210 may generate an output image that includes at least one image depth cue at a predetermined level. This corresponds to fixing the control parameter to a constant in the preceding embodiments. In this case, the processor 210 may receive an input image in step S1510 and, in step S1520, use the deep neural network to generate an output image that includes at least one image depth cue at the predetermined level.
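The relationship between the method of FIG. 14 and the method of FIG. 15 can be sketched as partial application of the control parameter. In the sketch below, `stand_in_network` is a hypothetical placeholder that merely stretches pixel values about their mean; it is not the deep neural network of the embodiments and is used only so that the example runs. The names `generate_output` and `generate_fixed` are likewise hypothetical.

```python
from functools import partial

def stand_in_network(image, sigma):
    """Hypothetical placeholder, NOT the deep neural network of the embodiments.

    Stretches pixel values about their mean by a factor (1 + sigma), so that
    sigma = 0.0 returns the input unchanged.
    """
    mean = sum(image) / len(image)
    return [p + sigma * (p - mean) for p in image]

def generate_output(network, image, sigma):
    """FIG. 14: receive an image and a control parameter, then run the network."""
    return network(image, sigma)

# FIG. 15: the control parameter is fixed to a constant ahead of time.
generate_fixed = partial(generate_output, stand_in_network, sigma=0.5)

image = [0.2, 0.4, 0.6, 0.8]
print(generate_output(stand_in_network, image, 0.0))  # same values as the input
print(generate_fixed(image))                          # spread about the mean is enlarged
```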

According to the above embodiments, the depth perceived by the viewer may be enhanced by generating an output image to which image depth cues are added. In addition, the depth perceived by the viewer may be changed by adjusting, through the control parameter, the level of the image depth cues included in the output image. Furthermore, because the deep neural network is configured to output an image containing image depth cues without directly inferring depth (for example, as a depth map), incorrect depth information caused by errors in depth map processing may be prevented from being added to the output image.

The electronic device 200 according to the above embodiments may generate an output image to which an image depth cue, whose level is adjusted by the control parameter, is added. Based on the above embodiments, the electronic device 200 may more generally be configured to generate an output image that is changed to a degree adjusted by the control parameter. If the third training image T3 of FIG. 4 is a changed version of the first training image T1, and the second ground-truth image T6 of FIG. 5 is a changed version of the training image T4, then the deep neural network of the electronic device 200, trained according to the embodiments of FIGS. 4 and 5, may generate an output image by changing the input image to a degree adjusted by the control parameter.

An electronic device 200 for generating a changed image according to an embodiment includes a memory 220 configured to store one or more instructions, and one or more processors 210. By executing the one or more instructions, the one or more processors 210 receive an input image, receive a control parameter for adjusting the degree to which an output image is changed from the input image, and use a deep neural network to generate, from the input image, an output image changed to the degree adjusted by the control parameter. The deep neural network includes a first neural network configured to receive the input image and output a feature vector for the input image, a second neural network configured to receive the input image and output a feature map by reducing the spatial resolution of the input image, a third neural network configured to receive the feature map output from the second neural network and output the output image by increasing the spatial resolution of that feature map, and at least one controllable transformation module configured to change a feature map based on the control parameter. The second neural network may be configured so that a feature map of at least one hidden layer of the second neural network is combined with the feature vector output from the first neural network. The at least one controllable transformation module may be configured to receive a feature map of a hidden layer of the first neural network and to transform the received feature map based on the control parameter. The third neural network may be configured so that a feature map of at least one hidden layer of the third neural network is combined with the transformed feature map of the hidden layer of the first neural network.

For example, the image change performed by the electronic device 200 may be a conversion from a black-and-white image to a color image, a change in the thickness of the outline of an object in the image, a color change, image filtering, or the like. However, the change is not limited to these examples, and various other image changes may be performed by the electronic device 200.

The method for generating an image with enhanced depth perceived by a viewer according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

In addition, the method for generating an image with enhanced depth perceived by a viewer according to the disclosed embodiments may be provided as part of a computer program product. A computer program product may be traded between a seller and a buyer as a commodity.

A computer program product may include a software (S/W) program and a computer-readable storage medium in which the S/W program is stored. For example, a computer program product may include a product in the form of a S/W program (e.g., a downloadable app) that is distributed electronically by the manufacturer of an electronic device or through an electronic market (e.g., Google Play Store, App Store). For electronic distribution, at least part of the S/W program may be stored in a storage medium or generated temporarily. In this case, the storage medium may be a storage medium of a server of the manufacturer, a server of the electronic market, or a relay server that temporarily stores the S/W program.

In a system composed of a server and a client device, the computer program product may include a storage medium of the server or a storage medium of the client device. Alternatively, when there is a third device (e.g., a smartphone) communicatively connected to the server or the client device, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include the S/W program itself, which is transmitted from the server to the client device or the third device, or from the third device to the client device.

In this case, one of the server, the client device, and the third device may execute the computer program product to perform the method according to the disclosed embodiments. Alternatively, two or more of the server, the client device, and the third device may execute the computer program product to perform the method according to the disclosed embodiments in a distributed manner.

For example, the server (e.g., a cloud server or an artificial intelligence server) may execute a computer program product stored on the server and thereby control a client device communicatively connected to the server to perform the method according to the disclosed embodiments.

Although the embodiments have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention, as defined in the following claims, also fall within the scope of the present invention.

Claims

An electronic device (200) for generating an image with enhanced depth perceived by a viewer, the electronic device comprising:
a memory (220) configured to store one or more instructions; and
one or more processors (210), wherein the one or more processors (210), by executing the one or more instructions, are configured to:
receive an input image,
receive a control parameter for adjusting a level of at least one image depth cue included in an output image, and
generate, using a deep neural network, the output image from the input image, the output image including the at least one image depth cue at the level adjusted by the control parameter,
wherein the deep neural network comprises:
a first neural network configured to receive the input image and output a feature vector for the input image,
a second neural network configured to receive the input image and output a feature map by reducing a spatial resolution of the input image,
a third neural network configured to receive the feature map output from the second neural network and output the output image by increasing a spatial resolution of the feature map output from the second neural network, and
at least one controllable transformation module configured to transform a feature map based on the control parameter,
wherein the second neural network is
configured to combine a feature map of a hidden layer of the second neural network with the feature vector output from the first neural network,
wherein the at least one controllable transformation module is configured to:
receive a feature map of a hidden layer of the first neural network, and
transform the received feature map of the hidden layer of the first neural network based on the control parameter and output the transformed feature map, and
wherein the third neural network is
configured to combine a feature map of a hidden layer of the third neural network with the transformed feature map of the hidden layer of the first neural network.

The electronic device of claim 1,
wherein the at least one controllable transformation module is configured to:
obtain a first feature map from the received feature map of the hidden layer of the first neural network,
generate a first weight vector and a second weight vector from the control parameter,
generate a second feature map from an element-wise product of the first feature map and the first weight vector,
generate a third feature map from an element-wise product of the first feature map and the second weight vector, and
output the transformed feature map of the hidden layer of the first neural network by concatenating the second feature map and the third feature map.
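
The transformation recited in this claim can be illustrated with a minimal NumPy sketch. It is not taken from the specification: the function name `controllable_transform`, the per-channel `phase` offsets, and the choice of cosine and sine to derive the two weight vectors from a scalar control parameter are all assumptions made for illustration. The cosine/sine pairing is chosen only because it keeps the norm of the concatenated result equal to the norm of the first feature map.

```python
import numpy as np

def controllable_transform(first_map, control, phase):
    """Sketch of a controllable transformation (hypothetical helper).

    first_map: array of shape (C, H, W), the first feature map.
    control:   scalar control parameter.
    phase:     array of shape (C,), assumed per-channel offsets.
    """
    angle = control + phase                    # shape (C,)
    # Assumption: the two weight vectors are the cosine and sine of the angle.
    w1 = np.cos(angle)[:, None, None]          # first weight vector, broadcast over H and W
    w2 = np.sin(angle)[:, None, None]          # second weight vector
    second_map = first_map * w1                # element-wise product with the first weights
    third_map = first_map * w2                 # element-wise product with the second weights
    # Concatenate along the channel axis: (2C, H, W).
    return np.concatenate([second_map, third_map], axis=0)

rng = np.random.default_rng(0)
first_map = rng.standard_normal((4, 8, 8))
phase = rng.uniform(0.0, 2.0 * np.pi, size=4)
out = controllable_transform(first_map, control=0.3, phase=phase)
```

With this choice, changing `control` redistributes energy between the two halves of the output while the overall norm stays fixed; other ways of generating the weight vectors are equally compatible with the claim language.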

The electronic device of claim 2,
wherein a norm of the first feature map is equal to a norm of the transformed feature map of the hidden layer of the first neural network.

The electronic device of claim 2,
wherein a sum of squares of the norms of the first weight vector and the second weight vector is constant.
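
One way to see how the norm condition and the weight-vector condition can relate is the short derivation below. The notation is introduced here for illustration and is not from the specification: $F$ is the first feature map with entries $F_{c,h,w}$ over $C$ channels, and $w_1$, $w_2$ are assumed to be per-channel weight vectors broadcast over spatial positions.

```latex
% Squared norm of the concatenation of the two weighted feature maps.
\[
\bigl\lVert [\, F \odot w_1 \,;\, F \odot w_2 \,] \bigr\rVert_2^2
  = \sum_{c,h,w} F_{c,h,w}^2 \,\bigl( w_{1,c}^2 + w_{2,c}^2 \bigr)
\]
% A sufficient (assumed) per-channel condition and its consequences.
\[
w_{1,c}^2 + w_{2,c}^2 = 1 \;\; \forall c
\;\Longrightarrow\;
\bigl\lVert [\, F \odot w_1 \,;\, F \odot w_2 \,] \bigr\rVert_2 = \lVert F \rVert_2 ,
\qquad
\lVert w_1 \rVert_2^2 + \lVert w_2 \rVert_2^2 = C
\]
```

Under this assumed per-channel condition, the concatenated feature map has the same norm as the first feature map, and the sum of the squared norms of the two weight vectors equals the constant $C$ regardless of the control parameter.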

The electronic device of claim 1,
wherein the second neural network comprises a sequence of encoder modules for reducing the spatial resolution of the input image, and
at least one of the encoder modules is
configured to generate an output feature map from an input feature map based on downsampling.

The electronic device of claim 5,
wherein the at least one of the encoder modules is configured to:
receive an output feature map of a preceding encoder module as an input feature map of the at least one encoder module,
generate an intermediate feature map of the at least one encoder module from the input feature map of the at least one encoder module through at least one layer,
generate a combined feature map of the at least one encoder module by combining the intermediate feature map of the at least one encoder module with the feature vector received from the first neural network, and
generate an output feature map of the at least one encoder module by downsampling the combined feature map of the at least one encoder module.

The electronic device of claim 6,
wherein the at least one of the encoder modules is configured to:
convert a size of the feature vector received from the first neural network based on the number of channels of the intermediate feature map of the at least one encoder module, and
generate the combined feature map of the at least one encoder module from an element-wise product of the intermediate feature map of the at least one encoder module and the size-converted feature vector output from the first neural network.
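
The size conversion and element-wise product recited here, followed by the downsampling step of the encoder module, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the linear projection (`proj_weight`, `proj_bias`) used to convert the feature vector to the channel count, the 2x2 average pooling used for downsampling, and all function names are hypothetical.

```python
import numpy as np

def combine_with_feature_vector(intermediate_map, feature_vector, proj_weight, proj_bias):
    """Scale each channel of the intermediate map by a size-converted feature vector.

    intermediate_map: (C, H, W); feature_vector: (D,)
    proj_weight: (C, D) and proj_bias: (C,) form an assumed linear projection.
    """
    resized = proj_weight @ feature_vector + proj_bias   # (D,) -> (C,)
    # Element-wise product, broadcasting the vector over the spatial axes.
    return intermediate_map * resized[:, None, None]

def downsample_2x(feature_map):
    """2x2 average pooling (one possible downsampling; H and W must be even)."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

rng = np.random.default_rng(0)
intermediate = rng.standard_normal((16, 8, 8))   # intermediate feature map
feature_vector = rng.standard_normal(32)         # output of the first neural network
proj_weight = rng.standard_normal((16, 32))
proj_bias = rng.standard_normal(16)

combined = combine_with_feature_vector(intermediate, feature_vector, proj_weight, proj_bias)
output_map = downsample_2x(combined)             # (16, 4, 4)
```

Because the converted vector has one entry per channel, the product acts as a per-channel gain on the intermediate feature map before the spatial resolution is halved.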

The electronic device of claim 5,
wherein the third neural network comprises a sequence of decoder modules for generating the output image by increasing the spatial resolution of the feature map, and
at least one of the decoder modules is
configured to generate an output feature map from an input feature map based on upsampling.

The electronic device of claim 8,
wherein the at least one of the decoder modules is configured to:
receive an output feature map of a preceding decoder module as an input feature map of the at least one decoder module,
upsample the input feature map of the at least one decoder module,
generate an intermediate feature map of the at least one decoder module from the upsampled input feature map of the at least one decoder module through at least one layer,
receive the transformed feature map of the hidden layer of the first neural network from the at least one controllable transformation module,
generate a combined feature map of the at least one decoder module by combining the intermediate feature map of the at least one decoder module with the transformed feature map of the hidden layer of the first neural network, and
generate an output feature map of the at least one decoder module from the combined feature map of the at least one decoder module through at least one layer.

The electronic device of claim 9,
wherein the at least one of the decoder modules is configured to:
convert a size of the transformed feature map of the hidden layer of the first neural network to be equal to a size of the intermediate feature map of the at least one decoder module, and
generate the combined feature map of the at least one decoder module from a result of an element-wise operation on the intermediate feature map of the at least one decoder module and the size-converted feature map of the hidden layer of the first neural network.
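
The decoder-side combination can be sketched in the same spirit. Everything below is a hedged illustration rather than the disclosed design: nearest-neighbour resizing, a channel-mixing matrix `channel_weight` standing in for a 1x1 convolution, element-wise addition as the element-wise operation, and an identity mapping in place of the "at least one layer" are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def upsample_nearest_2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def resize_to(x, channel_weight, out_hw):
    """Match channel count and spatial size of x to a target feature map.

    x: (C_in, h, w); channel_weight: (C_out, C_in), an assumed 1x1-convolution-like mix.
    out_hw: target (H, W); spatial resizing uses nearest-neighbour indexing.
    """
    mixed = np.einsum("oc,chw->ohw", channel_weight, x)
    rows = np.arange(out_hw[0]) * x.shape[1] // out_hw[0]
    cols = np.arange(out_hw[1]) * x.shape[2] // out_hw[1]
    return mixed[:, rows][:, :, cols]

def decoder_combine(input_map, transformed_map, channel_weight):
    """Upsample, size-match the transformed map, and combine element-wise."""
    intermediate = upsample_nearest_2x(input_map)   # identity stands in for the layers
    resized = resize_to(transformed_map, channel_weight, intermediate.shape[1:])
    return intermediate + resized                   # assumed element-wise operation: addition

rng = np.random.default_rng(0)
input_map = rng.standard_normal((8, 4, 4))          # output of the preceding decoder module
transformed_map = rng.standard_normal((6, 16, 16))  # from a controllable transformation module
channel_weight = rng.standard_normal((8, 6))
combined = decoder_combine(input_map, transformed_map, channel_weight)   # (8, 8, 8)
```

An element-wise product or another element-wise operation would fit the claim wording just as well; addition is used here only to keep the sketch simple.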

The electronic device of claim 8,
wherein the deep neural network includes a first controllable transformation module and a second controllable transformation module,
the first controllable transformation module is configured to receive and transform a feature map of a first hidden layer of the first neural network,
the second controllable transformation module is configured to receive and transform a feature map of a second hidden layer of the first neural network,
the sequence of decoder modules includes a first decoder module and a second decoder module,
the first decoder module is configured to receive the transformed feature map of the first hidden layer of the first neural network from the first controllable transformation module, and
the second decoder module is configured to receive the transformed feature map of the second hidden layer of the first neural network from the second controllable transformation module.

The electronic device of claim 11,
wherein the first decoder module precedes the second decoder module, and
the second hidden layer precedes the first hidden layer.

The electronic device of claim 9,
wherein the at least one of the decoder modules is configured to:
receive a feature map of the at least one encoder module, and
generate the intermediate feature map of the at least one decoder module by concatenating the upsampled input feature map of the at least one decoder module and the received feature map of the at least one encoder module.

The electronic device of claim 1,
wherein the hidden layer of the first neural network is a convolutional layer.

The electronic device of claim 1,
wherein output images having different levels of the image depth cue are output for control parameters having different values.

The electronic device of claim 1,
wherein the level of the image depth cue in the output image changes periodically as the value of the control parameter increases.

The electronic device of claim 1,
wherein the at least one image depth cue includes a first image depth cue and a second image depth cue,
the control parameter is a single variable, and
the level of the first image depth cue and the level of the second image depth cue in the output image change as the value of the control parameter changes.

The electronic device of claim 1,
wherein, for a first feature vector output by the first neural network upon receiving a first input image, a second feature vector output by the first neural network upon receiving a second input image identical to the first input image, and a third feature vector output by the first neural network upon receiving a third input image obtained by adding an image depth cue to the first input image,
a magnitude of a difference between the first feature vector and the second feature vector is smaller than a magnitude of a difference between the first feature vector and the third feature vector.

The electronic device of claim 1,
wherein the at least one image depth cue includes at least one of blur, contrast, and sharpness.

A method for generating an image with enhanced depth perceived by a viewer, the method comprising:
receiving an input image (S1410);
receiving a control parameter for adjusting a level of at least one image depth cue included in an output image (S1420); and
generating, using a deep neural network, the output image from the input image, the output image including the at least one image depth cue at the level adjusted by the control parameter (S1430),
wherein the deep neural network comprises:
a first neural network configured to receive the input image and output a feature vector for the input image,
a second neural network configured to receive the input image and output a feature map by reducing a spatial resolution of the input image,
a third neural network configured to receive the feature map output from the second neural network and output the output image by increasing a spatial resolution of the feature map output from the second neural network, and
at least one controllable transformation module configured to transform a feature map based on the control parameter,
wherein the second neural network is
configured to combine a feature map of a hidden layer of the second neural network with the feature vector output from the first neural network,
wherein the at least one controllable transformation module is configured to:
receive a feature map of a hidden layer of the first neural network, and
transform the received feature map of the hidden layer of the first neural network based on the control parameter and output the transformed feature map, and
wherein the third neural network is
configured to combine a feature map of a hidden layer of the third neural network with the transformed feature map of the hidden layer of the first neural network.